RE: Solr commits before documents are added

2009-10-20 Thread Feak, Todd
Any chance you are indexing to a Master, then syncing to a Slave, and you 
aren't seeing those last 20 on the Slave?

There is an issue with syncing between Master and Slave that we've 
experienced. If the last commit is very small (20 sounds possible!) it can 
land in the same clock second as the previous commit on that machine. The Master 
will see the commit and its index will show the data fine. However, the Slave 
cannot distinguish a second commit made within the same clock second, so it will 
be missing the last 20 after the sync between the two.

It's an edge case, but we ran into it recently.
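
A minimal sketch of the kind of client-side guard this implies, assuming the indexing code records the time of its previous commit (nothing Solr provides out of the box; variable names are hypothetical, exception handling omitted):

    // Hypothetical guard for the same-clock-second edge case described above.
    // lastCommitTime is assumed to be recorded after each earlier commit.
    long sinceLastCommit = System.currentTimeMillis() - lastCommitTime;
    if (sinceLastCommit < 1000) {
        Thread.sleep(1000 - sinceLastCommit);  // push the final, small commit into the next second
    }
    solrServer.commit();
    lastCommitTime = System.currentTimeMillis();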

-Todd

-Original Message-
From: SharmilaR [mailto:sranganat...@library.rochester.edu] 
Sent: Monday, October 19, 2009 1:07 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr commits before documents are added


Solr version is 1.3
I am indexing a total of 1.4 million documents. Yes, I commit (waitFlush="true",
waitSearcher="true") every 100k documents and then once at the end. 
I have a counter next to the addDoc(SolrDocument) statement to keep track of the
number of documents added. When I query Solr after the commit, the total number
of documents returned does not match the number of documents added. This
happens only when I index millions of documents, not when I index something like
500 documents. In this case, I know it's the last 20 documents that are not
committed, because each document has a field 'RECORD_ID' which is assigned a
sequential number (in Java code). When I query Solr using the Solr admin
interface, the documents with the last 20 RECORD_IDs are missing (for example, the
last id is 999,980 instead of 1,000,000).
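
For reference, a minimal SolrJ sketch of the loop being described (variable names are illustrative only):

    int added = 0;
    for (SolrInputDocument doc : documents) {
        solrServer.add(doc);
        added++;                            // counter kept next to the add() call
        if (added % 100000 == 0) {
            solrServer.commit(true, true);  // waitFlush=true, waitSearcher=true
        }
    }
    solrServer.commit(true, true);          // final commit for the remaining documents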

- Sharmila


Feak, Todd wrote:
> 
> A few questions to help the troubleshooting.
> 
> Solr version #?
> 
> Is there just 1 commit through Solrj for the millions of documents? 
> 
> Or do you do it on a regular interval (every 100k documents for example)
> and then one at the end to be sure?
> 
> How are you observing that the last few didn't make it in? Are you looking
> at a slave or master?
> 
> -Todd
> 
> 
-Original Message-
From: Ranganathan, Sharmila [mailto:sranganat...@library.rochester.edu] 
Sent: Monday, October 19, 2009 9:19 AM
To: solr-user@lucene.apache.org
Subject: Solr commits before documents are added

Hi,

My application indexes a huge number of documents (in the millions). Below
is a snapshot of my code, where I add all documents to Solr and then
issue the commit command at the end. I use Solrj. I find that the last few
documents are not committed to Solr. Is this because adding the documents
to Solr took longer and the commit command was reached even before it
finished adding documents? Is there a way to ensure that Solr waits
for all documents to be added and then commits? Please advise me how to
solve this issue.

 

for (SolrInputDocument doc : docs) {
    solrServer.add(doc);   // Add each document to Solr
}

solrServer.commit();  // Commit to Solr once all documents have been added

 

 

Thanks,

Sharmila





-- 
View this message in context: 
http://www.nabble.com/Solr-commits-before-documents-are-added-tp25961191p25964770.html
Sent from the Solr - User mailing list archive at Nabble.com.





RE: Solr commits before documents are added

2009-10-19 Thread Feak, Todd
A few questions to help the troubleshooting.

Solr version #?

Is there just 1 commit through Solrj for the millions of documents? 

Or do you do it on a regular interval (every 100k documents for example) and 
then one at the end to be sure?

How are you observing that the last few didn't make it in? Are you looking at a 
slave or master?

-Todd
-Original Message-
From: Ranganathan, Sharmila [mailto:sranganat...@library.rochester.edu] 
Sent: Monday, October 19, 2009 9:19 AM
To: solr-user@lucene.apache.org
Subject: Solr commits before documents are added

Hi,

My application indexes a huge number of documents (in the millions). Below
is a snapshot of my code, where I add all documents to Solr and then
issue the commit command at the end. I use Solrj. I find that the last few
documents are not committed to Solr. Is this because adding the documents
to Solr took longer and the commit command was reached even before it
finished adding documents? Is there a way to ensure that Solr waits
for all documents to be added and then commits? Please advise me how to
solve this issue.

 

for (SolrInputDocument doc : docs) {
    solrServer.add(doc);   // Add each document to Solr
}

solrServer.commit();  // Commit to Solr once all documents have been added

 

 

Thanks,

Sharmila




RE: Solr Timeouts

2009-10-06 Thread Feak, Todd
Giovanni Fernandez-Kincade wrote:
>> I'm fairly certain that all of the indexing jobs are calling SOLR with 
>> commit=false. They all construct the indexing URLs using a CLR function I 
>> wrote, which takes in a Commit parameter, which is always set to false.
>>
>> Also, I don't see any calls to commit in the Tomcat logs (whereas normally 
>> when I make a commit call I do).
>>
>> This suggests that Solr is doing it automatically, but the extract handler 
>> doesn't seem to be the problem:
>>  > class="org.apache.solr.handler.extraction.ExtractingRequestHandler" 
>> startup="lazy">
>>
>>  ignored_
>>  fileData
>>
>>  
>>
>>
>> There is no external config file specified, and I don't see anything about 
>> commits here.
>>
>> I've tried setting up more detailed indexer logging but haven't been able to 
>> get it to work:
>> true
>>
>> I tried relative and absolute paths, but no dice so far.
>>
>> Any other ideas?
>>
>> -Gio.
>>
>> -Original Message-
>> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
>> Sent: Monday, October 05, 2009 12:52 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Timeouts
>>
>>> This is what one of my SOLR requests look like:
>>>
>>> http://titans:8080/solr/update/extract/?literal.versionId=684936&literal.filingDate=1997-12-04T00:00:00Z&literal.formTypeId=95&literal.companyId=3567904&literal.sourceId=0&resource.name=684936.txt&commit=false
>>
>> Have you verified that all of your indexing jobs (you said you had 4
>> or 5) have commit=false?
>>
>> Also make sure that your extract handler doesn't have a default of
>> something that could cause a commit - like commitWithin or something.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>>
>> On Mon, Oct 5, 2009 at 12:44 PM, Giovanni Fernandez-Kincade
>>  wrote:
>>> Is there somewhere other than solrConfig.xml that the autoCommit feature is 
>>> enabled? I've looked through that file and found autocommit to be commented 
>>> out:
>>>
>>>
>>>
>>> 
>>>
>>>
>>>
>>
>>>
>>>
>>>
>>> -Original Message-
>>> From: Feak, Todd [mailto:todd.f...@smss.sony.com]
>>> Sent: Monday, October 05, 2009 12:40 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Solr Timeouts
>>>
>>>
>>>
>>> Actually, ignore my other response.
>>>
>>>
>>>
>>> I believe you are committing, whether you know it or not.
>>>
>>>
>>>
>>> This is in your provided stack trace
>>>
>>> org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor,
>>>  SolrParams, boolean) 
>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest,
>>>  SolrQueryResponse)
>>>
>>>
>>>
>>> I think Yonik gave you additional information for how to make it faster.
>>>
>>>
>>>
>>> -Todd
>>>
>>>
>>>
>>> -Original Message-
>>>
>>> From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
>>>
>>> Sent: Monday, October 05, 2009 9:30 AM
>>>
>>> To: solr-user@lucene.apache.org
>>>
>>> Subject: RE: Solr Timeouts
>>>
>>>
>>>
>>> I'm not committing at all actually - I'm waiting for all 6 million to be 
>>> done.
>>>
>>>
>>>
>>> -Original Message-
>>>
>>> From: Feak, Todd [mailto:todd.f...@smss.sony.com]
>>>
>>> Sent: Monday, October 05, 2009 12:10 PM
>>>
>>> To: solr-user@lucene.apache.org
>>>
>>> Subject: RE: Solr Timeouts
>>>
>>>
>>>
>>> How often are you committing?
>>>
>>>
>>>
>>> Every time you commit, Solr will close the old index and open the new one. 
>>> If you are doing this in parallel from multiple jobs (4-5 you mention) then 
>>> eventually the server gets behind and you start to pile up commit requests. 
>>> Once this starts to happen, it will cascade out of control if the rate of 
>>> commits isn't slowed.
>>>
>>>
>>>
>>> -Todd

RE: using regular expressions in solr query

2009-10-06 Thread Feak, Todd
Any particular reason for the double quotes in the 2nd and 3rd query example, 
but not the 1st, or is this just an artifact of your email?

-Todd

-Original Message-
From: Rakhi Khatwani [mailto:rkhatw...@gmail.com] 
Sent: Tuesday, October 06, 2009 2:26 AM
To: solr-user@lucene.apache.org
Subject: using regular expressions in solr query

Hi,
  I have an example in which I want to use a regular expression in my
Solr query. For example, suppose I want to search over these sample documents:
raakhi rajnish ninad goureya sheetal
ritesh rajnish ninad goureya sheetal
where my content field is of type text.
When I type in
QUERY:    content:raa*
RESPONSE: raakhi rajnish ninad goureya sheetal
QUERY:    content:"ra*"
RESPONSE: 0 results
Because of this I am facing problems with the next query:
QUERY:    content:"r* rajnish"
RESPONSE: 0 results
which should ideally return both results.
Any pointers??
Regards,
Raakhi



RE: cleanup old index directories on slaves

2009-10-05 Thread Feak, Todd
We use the snapcleaner script.

http://wiki.apache.org/solr/SolrCollectionDistributionScripts#snapcleaner

Will that do the job?

-Todd

-Original Message-
From: solr jay [mailto:solr...@gmail.com] 
Sent: Monday, October 05, 2009 1:58 PM
To: solr-user@lucene.apache.org
Subject: cleanup old index directories on slaves

Is there a reliable way to safely clean up index directories? This is needed
mainly on slave side as in several situations, an old index directory is
replaced with a new one, and I'd like to remove those that are no longer in
use.

Thanks,

-- 
J



RE: About SolrJ for XML

2009-10-05 Thread Feak, Todd
It looks like you have some confusion about queries vs. facets. You may want to 
look at the Solr wiki regarding facets a bit. In the meantime, if you just 
want to query for that field containing "21"...

I would suggest that you don't set the query type, don't set any facet fields, 
and only set the query. Set the query to "field:21" where "field" should be 
replaced with the fieldname that has a "21" in it.

For example, if the field name is foo, try this instead:

SolrQuery query = new SolrQuery();
query.setQuery("foo:21");  
QueryResponse qr = server.query(query);
SolrDocumentList sdl = qr.getResults();


To delve into more detail, what your original code did was query for a "21" in 
the default field (check your solrconfig.xml to see what the default is). It then 
faceted the query results by the "id" and "weight" fields. Because there 
were no search results at all, the faceting request didn't do anything. I'm not 
sure why you switched the query type to DisMax, as you didn't issue a query 
that would leverage it.
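
For contrast, a hedged SolrJ sketch of what a facet request over the "weight" field might look like if facet counts were actually the goal (they aren't here):

    SolrQuery query = new SolrQuery("*:*");    // match everything, then facet over the results
    query.setFacet(true);
    query.addFacetField("weight");
    QueryResponse qr = server.query(query);
    for (FacetField.Count count : qr.getFacetField("weight").getValues()) {
        System.out.println(count.getName() + " -> " + count.getCount());
    }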

-Todd

-Original Message-
From: Chaitali Gupta [mailto:chaitaligupt...@yahoo.com] 
Sent: Monday, October 05, 2009 2:05 PM
To: solr-user@lucene.apache.org
Subject: About SolrJ for XML 

Hi, 

I am new to Solr. I am using Solr version 1.3.

I would like to index XML files using the SolrJ API. I have gone through the Solr 
mailing list archives and have been able to index XML files. But when I try to 
query those files using SolrJ, I get no output. In particular, I do not find 
correct results for the numeric fields that I have specified in the schema.xml file 
in the config directory for my XML files. I have made those fields "indexed" 
and "stored" by using "indexed=true" and "stored=true". I am using the 
following code in order to search for data (in the following code, I am trying 
to find documents with weight value 21) - 

 SolrQuery query = new SolrQuery();
 query.setQueryType("dismax");
 query.setFacet(true);
 query.addFacetField("id");
 query.addFacetField("weight");
 query.setQuery("21");  
 QueryResponse qr = server.query(query);
 SolrDocumentList sdl = qr.getResults();

Am I doing anything wrong? Why do I get zero results even when there is an XML 
file with weight 21? What are the other ways of doing numeric queries 
in SolrJ? 

Also, I would like to know how to get the exact size of the index being 
generated by Solr. I am using a single machine to generate and query the index. 
When I look at the index directory, I see that the size of the files in the 
index directory is much smaller than the size reported by the "total" line of 
the "ls -lh" command. Does anyone have any idea why this is the case? 

Thanks in advance. Waiting for your reply soon. 

Regards
Chaitali 



  



RE: Solr Timeouts

2009-10-05 Thread Feak, Todd
Actually, ignore my other response. 

I believe you are committing, whether you know it or not. 

This is in your provided stack trace
org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor,
 SolrParams, boolean) 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest,
 SolrQueryResponse)

I think Yonik gave you additional information for how to make it faster.

-Todd

-Original Message-
From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com] 
Sent: Monday, October 05, 2009 9:30 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Timeouts

I'm not committing at all actually - I'm waiting for all 6 million to be done. 

-Original Message-----
From: Feak, Todd [mailto:todd.f...@smss.sony.com] 
Sent: Monday, October 05, 2009 12:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Timeouts

How often are you committing?

Every time you commit, Solr will close the old index and open the new one. If 
you are doing this in parallel from multiple jobs (4-5 you mention) then 
eventually the server gets behind and you start to pile up commit requests. 
Once this starts to happen, it will cascade out of control if the rate of 
commits isn't slowed.

-Todd


From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
Sent: Monday, October 05, 2009 9:04 AM
To: solr-user@lucene.apache.org
Subject: Solr Timeouts

Hi,
I'm attempting to index approximately 6 million HTML/Text files using SOLR 
1.4/Tomcat6 on Windows Server 2003 x64. I'm running 64 bit Tomcat and JVM. I've 
fired up 4-5 different jobs that are making indexing requests using the 
ExtractionRequestHandler, and everything works well for about 30-40 minutes, 
after which all indexing requests start timing out. I profiled the server and 
found that all of the threads are getting blocked by this call to flush the 
Lucene index to disk (see below).

This leads me to a few questions:

1.   Is this normal?

2.   Can I reduce the frequency with which this happens somehow? I've 
greatly increased the indexing options in SolrConfig.xml (attached here) to no 
avail.

3.   During these flushes, resource utilization (CPU, I/O, Memory 
Consumption) is significantly down compared to when requests are being handled. 
Is there any way to make this index go faster? I have plenty of bandwidth on 
the machine.

I appreciate any insight you can provide. We're currently using MS SQL 2005 as 
our full-text solution and are pretty much miserable. So far SOLR has been a 
great experience.

Thanks,
Gio.

http-8080-Processor21 [RUNNABLE] CPU time: 9:51
java.io.RandomAccessFile.seek(long)
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(byte[],
 int, int)
org.apache.lucene.store.BufferedIndexInput.refill()
org.apache.lucene.store.BufferedIndexInput.readByte()
org.apache.lucene.store.IndexInput.readVInt()
org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
org.apache.lucene.index.SegmentTermEnum.next()
org.apache.lucene.index.SegmentTermEnum.scanTo(Term)
org.apache.lucene.index.TermInfosReader.get(Term, boolean)
org.apache.lucene.index.TermInfosReader.get(Term)
org.apache.lucene.index.SegmentTermDocs.seek(Term)
org.apache.lucene.index.DocumentsWriter.applyDeletes(IndexReader, int)
org.apache.lucene.index.DocumentsWriter.applyDeletes(SegmentInfos)
org.apache.lucene.index.IndexWriter.applyDeletes()
org.apache.lucene.index.IndexWriter.doFlushInternal(boolean, boolean)
org.apache.lucene.index.IndexWriter.doFlush(boolean, boolean)
org.apache.lucene.index.IndexWriter.flush(boolean, boolean, boolean)
org.apache.lucene.index.IndexWriter.closeInternal(boolean)
org.apache.lucene.index.IndexWriter.close(boolean)
org.apache.lucene.index.IndexWriter.close()
org.apache.solr.update.SolrIndexWriter.close()
org.apache.solr.update.DirectUpdateHandler2.closeWriter()
org.apache.solr.update.DirectUpdateHandler2.commit(CommitUpdateCommand)
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(CommitUpdateCommand)
org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor,
 SolrParams, boolean)
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest,
 SolrQueryResponse)
org.apache.solr.handler.RequestHandlerBase.handleRequest(SolrQueryRequest, 
SolrQueryResponse)
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(SolrQueryRequest,
 SolrQueryResponse)
org.apache.solr.core.SolrCore.execute(SolrRequestHandler, SolrQueryRequest, 
SolrQueryResponse)
org.apache.solr.servlet.SolrDispatchFilter.execute(HttpServletRequest, 
SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(ServletRequest, 
ServletResponse, FilterChain)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ServletRequest,
 ServletResponse)
org.apache.catalina.core.Applic

RE: Solr Timeouts

2009-10-05 Thread Feak, Todd
Ok. Guess that isn't a problem. :)

A second consideration... I could see lock contention being an issue with 
multiple clients indexing at once. Is there any disadvantage to serializing the 
clients to remove lock contention?

-Todd

-Original Message-
From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com] 
Sent: Monday, October 05, 2009 9:30 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Timeouts

I'm not committing at all actually - I'm waiting for all 6 million to be done. 

-Original Message-----
From: Feak, Todd [mailto:todd.f...@smss.sony.com] 
Sent: Monday, October 05, 2009 12:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Timeouts

How often are you committing?

Every time you commit, Solr will close the old index and open the new one. If 
you are doing this in parallel from multiple jobs (4-5 you mention) then 
eventually the server gets behind and you start to pile up commit requests. 
Once this starts to happen, it will cascade out of control if the rate of 
commits isn't slowed.

-Todd


From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
Sent: Monday, October 05, 2009 9:04 AM
To: solr-user@lucene.apache.org
Subject: Solr Timeouts

Hi,
I'm attempting to index approximately 6 million HTML/Text files using SOLR 
1.4/Tomcat6 on Windows Server 2003 x64. I'm running 64 bit Tomcat and JVM. I've 
fired up 4-5 different jobs that are making indexing requests using the 
ExtractionRequestHandler, and everything works well for about 30-40 minutes, 
after which all indexing requests start timing out. I profiled the server and 
found that all of the threads are getting blocked by this call to flush the 
Lucene index to disk (see below).

This leads me to a few questions:

1.   Is this normal?

2.   Can I reduce the frequency with which this happens somehow? I've 
greatly increased the indexing options in SolrConfig.xml (attached here) to no 
avail.

3.   During these flushes, resource utilization (CPU, I/O, Memory 
Consumption) is significantly down compared to when requests are being handled. 
Is there any way to make this index go faster? I have plenty of bandwidth on 
the machine.

I appreciate any insight you can provide. We're currently using MS SQL 2005 as 
our full-text solution and are pretty much miserable. So far SOLR has been a 
great experience.

Thanks,
Gio.

http-8080-Processor21 [RUNNABLE] CPU time: 9:51
java.io.RandomAccessFile.seek(long)
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(byte[],
 int, int)
org.apache.lucene.store.BufferedIndexInput.refill()
org.apache.lucene.store.BufferedIndexInput.readByte()
org.apache.lucene.store.IndexInput.readVInt()
org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
org.apache.lucene.index.SegmentTermEnum.next()
org.apache.lucene.index.SegmentTermEnum.scanTo(Term)
org.apache.lucene.index.TermInfosReader.get(Term, boolean)
org.apache.lucene.index.TermInfosReader.get(Term)
org.apache.lucene.index.SegmentTermDocs.seek(Term)
org.apache.lucene.index.DocumentsWriter.applyDeletes(IndexReader, int)
org.apache.lucene.index.DocumentsWriter.applyDeletes(SegmentInfos)
org.apache.lucene.index.IndexWriter.applyDeletes()
org.apache.lucene.index.IndexWriter.doFlushInternal(boolean, boolean)
org.apache.lucene.index.IndexWriter.doFlush(boolean, boolean)
org.apache.lucene.index.IndexWriter.flush(boolean, boolean, boolean)
org.apache.lucene.index.IndexWriter.closeInternal(boolean)
org.apache.lucene.index.IndexWriter.close(boolean)
org.apache.lucene.index.IndexWriter.close()
org.apache.solr.update.SolrIndexWriter.close()
org.apache.solr.update.DirectUpdateHandler2.closeWriter()
org.apache.solr.update.DirectUpdateHandler2.commit(CommitUpdateCommand)
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(CommitUpdateCommand)
org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor,
 SolrParams, boolean)
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest,
 SolrQueryResponse)
org.apache.solr.handler.RequestHandlerBase.handleRequest(SolrQueryRequest, 
SolrQueryResponse)
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(SolrQueryRequest,
 SolrQueryResponse)
org.apache.solr.core.SolrCore.execute(SolrRequestHandler, SolrQueryRequest, 
SolrQueryResponse)
org.apache.solr.servlet.SolrDispatchFilter.execute(HttpServletRequest, 
SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(ServletRequest, 
ServletResponse, FilterChain)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ServletRequest,
 ServletResponse)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ServletRequest, 
ServletResponse)
org.apache.catalina.core.StandardWrapperValve.invoke(Request, Response)
org.apache.catalina.core.

RE: Solr Timeouts

2009-10-05 Thread Feak, Todd
How often are you committing?

Every time you commit, Solr will close the old index and open the new one. If 
you are doing this in parallel from multiple jobs (4-5 you mention) then 
eventually the server gets behind and you start to pile up commit requests. 
Once this starts to happen, it will cascade out of control if the rate of 
commits isn't slowed.

-Todd


From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
Sent: Monday, October 05, 2009 9:04 AM
To: solr-user@lucene.apache.org
Subject: Solr Timeouts

Hi,
I'm attempting to index approximately 6 million HTML/Text files using SOLR 
1.4/Tomcat6 on Windows Server 2003 x64. I'm running 64 bit Tomcat and JVM. I've 
fired up 4-5 different jobs that are making indexing requests using the 
ExtractionRequestHandler, and everything works well for about 30-40 minutes, 
after which all indexing requests start timing out. I profiled the server and 
found that all of the threads are getting blocked by this call to flush the 
Lucene index to disk (see below).

This leads me to a few questions:

1.   Is this normal?

2.   Can I reduce the frequency with which this happens somehow? I've 
greatly increased the indexing options in SolrConfig.xml (attached here) to no 
avail.

3.   During these flushes, resource utilization (CPU, I/O, Memory 
Consumption) is significantly down compared to when requests are being handled. 
Is there any way to make this index go faster? I have plenty of bandwidth on 
the machine.

I appreciate any insight you can provide. We're currently using MS SQL 2005 as 
our full-text solution and are pretty much miserable. So far SOLR has been a 
great experience.

Thanks,
Gio.

http-8080-Processor21 [RUNNABLE] CPU time: 9:51
java.io.RandomAccessFile.seek(long)
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(byte[],
 int, int)
org.apache.lucene.store.BufferedIndexInput.refill()
org.apache.lucene.store.BufferedIndexInput.readByte()
org.apache.lucene.store.IndexInput.readVInt()
org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
org.apache.lucene.index.SegmentTermEnum.next()
org.apache.lucene.index.SegmentTermEnum.scanTo(Term)
org.apache.lucene.index.TermInfosReader.get(Term, boolean)
org.apache.lucene.index.TermInfosReader.get(Term)
org.apache.lucene.index.SegmentTermDocs.seek(Term)
org.apache.lucene.index.DocumentsWriter.applyDeletes(IndexReader, int)
org.apache.lucene.index.DocumentsWriter.applyDeletes(SegmentInfos)
org.apache.lucene.index.IndexWriter.applyDeletes()
org.apache.lucene.index.IndexWriter.doFlushInternal(boolean, boolean)
org.apache.lucene.index.IndexWriter.doFlush(boolean, boolean)
org.apache.lucene.index.IndexWriter.flush(boolean, boolean, boolean)
org.apache.lucene.index.IndexWriter.closeInternal(boolean)
org.apache.lucene.index.IndexWriter.close(boolean)
org.apache.lucene.index.IndexWriter.close()
org.apache.solr.update.SolrIndexWriter.close()
org.apache.solr.update.DirectUpdateHandler2.closeWriter()
org.apache.solr.update.DirectUpdateHandler2.commit(CommitUpdateCommand)
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(CommitUpdateCommand)
org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor,
 SolrParams, boolean)
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest,
 SolrQueryResponse)
org.apache.solr.handler.RequestHandlerBase.handleRequest(SolrQueryRequest, 
SolrQueryResponse)
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(SolrQueryRequest,
 SolrQueryResponse)
org.apache.solr.core.SolrCore.execute(SolrRequestHandler, SolrQueryRequest, 
SolrQueryResponse)
org.apache.solr.servlet.SolrDispatchFilter.execute(HttpServletRequest, 
SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(ServletRequest, 
ServletResponse, FilterChain)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ServletRequest,
 ServletResponse)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ServletRequest, 
ServletResponse)
org.apache.catalina.core.StandardWrapperValve.invoke(Request, Response)
org.apache.catalina.core.StandardContextValve.invoke(Request, Response)
org.apache.catalina.core.StandardHostValve.invoke(Request, Response)
org.apache.catalina.valves.ErrorReportValve.invoke(Request, Response)
org.apache.catalina.core.StandardEngineValve.invoke(Request, Response)
org.apache.catalina.connector.CoyoteAdapter.service(Request, Response)
org.apache.coyote.http11.Http11Processor.process(InputStream, OutputStream)
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(TcpConnection,
 Object[])
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(Socket, TcpConnection, 
Object[])
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(Object[])
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run()
java.lang.Thread.run()



RE: NGramTokenFilter behaviour

2009-09-30 Thread Feak, Todd
My understanding is that NGram tokenizing is meant to help with languages that don't 
necessarily use spaces as a word delimiter (Japanese et al.). In that case 
bi-gramming is used to find words contained within a stream of unbroken 
characters, and you want to match all of the bi-grams produced from 
the search query. An "OR" wouldn't work as well, as you would find tons of 
hits.
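
A tiny illustration of what bi-gramming does to a query term (plain Java, just to show roughly the tokens a 2-gram analyzer would emit):

    String term = "duvlin";
    for (int i = 0; i + 2 <= term.length(); i++) {
        System.out.print(term.substring(i, i + 2) + " ");   // prints: du uv vl li in
    }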

-Todd Feak

-Original Message-
From: aod...@gmail.com [mailto:aod...@gmail.com] 
Sent: Wednesday, September 30, 2009 10:54 AM
To: solr-user@lucene.apache.org
Subject: NGramTokenFilter behaviour

If I index the following text: "I live in Dublin Ireland where
Guinness is brewed"

Then search for: duvlin

Should Solr return a match?

In the admin interface under the analysis section, Solr highlights
some NGram matches?

When I enter the following query string into my browser address bar, I
get 0 results?

http://localhost:8983/solr/select/?q=duvlin&debugQuery=true

Nor do I get results for dub, dubli, ublin, dublin (du does return a result).

I also notice when I use debugQuery=true, the parsed query is a
PhraseQuery. This doesn't make sense to me, as surely the point of the
NGram is to use a Boolean OR between each Gram??

However, if I don't use an NGramFilterFactory at query time, I can get
results for: dub, ublin, du, but not duvlin.


  



  


Can someone please clarify what the purpose of the
NGramFilter/tokenizer is, if not to allow for
misspellings/morphological variation, and also what the correct
configuration is in terms of use at index/query time?

Any help appreciated!

Aodh.

Solr 1.3, JDK 1.6




RE: Re: WebLogic 10 Compatibility Issue - StackOverflowError

2009-01-30 Thread Feak, Todd
Are the issues you ran into due to non-standard code in Solr, or is there
some WebLogic inconsistency?

-Todd Feak

-Original Message-
From: news [mailto:n...@ger.gmane.org] On Behalf Of Ilan Rabinovitch
Sent: Friday, January 30, 2009 1:11 AM
To: solr-user@lucene.apache.org
Subject: Re: WebLogic 10 Compatibility Issue - StackOverflowError

I created a wiki page shortly after posting to the list:

http://wiki.apache.org/solr/SolrWeblogic

From what we could tell, Solr itself was fully functional; it was only 
the admin tools that were failing.

Regards,
Ilan Rabinovitch

---
SCALE 7x: 2009 Southern California Linux Expo
Los Angeles, CA
http://www.socallinuxexpo.org


On 1/29/09 4:34 AM, Mark Miller wrote:
> We should get this on the wiki.
>
> - Mark
>
>
> Ilan Rabinovitch wrote:
>>
>> We were able to deploy Solr 1.3 on Weblogic 10.0 earlier today. Doing
>> so required two changes:
>>
>> 1) Creating a weblogic.xml file in solr.war's WEB-INF directory. The
>> weblogic.xml file is required to disable Solr's filter on FORWARD.
>>
>> The contents of weblogic.xml should be:
>>
>> <weblogic-web-app xmlns="http://www.bea.com/ns/weblogic/90"
>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>     xsi:schemaLocation="http://www.bea.com/ns/weblogic/90
>>     http://www.bea.com/ns/weblogic/90/weblogic-web-app.xsd">
>>
>>   <container-descriptor>
>>     <filter-dispatched-requests-enabled>false</filter-dispatched-requests-enabled>
>>   </container-descriptor>
>>
>> </weblogic-web-app>
>>
>>
>> 2) Remove the pageEncoding attribute from line 1 of solr/admin/header.jsp
>>
>>
>>
>>
>> On 1/17/09 2:02 PM, KSY wrote:
>>> I hit a major roadblock while trying to get Solr 1.3 running on WebLogic 10.0.
>>>
>>> A similar message was posted before
>>> (http://www.nabble.com/Solr-1.3-stack-overflow-when-accessing-solr-admin-page-td20157873.html)
>>> but it seems like it hasn't been resolved yet, so I'm re-posting here.
>>>
>>> I am sure I configured everything correctly because it's working fine on Resin.
>>>
>>> Has anyone successfully run Solr 1.3 on WebLogic 10.0 or higher? Thanks.
>>>
>>>
>>> SUMMARY:
>>>
>>> When accessing /solr/admin page, StackOverflowError occurs due to an
>>> infinite recursion in SolrDispatchFilter
>>>
>>>
>>> ENVIRONMENT SETTING:
>>>
>>> Solr 1.3.0
>>> WebLogic 10.0
>>> JRockit JVM 1.5
>>>
>>>
>>> ERROR MESSAGE:
>>>
>>> SEVERE: javax.servlet.ServletException: java.lang.StackOverflowError
>>> at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:276)
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
>>> at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
>>> at weblogic.servlet.internal.RequestDispatcherImpl.invokeServlet(RequestDispatcherImpl.java:526)
>>> at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:261)
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
>>> at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
>>> at weblogic.servlet.internal.RequestDispatcherImpl.invokeServlet(RequestDispatcherImpl.java:526)
>>> at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:261)
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
>>> at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
>>> at weblogic.servlet.internal.RequestDispatcherImpl.invokeServlet(RequestDispatcherImpl.java:526)
>>> at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:261)
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
>>>
>>>
>>
>>
>
>






RE: warmupTime : 0

2009-01-29 Thread Feak, Todd
This usually represents anything less than 8ms if you are on a Windows
system. The granularity of timing on Windows systems is around 16ms.
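
A quick, purely illustrative way to see the tick size of the millisecond clock on a given box:

    long t0 = System.currentTimeMillis();
    long t1;
    while ((t1 = System.currentTimeMillis()) == t0) {
        // spin until the clock ticks over
    }
    System.out.println("clock tick = " + (t1 - t0) + " ms");   // roughly 15-16ms on older Windows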

-Todd feak

-Original Message-
From: sunnyfr [mailto:johanna...@gmail.com] 
Sent: Thursday, January 29, 2009 9:13 AM
To: solr-user@lucene.apache.org
Subject: warmupTime : 0


Hi,

Do you think it's normal to have warmupTime : 0 ??

searcher  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  searcherName : searc...@6f7cf6b6 main
caching : true
numDocs : 8207035
maxDoc : 8239991
readerImpl : ReadOnlyMultiSegmentReader
readerDir :
org.apache.lucene.store.FSDirectory@/data/solr/video/data/index
indexVersion : 1228743257996
openedAt : Thu Jan 29 17:42:08 CET 2009
registeredAt : Thu Jan 29 17:42:09 CET 2009
warmupTime : 0 

I've around 12M of data.




   


  



thanks a lot,

-- 
View this message in context:
http://www.nabble.com/warmupTime-%3A-0-tp21731301p21731301.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: solr as the data store

2009-01-28 Thread Feak, Todd
Although the idea that you will need to rebuild from scratch is
unlikely, you might want to fully understand the cost of recovery if you
*do* have to.

If it's incredibly expensive(time or money), you need to keep that in
mind.

-Todd


-Original Message-
From: Ian Connor [mailto:ian.con...@gmail.com] 
Sent: Wednesday, January 28, 2009 12:38 PM
To: solr
Subject: solr as the data store

Hi All,

Is anyone using Solr (and thus the Lucene index) as their database store?

Up to now, we have been using a database to build Solr from. However, given
that Lucene already keeps the stored data intact, and that rebuilding from
Solr to Solr can be very fast, the separate database does not seem so necessary.

It seems totally possible to maintain just the Solr shards and treat them as
the database (backups, redundancy, etc. are already built right in). The idea
that we would need to rebuild from scratch seems unlikely, and the speed
boost from using Solr shards for data massaging and reindexing seems very
appealing.

Has anyone else thought about this, or done this and run into problems that
caused them to go back to a separate database model? Is there a critical
need you can think of that is missing?

-- 
Regards,

Ian Connor


RE: QTime in microsecond

2009-01-23 Thread Feak, Todd
The easiest way is to run maybe 100,000 or more queries and take an
average. A single microsecond value for a query would be incredibly
inaccurate.
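
A rough SolrJ averaging sketch along those lines (hypothetical queries; this measures client-side wall-clock time rather than Solr's reported QTime, and assumes it runs inside a method that declares SolrServerException):

    int runs = 100000;
    long start = System.nanoTime();
    for (int i = 0; i < runs; i++) {
        server.query(new SolrQuery("id:" + (i % 1000)));   // vary the query a little to avoid pure cache hits
    }
    long avgMicros = (System.nanoTime() - start) / runs / 1000;
    System.out.println("average query latency: " + avgMicros + " microseconds");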

-Todd Feak



-Original Message-
From: AHMET ARSLAN [mailto:iori...@yahoo.com] 
Sent: Friday, January 23, 2009 1:33 AM
To: solr-user@lucene.apache.org
Subject: QTime in microsecond 

Is there a way to get QTime in microseconds from Solr?

I have a small collection and my response time (QTime) is 0 or 1
milliseconds. I am running benchmark tests and I need more precise
timings for comparison.

Thanks for your help.


  



RE: Performance "dead-zone" due to garbage collection

2009-01-23 Thread Feak, Todd
Can you share your experience with the IBM JDK once you've evaluated it?
You are working with a heavy load; I think many would benefit from the
feedback.

-Todd Feak

-Original Message-
From: wojtekpia [mailto:wojte...@hotmail.com] 
Sent: Thursday, January 22, 2009 3:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance "dead-zone" due to garbage collection


I'm not sure if you suggested it, but I'd like to try the IBM JVM. Aside from
setting my JRE paths, is there anything else I need to do to run inside the IBM
JVM? (e.g. re-compiling?)


Walter Underwood wrote:
> 
> What JVM and garbage collector setting? We are using the IBM JVM with
> their concurrent generational collector. I would strongly recommend
> trying a similar collector on your JVM. Hint: how much memory is in
> use after a full GC? That is a good approximation to the working set.
> 
> 

-- 
View this message in context:
http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collect
ion-tp21588427p21616078.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Performance "dead-zone" due to garbage collection

2009-01-21 Thread Feak, Todd
A ballpark calculation would be 

Collected Amount (from GC logging) / # of Requests.

The GC logging can tell you how much it collected each time; no need to
try to snapshot before-and-after heap sizes. However (big caveat here),
this is a ballpark figure. The garbage collector is not guaranteed to
collect everything, every time. It can stop collecting depending on how
much time it spent. It may only collect from certain sections within
memory (Eden, survivor, tenured), etc.

This may still be enough to make broad comparisons to see if you've
decreased the overall garbage/request (via cache changes), but it will
be quite a rough estimate.
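
In numbers, that ballpark arithmetic amounts to something like this (all figures hypothetical, taken from the GC log and the request count over the same window):

    long collectedBytes = 12L * 1024 * 1024 * 1024;                 // one collection freed ~12 GB (from GC logging)
    long requestsSinceLastGc = 40000;                               // requests served over the same window
    long garbagePerRequest = collectedBytes / requestsSinceLastGc;  // ~315 KB of garbage per request, very rough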

-Todd

-Original Message-
From: wojtekpia [mailto:wojte...@hotmail.com] 
Sent: Wednesday, January 21, 2009 3:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance "dead-zone" due to garbage collection


(Thanks for the responses)

My filterCache hit rate is ~60% (so I'll try making it bigger), and I am
CPU
bound. 

How do I measure the size of my per-request garbage? Is it (total heap
size
before collection - total heap size after collection) / # of requests to
cause a collection?

I'll try your suggestions and post back any useful results.

-- 
View this message in context:
http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collect
ion-tp21588427p21593661.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Performance "dead-zone" due to garbage collection

2009-01-21 Thread Feak, Todd
From a high-level view, there is a certain amount of garbage collection
that must occur. That garbage is generated per request through a
variety of means (buffers, requests, responses, cache expulsion). The only
thing that JVM parameters can address is *when* that collection occurs. 

It can occur often in small chunks, or rarely in large chunks (or
anywhere in between). If you are CPU bound (which it sounds like you may
be), then you really have a decision to make. Do you want an overall
drop in performance, as more time is spent garbage collecting, OR do you
want spikes in garbage collection that are rarer, but have a
stronger impact? Realistically it becomes a question of one or the
other. You *must* pay the cost of garbage collection at some point in
time.

It is possible that increasing cache size will decrease overall garbage
collection, as the churn caused by cache misses creates
additional garbage. Decreasing the churn could decrease garbage. BUT,
this really depends on your cache hit rates. If they are pretty high
(>90%) then it's probably not much of a factor. However, if you are in
the 50%-60% range, larger caches may help you in a number of ways.

-Todd Feak

-Original Message-
From: wojtekpia [mailto:wojte...@hotmail.com] 
Sent: Wednesday, January 21, 2009 11:14 AM
To: solr-user@lucene.apache.org
Subject: Re: Performance "dead-zone" due to garbage collection


I'm using a recent version of Sun's JVM (6 update 7) and am using the
concurrent generational collector. I've tried several other collectors,
none
seemed to help the situation.

I've tried reducing my heap allocation. The search performance got worse
as
I reduced the heap. I didn't monitor the garbage collector in those
tests,
but I imagine that it would've gotten better. (As a side note, I do lots
of
faceting and sorting, I have 10M records in this index, with an
approximate
index file size of 10GB).

This index is on a single machine, in a single Solr core. Would
splitting it
across multiple Solr cores on a single machine help? I'd like to find
the
limit of this machine before spreading the data to more machines.

Thanks,

Wojtek
-- 
View this message in context:
http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collect
ion-tp21588427p21590150.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Performance "dead-zone" due to garbage collection

2009-01-21 Thread Feak, Todd
The large drop in old generation from 27GB->6GB indicates that things
are getting into your old generation prematurely. They really don't need
to get there at all, and should be collected sooner (more frequently).

Look into increasing young generation sizes via JVM parameters. Also
look into concurrent collection.

You could even consider decreasing your JVM max memory. Obviously you
aren't using it all, and decreasing it will force the JVM to do more
frequent (and therefore smaller) collections. Your average collection
time may go up, but you will see smaller performance dips.

Great details on memory tuning on Sun JDKs here 

http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

There are other articles for 1.6 and 1.4 as well.

-Todd

-Original Message-
From: wojtekpia [mailto:wojte...@hotmail.com] 
Sent: Wednesday, January 21, 2009 9:49 AM
To: solr-user@lucene.apache.org
Subject: Performance "dead-zone" due to garbage collection


I'm intermittently experiencing severe performance drops due to Java
garbage
collection. I'm allocating a lot of RAM to my Java process (27GB of the
32GB
physically available). Under heavy load, the performance drops
approximately
every 10 minutes, and the drop lasts for 30-40 seconds. This coincides
with
the size of the old generation heap dropping from ~27GB to ~6GB. 

Is there a way to reduce the impact of garbage collection? A couple
ideas
we've come up with (but haven't tried yet) are: increasing the minimum
heap
size, more frequent (but hopefully less costly) garbage collection.

Thanks,

Wojtek

-- 
View this message in context:
http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collect
ion-tp21588427p21588427.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: New to Solr/Lucene design question

2009-01-20 Thread Feak, Todd
Yes, that's what I was suggesting. :)

You might have to be careful with the extra underscore "_" characters. Not
sure if those will cause issues with dynamic fields.

-Todd Feak

-Original Message-
From: Yogesh Chawla - PD [mailto:premiergenerat...@yahoo.com] 
Sent: Tuesday, January 20, 2009 3:14 PM
To: solr-user@lucene.apache.org
Subject: Re: New to Solr/Lucene design question

Hi Todd,
I think I see what you are saying here.

In our schema.xml we can define it like this:



   
   

   


and then add data like this:


  
Yogesh
Chawla
myMiddleName
  


If we need to add other types of dynamic data types, we can do that at a
later time
by adding a different type of dynamic field.

This way we are not querying a single field 'stash-content' but rather
just the fields we are interested
in and there is no need to change the java code or the schema.xml.

Are we on the same wavelength here?

Thanks a lot for the suggestion,
Yogesh







- Original Message 
From: "Feak, Todd" 
To: solr-user@lucene.apache.org
Sent: Tuesday, January 20, 2009 4:49:56 PM
Subject: RE: New to Solr/Lucene design question

A third option - Use dynamic fields.

Add a dynamic field called "*_stash". This will allow new fields for
documents to be added down the road without changing schema.xml, yet
still allow you to query on fields like "arresteeFirstName_stash"
without extra overhead.

-Todd Feak

-Original Message-
From: Yogesh Chawla - PD [mailto:premiergenerat...@yahoo.com] 
Sent: Tuesday, January 20, 2009 2:30 PM
To: solr-user@lucene.apache.org
Subject: New to Solr/Lucene design question

Hello All,
We are using SOLR/Lucene as the search engine for an application
we are designing.  The application is a workflow application that can
receive different types of documents.

For example, we are currently working on getting booking documents but
will also accept arrest documents later this year.

We have defined a custom schema that incorporates some schemas designed
by federal consortiums.  From those schemas we pluck out values that we
want 
SOLR/Lucene to index and search on and we go from our instance document
to
a SOLR document.

The fields in our schema.xml look like this:




   
   
   
   


Above, there is a field called "stash-content". The goal is to take any
searchable data from any document type and put it in this field. For example,
we would store data like this in XML format:



  
arrestee_firstname_Yogesh
arrestee_lastname_Chawla
arrestee_middlename_myMiddleName
  

The advantage of such an approach is that we can add new document types
to search on, and as long as they use the same semantics, such as
arrestee_firstname, we won't need to update any code. It also makes
the code simple and generic for any document type.

We can search on first name with a starts-with query like this:
arrestee_firstname_Y*. We had to use the _ instead of a space so that each
word would not be searched separately when a query was performed, and only
a single string would be matched. (Hope that makes sense.)

The cons could be a performance hit.  

The other approach is to add fields explicitly like this:


  
Yogesh
Chawla
myMiddleName
  

This approach seems more traditional. The pros are that it is
straightforward. The cons are that every time
we add a new document type to search on, we have to update schema.xml
and the java code that creates SOLR
documents.

The number of documents that we will eventually want to search on is
about 5 million.  However, this will take a while
to ramp up to and we are more immediately looking at searching on about
100,000.

I am new to SOLR and just inherited this project with approach number 1.
Is this something that is going to bite us in the
future?

Thanks,
Yogesh



RE: New to Solr/Lucene design question

2009-01-20 Thread Feak, Todd
A third option - Use dynamic fields.

Add a dynamic field called "*_stash". This will allow new fields for
documents to be added down the road without changing schema.xml, yet
still allow you to query on fields like "arresteeFirstName_stash"
without extra overhead.
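
A sketch of what this could look like from the SolrJ side, assuming a "*_stash" dynamicField is declared in schema.xml (the specific field names below are only illustrative):

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("arrestee_firstname_stash", "Yogesh");
    doc.addField("arrestee_lastname_stash", "Chawla");
    server.add(doc);
    server.commit();

    // ...and a starts-with style query against one of the dynamic fields:
    SolrQuery query = new SolrQuery("arrestee_firstname_stash:Y*");
    QueryResponse rsp = server.query(query);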

-Todd Feak

-Original Message-
From: Yogesh Chawla - PD [mailto:premiergenerat...@yahoo.com] 
Sent: Tuesday, January 20, 2009 2:30 PM
To: solr-user@lucene.apache.org
Subject: New to Solr/Lucene design question

Hello All,
We are using SOLR/Lucene as the search engine for an application
we are designing.  The application is a workflow application that can
receive different types of documents.

For example, we are currently working on getting booking documents but
will also accept arrest documents later this year.

We have defined a custom schema that incorporates some schemas designed
by federal consortiums.  From those schemas we pluck out values that we
want 
SOLR/Lucene to index and search on and we go from our instance document
to
a SOLR document.

The fields in our schema.xml look like this:

 


   
   
   
   


Above, there is a field called "stash-content". The goal is to take any
searchable data from any document type and put it in this field. For example,
we would store data like this in XML format:



  
arrestee_firstname_Yogesh
arrestee_lastname_Chawla
arrestee_middlename_myMiddleName
  

The advantage of such an approach is that we can add new document types
to search on, and as long as they use the same semantics, such as
arrestee_firstname, we won't need to update any code. It also makes
the code simple and generic for any document type.

We can search on first name with a starts-with query like this:
arrestee_firstname_Y*. We had to use the _ instead of a space so that each
word would not be searched separately when a query was performed, and only
a single string would be matched. (Hope that makes sense.)

The cons could be a performance hit.  

The other approach is to add fields explicitly like this:


  
Yogesh
Chawla
myMiddleName
  

This approach seems more traditional. The pros are that it is
straightforward. The cons are that every time
we add a new document type to search on, we have to update schema.xml
and the java code that creates SOLR
documents.

The number of documents that we will eventually want to search on is
about 5 million.  However, this will take a while
to ramp up to and we are more immediately looking at searching on about
100,000.

I am new to SOLR and just inherited this project with approach number 1.
Is this something that is going to bite us in the
future?

Thanks,
Yogesh



RE: How to select *actual* match from a multi-valued field

2009-01-20 Thread Feak, Todd
Can anyone shed some insight?

-Todd

-Original Message-
From: Feak, Todd [mailto:todd.f...@smss.sony.com] 
Sent: Friday, January 16, 2009 9:55 AM
To: solr-user@lucene.apache.org
Subject: How to select *actual* match from a multi-valued field

At a high level, I'm trying to do some more intelligent searching using
an app that will send multiple queries to Solr. My current issue is
around multi-valued fields and determining which entry actually
generated the "hit" for a particular query.

 

For example, let's say that I have a multi-valued field containing
people's names, associated with the document (trying to be non-specific
on purpose). In one document, I have the following names:

Jane Smith, Bob Smith, Roger Smith, Jane Doe. If the user performs a
search for Bob Smith, this document is returned. What I want to know is
that this document was returned because of "Bob Smith", not because of
Jane or Roger. I've tried using the highlighting settings. They do
provide some help, as the Jane Doe entry doesn't come back highlighted,
but both Jane and Roger do. I've tried using hl.requireFieldMatch, but
that seems to pertain only to fields, not entries within a multi-valued
field.

 

Using Solr, is there a way to get the information I am looking for?
Specifically, that "Bob Smith" is the value in the multi-valued field
that triggered the hit?

 

-Todd Feak



How to select *actual* match from a multi-valued field

2009-01-16 Thread Feak, Todd
At a high level, I'm trying to do some more intelligent searching using
an app that will send multiple queries to Solr. My current issue is
around multi-valued fields and determining which entry actually
generated the "hit" for a particular query.

 

For example, let's say that I have a multi-valued field containing
people's names, associated with the document (trying to be non-specific
on purpose). In one document, I have the following names:

Jane Smith, Bob Smith, Roger Smith, Jane Doe. If the user performs a
search for Bob Smith, this document is returned. What I want to know is
that this document was returned because of "Bob Smith", not because of
Jane or Roger. I've tried using the highlighting settings. They do
provide some help, as the Jane Doe entry doesn't come back highlighted,
but both Jane and Roger do. I've tried using hl.requireFieldMatch, but
that seems to pertain only to fields, not entries within a multi-valued
field.

 

Using Solr, is there a way to get the information I am looking for?
Specifically, that "Bob Smith" is the value in the multi-valued field
that triggered the hit?
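
For reference, a minimal SolrJ sketch of the highlighting setup being described, assuming the multi-valued field is named "names":

    SolrQuery query = new SolrQuery("names:\"Bob Smith\"");
    query.setHighlight(true);
    query.addHighlightField("names");
    query.setParam("hl.requireFieldMatch", "true");   // matches per field, not per value, as noted above
    QueryResponse rsp = server.query(query);
    // map of document key -> (field name -> highlighted snippets)
    Map<String, Map<String, List<String>>> highlighting = rsp.getHighlighting();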

 

-Todd Feak



RE: Commiting index while time-consuming query is running

2009-01-13 Thread Feak, Todd
I believe that when you commit, a new IndexReader is created, which is
warmed, etc. New incoming queries will be sent to this new IndexReader.
Once all previously existing queries have been answered, the old
IndexReader will shut down.

The commit doesn't wait for the query to finish, but it shouldn't impact
the results of that query either. What may be impacted is overall system
performance while you have 2 IndexReaders in play. There will always be
some amount of overlap, but it may be drawn out by the long query.

-Todd Feak

-Original Message-
From: wojtekpia [mailto:wojte...@hotmail.com] 
Sent: Tuesday, January 13, 2009 2:18 PM
To: solr-user@lucene.apache.org
Subject: Commiting index while time-consuming query is running


Once in a while my Solr instance receives a query that takes a really
long
time to execute (several minutes or more). What will happen if I update
my
index (and commit) while one of these really long queries is executing?
Will
Solr wait for the query to complete before it commits my update?

(on a side note, I'm re-working my UI to eliminate these queries)

Thanks!
-- 
View this message in context:
http://www.nabble.com/Commiting-index-while-time-consuming-query-is-runn
ing-tp21445704p21445704.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Snapinstaller vs Solr Restart

2009-01-06 Thread Feak, Todd
Kind of a side-note, but I think it may be worth your while.

If your queryResultCache hit rate is 65%, consider putting a reverse
proxy in front of Solr. It can give performance boosts over the query
cache in Solr, as it doesn't have to pay the cost of reformulating the
response. I've used Varnish with great results. Squid is another option.

-Todd Feak

-Original Message-
From: wojtekpia [mailto:wojte...@hotmail.com] 
Sent: Tuesday, January 06, 2009 1:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Snapinstaller vs Solr Restart


I use my warm up queries to fill the field cache (or at least that's the
idea). My filterCache hit rate is ~99% & queryResultCache is ~65%. 

I update my index several times a day with no 'optimize', and
performance is
seamless. I also update my index once nightly with an 'optimize', and
that's
where I see the performance drop.

I'll try turning autowarming on.

Could this have to do with file caching by the OS? 


Otis Gospodnetic wrote:
> 
> Is autowarm count of 0 a good idea, though?
> If you don't want to autowarm any caches, doesn't that imply that you
have
> very low hit rate and therefore don't care to autowarm?  And if you
have a
> very low hit rate, then perhaps caches are not needed at all?
> 
> 
> How about this.  Do you optimize your index at any point?
> 

-- 
View this message in context:
http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21319344.
html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Using query functions against a "type" field

2009-01-06 Thread Feak, Todd
Thanks Yonik!

I still may investigate the query function stuff that was discussed, as
Hoss indicated it may hold value.

-Todd Feak

-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com] 
Sent: Tuesday, January 06, 2009 10:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Using query functions against a "type" field

On Tue, Jan 6, 2009 at 1:05 PM, Feak, Todd 
wrote:
> I'm not sure I followed all that Yonik.
>
> Are you saying that I can achieve this affect now with a bq setting in
> my DisMax query instead of via a bf setting?

Yep, a "const" QParser would enable that.

bq={!const}foo:bar

-Yonik



RE: Using query functions against a "type" field

2009-01-06 Thread Feak, Todd
I'm not sure I followed all that Yonik.

Are you saying that I can achieve this effect now with a bq setting in
my DisMax query instead of via a bf setting?

-Todd Feak

-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com] 
Sent: Tuesday, January 06, 2009 9:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Using query functions against a "type" field

On Tue, Jan 6, 2009 at 10:41 AM, Feak, Todd 
wrote:
> The boost queries are true queries, so the amount of boost can be affected
> by things like term frequency for the query.

Sounds like a constant score query is a general way to do this.

Possible QParser syntax:
{!const}tag:FOO OR tag:BAR

Could be implemented via
ConstantScoreQuery(QueryWrapperFilter(theQuery))

The value could be the boost, optionally set within this QParser...
{!const v=2.0}tag:FOO OR tag:BAR

-Yonik



RE: Snapinstaller vs Solr Restart

2009-01-06 Thread Feak, Todd
First suspect would be Filter Cache settings and Query Cache settings.

If they are auto-warming at all, then there is a definite difference
between the first start behavior and the post-commit behavior. This
affects what's in memory, caches, etc.

-Todd Feak

-Original Message-
From: wojtekpia [mailto:wojte...@hotmail.com] 
Sent: Tuesday, January 06, 2009 9:46 AM
To: solr-user@lucene.apache.org
Subject: Snapinstaller vs Solr Restart


I'm running load tests against my Solr instance. I find that it
typically
takes ~10 minutes for my Solr setup to "warm-up" while I throw my test
queries at it. Also, I have the same two warm-up queries specified for
the
firstSearcher and newSearcher event listeners. 

I'm now benchmarking the effect of updating an index under load. I'm
finding
that after running snapinstaller, Solr takes ~1 hour to get back to the
same
performance numbers I was getting 10 minutes after a restart. If I can
justify being offline for a few moments, it seems like I'll be better
off
restarting Solr rather than running Snapinstaller.

Any ideas why?

Thanks.
-- 
View this message in context:
http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21315273.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Using query functions against a "type" field

2009-01-06 Thread Feak, Todd
:It should be fairly predictable, can you elaborate on what problems you
:have just adding boost queries for the specific types?

The boost queries are true queries, so the amount of boost can be affected
by things like term frequency for the query. The functions aren't
affected by this and therefore more predictable over the life of the
index. If I want to boost documents via multiple factors, their
interaction is very important. If that interaction slowly changes over
the life of the index, I lose that control.

:a generic Parser/ValueSource that let you specific term=>float mappings
in 
:it's init params would certianly make a cool patch for Solr.

I do believe I will work on this (may take me a bit). Once I nail it
down, I've got a couple of other easier query functions I would like to
add as well, if they hold value for the community.

-Hoss



RE: Ngram Repeats

2009-01-05 Thread Feak, Todd
To get the unique brand names, you are wandering into the Facet query 
territory that I mentioned.

You could consider a separate index, and that will probably provide the best 
performance. Especially if you are hitting it on a per-keystroke basis to 
update that auto-complete box. Creating a separate index also allows you to 
scale this section of your search infrastructure separately, if necessary.

You *can* put the separate index within the same Tomcat instance if you need 
to. The context snippets in Tomcat can be used to provide a different URL for 
those queries.
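
As a sketch (the file name and paths here are made up for illustration), a second context file such as $CATALINA_HOME/conf/Catalina/localhost/solr-brands.xml would expose a separate Solr home at its own URL:

  <Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
    <!-- points this webapp at a separate Solr home holding the brand-only schema -->
    <Environment name="solr/home" type="java.lang.String"
                 value="/opt/solr/brands" override="true"/>
  </Context>

Queries for the autocomplete box would then go to /solr-brands/select while the product index stays at its existing URL.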

-Todd Feak

-Original Message-
From: Jeff Newburn [mailto:jnewb...@zappos.com] 
Sent: Wednesday, December 24, 2008 2:30 PM
To: solr-user@lucene.apache.org
Subject: Re: Ngram Repeats

You are correct on the layout.  The reason we are trying to do the ngrams is
we want to do a drop down box for autocomplete.  The ngrams are extremely
fast and the recommended way to do this according to the user group.  They
work wonderfully except this one issue.  So do we basically have to do a
separate index for this or is there a dedup setting to only return unique
brand names?


On 12/24/08 7:51 AM, "Feak, Todd"  wrote:

> It sounds like you want to get a list of "brands" that start with a particular
> string, out of your index. But your index is based on products, not brands. Is
> that correct?
> 
> If so, that has nothing to do with NGrams (or even tokenizing for that matter)
> I think you should be doing a Facet query instead of a standard query. Take a
> look at Facets on the Solr Wiki.
> 
> http://wiki.apache.org/solr/SolrFacetingOverview
> 
> -Todd Feak
> -Original Message-
> From: Jeff Newburn [mailto:jnewb...@zappos.com]
> Sent: Wednesday, December 24, 2008 7:39 AM
> To: solr-user@lucene.apache.org
> Subject: Ngram Repeats
> 
> I have set up an ngram filter and have run into a problem.  Our index is
> basically composed of products as the unique id.  Each product also has a
> brand name assigned to it.  There are much fewer unique brand names than
> products in the index.  I tried to set up an ngram based on the brand name
> but it is returning the same brand name over and over for each product.
> Essentially if you try for the brand name starting with "as" you will get
> the brand "asus" 15 times.  Is there a way to make the ngram only return
> unique brand name?  I have attached the configuration below.
> 
> <fieldType ... positionIncrementGap="1">
>   ...
>   <filter ... minGramSize="1" maxGramSize="20"/>
>   ...
> </fieldType>
> 
> -Jeff




RE: Ngram Repeats

2008-12-24 Thread Feak, Todd
It sounds like you want to get a list of "brands" that start with a particular 
string, out of your index. But your index is based on products, not brands. Is 
that correct?

If so, that has nothing to do with NGrams (or even tokenizing for that matter). 
I think you should be doing a Facet query instead of a standard query. Take a 
look at Facets on the Solr Wiki.

http://wiki.apache.org/solr/SolrFacetingOverview

-Todd Feak
-Original Message-
From: Jeff Newburn [mailto:jnewb...@zappos.com] 
Sent: Wednesday, December 24, 2008 7:39 AM
To: solr-user@lucene.apache.org
Subject: Ngram Repeats

I have set up an ngram filter and have run into a problem.  Our index is
basically composed of products as the unique id.  Each product also has a
brand name assigned to it.  There are much fewer unique brand names than
products in the index.  I tried to set up an ngram based on the brand name
but it is returning the same brand name over and over for each product.
Essentially if you try for the brand name starting with "as" you will get
the brand "asus" 15 times.  Is there a way to make the ngram only return
unique brand name?  I have attached the configuration below.












-Jeff


RE: Using query functions against a "type" field

2008-12-22 Thread Feak, Todd
If I do that, how do I turn off the boosting for some queries but not
others?

This needs to be done at query time, I believe.

-Todd Feak

-Original Message-
From: Walter Underwood [mailto:wunderw...@netflix.com] 
Sent: Monday, December 22, 2008 10:33 AM
To: solr-user@lucene.apache.org
Subject: Re: Using query functions against a "type" field

Try document boost at index time. --wunder

On 12/22/08 9:28 AM, "Feak, Todd"  wrote:

> I would like to use a query function to boost documents of a certain
> "type". I realize that I can use a boost query for this, but in
> analyzing the scoring it doesn't seem as predictable as the query
> functions.
> 
>  
> 
> So, imagine I have a field called "foo". Foo contains a value that
> indicates what type of document this is. For now there are only
document
> types of "BAR" and "BAZ". I would like documents of type BAR to be
> boosted much more strongly than documents of type BAZ. As far as I can
> tell, all of the query functions seem to work with fields that contain
> numbers. The only exception being the ord() functions, but those don't
> provide the stability I would like, as I can always introduce a new
> document type down the road and risk screwing up my results.
> 
>  
> 
> Can this be done with function queries?
> 
>  
> 
> As a follow up, how difficult would it be for me to write my own
> function (and plug it into Solr) that allowed me to return a 1.0 or
0.0
> if a field had a particular string value in it? A function that would
> look something like "fieldEq(foo,BAR)"
> 
>  
> 
> -Todd Feak
> 




Using query functions against a "type" field

2008-12-22 Thread Feak, Todd
I would like to use a query function to boost documents of a certain
"type". I realize that I can use a boost query for this, but in
analyzing the scoring it doesn't seem as predictable as the query
functions.

 

So, imagine I have a field called "foo". Foo contains a value that
indicates what type of document this is. For now there are only document
types of "BAR" and "BAZ". I would like documents of type BAR to be
boosted much more strongly than documents of type BAZ. As far as I can
tell, all of the query functions seem to work with fields that contain
numbers. The only exception being the ord() functions, but those don't
provide the stability I would like, as I can always introduce a new
document type down the road and risk screwing up my results.

 

Can this be done with function queries?

 

As a follow up, how difficult would it be for me to write my own
function (and plug it into Solr) that allowed me to return a 1.0 or 0.0
if a field had a particular string value in it? A function that would
look something like "fieldEq(foo,BAR)"

 

-Todd Feak



RE: looking for multilanguage indexing best practice/hint

2008-12-17 Thread Feak, Todd
Don't forget to consider scaling concerns (if there are any). There are
strong differences in the number of searches we receive for each
language. We chose to create separate schema and config per language so
that we can throw servers at a particular language (or set of languages)
if we needed to. We see 2 orders of magnitude difference between our
most popular language and our least popular.

-Todd Feak

-Original Message-
From: Julian Davchev [mailto:j...@drun.net] 
Sent: Wednesday, December 17, 2008 11:31 AM
To: solr-user@lucene.apache.org
Subject: looking for multilanguage indexing best practice/hint

Hi,
From my study of Solr and Lucene so far it seems that I will use a single
schema; at least I don't see a scenario where I'd need more than that.
So the question is how do I approach multilanguage indexing and multilanguage
searching. Will it really make sense to just search for a word, or should I
supply a language param to the search as well?

I see there are those filters and have already been advised on them, but I
guess the question is more one of best practice.
solr.ISOLatin1AccentFilterFactory, solr.SnowballPorterFilterFactory

So the solution I see is using copyField to have the same field in different
languages, or something using a distinct filter per language.
Cheers





RE: Query Performance while updating the index

2008-12-12 Thread Feak, Todd
Sorry, my bad. Didn't read the entire thread.

Look at your filter cache first. You are autowarming 1000, and there is
exactly 1000 in there. Yet it looks like there may be tens of thousands
of filter queries in your system. I would try autowarming more. Try
10,000 or 20,000 and see if it helps.

Second, look at your document cache. Document caches don't use autowarm.
But you can add queries to your firstSearcher and newSearcher entries in
your solrconfig to pre-populate the document cache during warming.
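
A sketch of what that could look like in solrconfig.xml (the sizes and the warming query are placeholders to adapt to your data):

  <!-- raise filterCache autowarming from 1000 toward the 10,000-20,000 range -->
  <filterCache class="solr.LRUCache" size="30000" initialSize="15000" autowarmCount="20000"/>

  <!-- run a query on each new searcher so its results land in the documentCache -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">a common query</str><str name="rows">200</str></lst>
    </arr>
  </listener>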

-Todd Feak


-Original Message-
From: oleg_gnatovskiy [mailto:oleg_gnatovs...@citysearch.com] 
Sent: Friday, December 12, 2008 11:19 AM
To: solr-user@lucene.apache.org
Subject: RE: Query Performance while updating the index


The auto warm time is not an issue. We take the server off the load
balancer
while it is autowarming. It seems that the slowness occurs after
autowarm is
done.



Feak, Todd wrote:
> 
> It's spending 4-5 seconds warming up your query cache. If 4-5 seconds
is
> too much, you could reduce the number of queries to auto-warm with on
> that cache.
> 
> Notice that the 4-5 seconds is spent only putting about 420 queries
into
> the query cache. Your autowarm of 5 for the query cache seems a
bit
> high. If you need to reduce that autowarm time below 5 seconds, you
may
> have to set that value in the hundreds, as opposed to tens of
thousands.
> 
> -Todd Feak
> 
> -Original Message-
> From: oleg_gnatovskiy [mailto:oleg_gnatovs...@citysearch.com] 
> Sent: Friday, December 12, 2008 10:08 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Query Performance while updating the index
> 
> 
> Here's what we have on one of the data slaves for the autowarming.
> 
>  
> 
> --
> 
> Dec 12, 2008 8:46:02 AM org.apache.solr.search.SolrIndexSearcher warm
> 
> INFO: autowarming searc...@3f32ca2b main from searc...@443ad545 main
> 
> filterCache{lookups=351993,hits=347055,hitratio=0.98,inserts=8332,evictions=0,size=8245,warmupTime=215,cumulative_lookups=2837676,cumulative_hits=2766551,cumulative_hitratio=0.97,cumulative_inserts=72050,cumulative_evictions=0}
> 
> Dec 12, 2008 8:46:02 AM org.apache.solr.search.SolrIndexSearcher warm
> 
> INFO: autowarming result for searc...@3f32ca2b main
> 
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=1000,evictions=0,size=1000,warmupTime=317,cumulative_lookups=2837676,cumulative_hits=2766551,cumulative_hitratio=0.97,cumulative_inserts=72050,cumulative_evictions=0}
> 
> Dec 12, 2008 8:46:02 AM org.apache.solr.search.SolrIndexSearcher warm
> 
> INFO: autowarming searc...@3f32ca2b main from searc...@443ad545 main
> 
> queryResultCache{lookups=5309,hits=5223,hitratio=0.98,inserts=422,evictions=0,size=421,warmupTime=4628,cumulative_lookups=77802,cumulative_hits=77216,cumulative_hitratio=0.99,cumulative_inserts=424,cumulative_evictions=0}
> 
> --
> 
> Dec 12, 2008 8:46:07 AM org.apache.solr.search.SolrIndexSearcher warm
> 
> INFO: autowarming result for searc...@3f32ca2b main
> 
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=421,evictions=0,size=421,warmupTime=5536,cumulative_lookups=77804,cumulative_hits=77218,cumulative_hitratio=0.99,cumulative_inserts=424,cumulative_evictions=0}
> 
> Dec 12, 2008 8:46:07 AM org.apache.solr.search.SolrIndexSearcher warm
> 
> INFO: autowarming searc...@3f32ca2b main from searc...@443ad545 main
> 
> documentCache{lookups=87216,hits=86686,hitratio=0.99,inserts=570,evictions=0,size=570,warmupTime=0,cumulative_lookups=1270773,cumulative_hits=1268318,cumulative_hitratio=0.99,cumulative_inserts=2455,cumulative_evictions=0}
> 
> Dec 12, 2008 8:46:07 AM org.apache.solr.search.SolrIndexSearcher warm
> 
> INFO: autowarming result for searc...@3f32ca2b main
> 
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1270773,cumulative_hits=1268318,cumulative_hitratio=0.99,cumulative_inserts=2455,cumulative_evictions=0}
> 
> --
> 
>  
> 
> This is our current values after I've messed with them a few times
> trying to
> get better performance.
> 
>  
> 
>  
> <filterCache class="solr.LRUCache"
>   size="3"
>   initialSize="15000"
>   autowarmCount="1000"/>
> 
> <queryResultCache class="solr.LRUCache"
>   size="6"
>   initialSize="3"
>   autowarmCount="5"/>
> 
> <documentCache class="solr.LRUCache"
>   size="20"
>   initialSize="125000"
>   autowarmCount="0"/>
> 
> 
> -- 
> View this message in context:
> http://www.nabble.com/Query-Performance-while-updating-the-index-tp20452835p20980669.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 
> 

-- 
View this message in context:
http://www.nabble.com/Query-Performance-while-updating-the-index-tp20452835p20981862.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Query Performance while updating the index

2008-12-12 Thread Feak, Todd
It's spending 4-5 seconds warming up your query cache. If 4-5 seconds is
too much, you could reduce the number of queries to auto-warm with on
that cache.

Notice that the 4-5 seconds is spent only putting about 420 queries into
the query cache. Your autowarm of 5 for the query cache seems a bit
high. If you need to reduce that autowarm time below 5 seconds, you may
have to set that value in the hundreds, as opposed to tens of thousands.

-Todd Feak

-Original Message-
From: oleg_gnatovskiy [mailto:oleg_gnatovs...@citysearch.com] 
Sent: Friday, December 12, 2008 10:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Query Performance while updating the index


Here's what we have on one of the data slaves for the autowarming.

 

--

Dec 12, 2008 8:46:02 AM org.apache.solr.search.SolrIndexSearcher warm

INFO: autowarming searc...@3f32ca2b main from searc...@443ad545 main

    filterCache{lookups=351993,hits=347055,hitratio=0.98,inserts=8332,evictions=0,size=8245,warmupTime=215,cumulative_lookups=2837676,cumulative_hits=2766551,cumulative_hitratio=0.97,cumulative_inserts=72050,cumulative_evictions=0}

Dec 12, 2008 8:46:02 AM org.apache.solr.search.SolrIndexSearcher warm

INFO: autowarming result for searc...@3f32ca2b main

    filterCache{lookups=0,hits=0,hitratio=0.00,inserts=1000,evictions=0,size=1000,warmupTime=317,cumulative_lookups=2837676,cumulative_hits=2766551,cumulative_hitratio=0.97,cumulative_inserts=72050,cumulative_evictions=0}

Dec 12, 2008 8:46:02 AM org.apache.solr.search.SolrIndexSearcher warm

INFO: autowarming searc...@3f32ca2b main from searc...@443ad545 main

    queryResultCache{lookups=5309,hits=5223,hitratio=0.98,inserts=422,evictions=0,size=421,warmupTime=4628,cumulative_lookups=77802,cumulative_hits=77216,cumulative_hitratio=0.99,cumulative_inserts=424,cumulative_evictions=0}

--

Dec 12, 2008 8:46:07 AM org.apache.solr.search.SolrIndexSearcher warm

INFO: autowarming result for searc...@3f32ca2b main

    queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=421,evictions=0,size=421,warmupTime=5536,cumulative_lookups=77804,cumulative_hits=77218,cumulative_hitratio=0.99,cumulative_inserts=424,cumulative_evictions=0}

Dec 12, 2008 8:46:07 AM org.apache.solr.search.SolrIndexSearcher warm

INFO: autowarming searc...@3f32ca2b main from searc...@443ad545 main

    documentCache{lookups=87216,hits=86686,hitratio=0.99,inserts=570,evictions=0,size=570,warmupTime=0,cumulative_lookups=1270773,cumulative_hits=1268318,cumulative_hitratio=0.99,cumulative_inserts=2455,cumulative_evictions=0}

Dec 12, 2008 8:46:07 AM org.apache.solr.search.SolrIndexSearcher warm

INFO: autowarming result for searc...@3f32ca2b main

    documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1270773,cumulative_hits=1268318,cumulative_hitratio=0.99,cumulative_inserts=2455,cumulative_evictions=0}

--

 

These are our current values after I've messed with them a few times trying to
get better performance.

<filterCache class="solr.LRUCache"
  size="3"
  initialSize="15000"
  autowarmCount="1000"/>

<queryResultCache class="solr.LRUCache"
  size="6"
  initialSize="3"
  autowarmCount="5"/>

<documentCache class="solr.LRUCache"
  size="20"
  initialSize="125000"
  autowarmCount="0"/>

-- 
View this message in context:
http://www.nabble.com/Query-Performance-while-updating-the-index-tp20452835p20980669.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: move /solr directory from /tomcat/bin/

2008-12-11 Thread Feak, Todd
You can set the home directory in your Tomcat context snippet/file.

http://wiki.apache.org/solr/SolrTomcat#head-7036378fa48b79c0797cc8230a8aa0965412fb2e

This controls where Solr looks for solrconfig.xml and schema.xml. The
solrconfig.xml in turn specifies where to find the data directory.
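
Roughly, and using the paths from your question, that means a context fragment plus a dataDir override; treat this as a sketch to adapt, not something tested against your setup:

  <!-- e.g. /opt/tomcat/conf/Catalina/localhost/solr.xml -->
  <Context docBase="/opt/tomcat/webapps/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String"
                 value="/usr/home/searchengine_files" override="true"/>
  </Context>

  <!-- in /usr/home/searchengine_files/conf/solrconfig.xml -->
  <dataDir>/var/searchengine_data</dataDir>

Solr then keeps the index itself under /var/searchengine_data/index.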

-Original Message-
From: Marc Sturlese [mailto:marc.sturl...@gmail.com] 
Sent: Thursday, December 11, 2008 12:20 PM
To: solr-user@lucene.apache.org
Subject: move /solr directory from /tomcat/bin/


Hey there,
I would like to change the default directory where solr looks for the
config
files and index.
Let's say I would like to put:
/opt/tomcat/bin/solr/data/index in /var/searchengine_data/index
and
/opt/tomcat/bin/solr/conf in /usr/home/searchengine_files/conf

Is there any way to do it via configuration or I should modify the
SolrResourceLoader?

Thanks in advance
-- 
View this message in context:
http://www.nabble.com/move--solr-directory-from--tomcat-bin--tp20963811p20963811.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Issue with Search when using wildcard(*) in search term.

2008-12-09 Thread Feak, Todd
I'm pretty sure "*" isn't supported by DisMax.

>From the Solr Wiki on DisMaxRequestHandler overview

http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=(dismax)#head-ce5517b6c702a55af5cc14a2c284dbd9f18a18c2

"This query handler supports an extremely simplified subset of the
Lucene QueryParser syntax. Quotes can be used to group phrases, and +/-
can be used to denote mandatory and optional clauses ... but all other
Lucene query parser special characters are escaped to simplify the user
experience.."

-Todd Feak

-Original Message-
From: payalsharma [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 09, 2008 4:51 AM
To: solr-user@lucene.apache.org
Subject: Issue with Search when using wildcard(*) in search term.


Hi All,

I am searching a term on Solr by using wildcard character "*" like this
:

http://delpearsonwebapps:8080/apache-solr-1.3.0/core51043/select/?q=alle*

Here the search term (word) is: alle*
This query gives me proper results, but as soon as I give dismaxrequest as a
parameter in the query, no results are returned. The query with the dismax
parameter goes like this:

http://delpearsonwebapps:8080/apache-solr-1.3.0/core51043/select/?q=alle*&qt=dismaxrequest


Can anybody let me know the reason behind this behavior? Also, do I need to
make any changes in my solrconfig.xml in order to make the query run with
both the wildcard and dismaxrequest?

Thanks in advance.

Payal
-- 
View this message in context:
http://www.nabble.com/Issue-with-Search-when-using-wildcard%28*%29-in-search-term.-tp20914102p20914102.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Sorting on text-fields with international characters

2008-12-08 Thread Feak, Todd
One option is to add an additional field for sorting. Create a copy of the 
field you want to sort on and modify the data you insert there so that it will 
sort the way you want it to.
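
A minimal schema.xml sketch of that approach (field names are placeholders). Note that the accent-folding filter collapses á/à/â onto a, but it does not give a true Swedish collation order for å/ä/ö -- for that you would still need to transform the values yourself before indexing:

  <fieldType name="textSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
      <!-- one token per value, lowercased, accents folded -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="title_sv_sort" type="textSort" indexed="true" stored="false"/>
  <copyField source="title_sv" dest="title_sv_sort"/>

Then sort on title_sv_sort while searching and displaying title_sv.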

-ToddFeak

-Original Message-
From: Joel Karlsson [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 08, 2008 2:38 PM
To: solr-user@lucene.apache.org
Subject: Sorting on text-fields with international characters

Hello,

Is there any way to get Solr to sort properly on a text field containing
international, in my case Swedish, letters? It doesn't sort å, ä and ö in the
proper order. Also, is there any way to get Solr to sort, e.g., á, à or â
together with the "regular" a's?

Thanks in advance! // Joel


RE: Encoded search string & qt=Dismax

2008-12-02 Thread Feak, Todd
Do you have a "dismaxrequest" request handler defined in your solr config xml? 
Or is it "dismax"?

-Todd Feak

-Original Message-
From: tushar kapoor [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 02, 2008 10:07 AM
To: solr-user@lucene.apache.org
Subject: Encoded search string & qt=Dismax


Hi,

I am facing problems while searching for some encoded text as part of the
search query string. The results don't come up when I use some url encoding
with qt=dismaxrequest.

I am searching a Russian word by posting a URL encoded UTF8 transformation
of the word. The query works fine for normal request. However, no docs are
fetched when qt=dismaxrequest is appended as part of the query string.

The word being searched is -
Russian Word - Предварительное 

UTF8 Java Encoding -
\u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435

Posted query string (URL Encoded) - 
%5Cu041f%5Cu0440%5Cu0435%5Cu0434%5Cu0432%5Cu0430%5Cu0440%5Cu0438%5Cu0442%5Cu0435%5Cu043b%5Cu044c%5Cu043d%5Cu043e%5Cu0435

Following are the two queries and the difference in results

Query 1 - this one works fine

?q=%5Cu041f%5Cu0440%5Cu0435%5Cu0434%5Cu0432%5Cu0430%5Cu0440%5Cu0438%5Cu0442%5Cu0435%5Cu043b%5Cu044c%5Cu043d%5Cu043e%5Cu0435

Result -

 
 
 
  0 
  0 
 
  \u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435
 
  
  
 
 
  productIndex 
  productIndex 
  4100018 
  4100018 
 
  productIndex 
  product 
  Предварительное K математики учебная книга 
  4100018 
  4100018 
  21125 
  91048 
  91047 
  
  21125 
  21125 
 
  91048 
  91047 
  
  Предварительное K математики учебная
книга 
  Предварительное K математики учебная
книга 
  product 
  product 
 
  91048 
  91047 
  
  20081202T08:14:05.63Z 
  
  
  

Query 2 - qt=dismaxrequest - This doesnt work

?q=%5Cu041f%5Cu0440%5Cu0435%5Cu0434%5Cu0432%5Cu0430%5Cu0440%5Cu0438%5Cu0442%5Cu0435%5Cu043b%5Cu044c%5Cu043d%5Cu043e%5Cu0435&qt=dismaxrequest

Result -
   
 
 
  0 
  109 
 
  \u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435
 
  dismaxrequest 
  
  
   
  

Dont know why there is a difference on appending qt=dismaxrequest. Any help
would be appreciated.


Regards,
Tushar.
-- 
View this message in context: 
http://www.nabble.com/Encoded--search-string---qt%3DDismax-tp20797703p20797703.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: maxWarmingSearchers

2008-12-01 Thread Feak, Todd
The commit after each one may be hurting you.

I believe that a new searcher is created after each commit. That searcher then 
runs through its warm up, which can be costly depending on your warming 
settings. Even if it's not overly costly, creating another one while the first 
one is running makes both of them run just a bit slower. Then creating a third 
exacerbates it, etc. If you are committing faster than it can warm, you will get 
the pile-up of searchers you are seeing. And the more that pile up, the longer 
it takes each one to finish up.

I would suggest trying to group those 4-10 documents into a single update job 
and doing a single commit. That way only 1 searcher is created per 4 minute 
window.
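
For example, with the plain XML update format the whole window can go in as one request (field names here are placeholders), followed by one commit:

  <add>
    <doc>
      <field name="id">doc-1</field>
      <field name="title">first document in this window</field>
    </doc>
    <doc>
      <field name="id">doc-2</field>
      <field name="title">second document in this window</field>
    </doc>
    <!-- ... the rest of the 4-10 docs collected in the 4 minute window ... -->
  </add>

Then either POST a single <commit/> afterwards or append commit=true to that one update request, instead of committing per document.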

Also (sorry I forgot this earlier) you can see how long your searcher is 
spending warming up by looking at the stats page under the admin. 
(/admin/stats.jsp) There is timing information on how long it took for the 
searcher and caches to warm up.

-Todd Feak

-Original Message-
From: dudes dudes [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 01, 2008 1:46 PM
To: solr-user@lucene.apache.org
Subject: RE: maxWarmingSearchers




> Subject: RE: maxWarmingSearchers
> Date: Mon, 1 Dec 2008 13:35:53 -0800
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> 
> Ok sounds reasonable. When you index/update those 4-10 documents, are
> you doing a single commit? OR are you doing a commit after each one?

well, commits after each one..

> How big is your index? How big are your documents? Ballpark figures are
> ok.

more than a couple of MBs 

One final piece of information: I only have 2 GB of RAM on that machine (Linux 
on a VMware environment) and I increased the memory of Tomcat to 1 GB

thanks


> 
> -ToddFeak
> 
> -Original Message-
> From: dudes dudes [mailto:[EMAIL PROTECTED] 
> Sent: Monday, December 01, 2008 1:24 PM
> To: solr-user@lucene.apache.org
> Subject: RE: maxWarmingSearchers
> 
> 
> Hi ToddFeak, 
> 
> thanks for your response... 
> 
> solr version is 1.3. Roughly about every 4 minutes there are
> indexing/updating of 4 to 10 documents that is from multiple clients
> to one master server... 
> 
> It is also worth  mentioning that I have 
> 
> 
>   
>   
>   
>   postCommit uncommented under solrconfig ... QueryCache and
> FilterCache settings are left as default 
> 
> thanks
> ak
> 
> 
> 
> 
> 
> 
> > Subject: RE: maxWarmingSearchers
> > Date: Mon, 1 Dec 2008 13:13:15 -0800
> > From: [EMAIL PROTECTED]
> > To: solr-user@lucene.apache.org
> > 
> > Probably going to need a bit more information.
> > 
> > Such as: 
> > What version of Solr and a little info on doc count, index size, etc.
> > How often are you sending updates to your Master? 
> > How often are you committing? 
> > What are your QueryCache and FilterCache settings for autowarm?
> > Do you have queries set up for newSearcher and firstSearcher?
> > 
> > To start looking for your problem, you usually get a pile up of
> > searchers if you are committing too fast, and/or the warming of new
> > searchers is taking an extraordinarily long time. If it is happening in a
> > repeatable fashion, increasing the number of warming searchers
> probably
> > won't fix the issue, just delay it.
> > 
> > -ToddFeak
> > 
> > -Original Message-
> > From: dudes dudes [mailto:[EMAIL PROTECTED] 
> > Sent: Monday, December 01, 2008 12:13 PM
> > To: solr-user@lucene.apache.org
> > Subject: maxWarmingSearchers
> > 
> > 
> > Hello all, 
> > 
> > I'm having this issue and I hope I get some help.. :)
> > 
> > This following happens quite often ... even though searching  and
> > indexing are on a safe side... 
> > 
> > SolrException: HTTP code=503, reason=Error opening new searcher.
> > exceeded
> > 
> > limit of maxWarmingSearchers=4, try again later.
> > 
> > I have increased the value of  maxWarmingSearchers to 8 and I still
> > experience the same problem 
> > 
> > This issue is happening to the master solr server  changing
> > maxWarmingSearchers  to higher value would help overcoming this issue
> ?
> > or I should consider some other points ?
> > 
> > Another question is ? from your experience, do you think such error
> > introduces server crash ? 
> > 
> > 
> > thanks for your time..
> > ak 
> > 
> > 
> > 
> > _
> > Get a bird's eye view of the world with Multimap
> > http://clk.atdmt.com/GBL/go/115454059/direct/01/
> 
> _
> Get Windows Live Messenger on your Mobile
> http://clk.atdmt.com/UKM/go/msnnkmgl001001ukm/direct/01/

_
Imagine a life without walls.  See the possibilities. 
http://clk.atdmt.com/UKM/go/122465943/direct/01/


RE: maxWarmingSearchers

2008-12-01 Thread Feak, Todd
Ok sounds reasonable. When you index/update those 4-10 documents, are
you doing a single commit? OR are you doing a commit after each one?

How big is your index? How big are your documents? Ballpark figures are
ok.

-ToddFeak

-Original Message-
From: dudes dudes [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 01, 2008 1:24 PM
To: solr-user@lucene.apache.org
Subject: RE: maxWarmingSearchers


Hi ToddFeak, 

thanks for your response... 

Solr version is 1.3. Roughly every 4 minutes there is indexing/updating
of 4 to 10 documents, coming from multiple clients to one master server... 

It is also worth  mentioning that I have 





postCommit uncommented under solrconfig ... QueryCache and
FilterCache settings are left as default 

thanks
ak






> Subject: RE: maxWarmingSearchers
> Date: Mon, 1 Dec 2008 13:13:15 -0800
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> 
> Probably going to need a bit more information.
> 
> Such as: 
> What version of Solr and a little info on doc count, index size, etc.
> How often are you sending updates to your Master? 
> How often are you committing? 
> What are your QueryCache and FilterCache settings for autowarm?
> Do you have queries set up for newSearcher and firstSearcher?
> 
> To start looking for your problem, you usually get a pile up of
> searchers if you are committing too fast, and/or the warming of new
> searchers is taking an extraordinarily long time. If it is happening in a
> repeatable fashion, increasing the number of warming searchers
probably
> won't fix the issue, just delay it.
> 
> -ToddFeak
> 
> -Original Message-
> From: dudes dudes [mailto:[EMAIL PROTECTED] 
> Sent: Monday, December 01, 2008 12:13 PM
> To: solr-user@lucene.apache.org
> Subject: maxWarmingSearchers
> 
> 
> Hello all, 
> 
> I'm having this issue and I hope I get some help.. :)
> 
> This following happens quite often ... even though searching  and
> indexing are on a safe side... 
> 
> SolrException: HTTP code=503, reason=Error opening new searcher.
> exceeded
> 
> limit of maxWarmingSearchers=4, try again later.
> 
> I have increased the value of  maxWarmingSearchers to 8 and I still
> experience the same problem 
> 
> This issue is happening to the master solr server  changing
> maxWarmingSearchers  to higher value would help overcoming this issue
?
> or I should consider some other points ?
> 
> Another question is ? from your experience, do you think such error
> introduces server crash ? 
> 
> 
> thanks for your time..
> ak 
> 
> 
> 
> _
> Get a bird's eye view of the world with Multimap
> http://clk.atdmt.com/GBL/go/115454059/direct/01/

_
Get Windows Live Messenger on your Mobile
http://clk.atdmt.com/UKM/go/msnnkmgl001001ukm/direct/01/


RE: maxWarmingSearchers

2008-12-01 Thread Feak, Todd
Probably going to need a bit more information.

Such as: 
What version of Solr and a little info on doc count, index size, etc.
How often are you sending updates to your Master? 
How often are you committing? 
What are your QueryCache and FilterCache settings for autowarm?
Do you have queries set up for newSearcher and firstSearcher?

To start looking for your problem, you usually get a pile up of
searchers if you are committing too fast, and/or the warming of new
searchers is taking an extraordinarily long time. If it is happening in a
repeatable fashion, increasing the number of warming searchers probably
won't fix the issue, just delay it.

-ToddFeak

-Original Message-
From: dudes dudes [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 01, 2008 12:13 PM
To: solr-user@lucene.apache.org
Subject: maxWarmingSearchers


Hello all, 

I'm having this issue and I hope I get some help.. :)

The following happens quite often... even though searching and
indexing are on the safe side... 

SolrException: HTTP code=503, reason=Error opening new searcher. exceeded
limit of maxWarmingSearchers=4, try again later.

I have increased the value of  maxWarmingSearchers to 8 and I still
experience the same problem 

This issue is happening on the master Solr server. Would changing
maxWarmingSearchers to a higher value help overcome this issue,
or should I consider some other points?

Another question: from your experience, do you think such an error
could cause a server crash? 


thanks for your time..
ak 



_
Get a bird's eye view of the world with Multimap
http://clk.atdmt.com/GBL/go/115454059/direct/01/


RE: WordDelimiterFilter and its Factory: access to charTypeTable

2008-11-20 Thread Feak, Todd
I've found that creating a custom filter and filter factory isn't too
burdensome when the filter doesn't "quite" do what I need. You could
grab the source and create your own version.

-Todd Feak

-Original Message-
From: Jerven Bolleman [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 20, 2008 1:56 AM
To: solr-user@lucene.apache.org
Subject: WordDelimiterFilter and its Factory: access to charTypeTable

Hi Solr Community,

I was wondering if it is possible to access and modify the charTypeTable
of the WordDelimiterFilter. 

The use case is that I do not want to split on a '*' char, which the
filter currently does. If I could modify the charTypeTable I could
change the behaviour of the filter. Or am I barking up the wrong tree
and should I use a different approach?

Thanks,

Jerven Bolleman





RE: Newbie: For stopword query - All objects being returned

2008-11-20 Thread Feak, Todd
Could you provide your schema and the exact query that you issued?

Things to consider... If you just searched for "the", it used the
default search field, which is declared in your schema. The filters
associated with that default field are what determine whether or not the
stopword list is invoked during the query (and/or indexing time).
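
For reference, a minimal sketch of what that looks like in schema.xml (names are placeholders); the StopFilterFactory on the analyzer is what actually removes "the" from queries against the default field:

  <fieldType name="text_stop" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- drops terms listed in stopwords.txt at both index and query time -->
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    </analyzer>
  </fieldType>

  <field name="text" type="text_stop" indexed="true" stored="false" multiValued="true"/>
  <defaultSearchField>text</defaultSearchField>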

-Todd Feak

-Original Message-
From: Sanjay Suri [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 20, 2008 12:31 AM
To: solr-user@lucene.apache.org
Subject: Newbie: For stopword query - All objects being returned

Hi ,
I realize this might be too simple - Can someone tell me where to look?
I'm
new to solr and have to fix this for a demo asap.

If my search query is "the", all 91 objects are returned as search
results.
I expect 0 results.

-- 
Sanjay Suri

Videocrux Inc.
http://videocrux.com
+91 99102 66626


RE: Searchable/indexable newsgroups

2008-11-19 Thread Feak, Todd
Can Nutch crawl newsgroups? Anyone?

-Todd Feak

-Original Message-
From: John Martyniak [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 3:06 PM
To: solr-user@lucene.apache.org
Subject: Searchable/indexable newsgroups

Does anybody know of a good way to index newsgroups using SOLR?   
Basically would like to build a searchable list of newsgroup content.

Any help would be greatly appreciated.

-John




RE: Solr security

2008-11-17 Thread Feak, Todd
I see value in this in the form of protecting the client from itself.

For example, our Solr isn't accessible from the Internet. It's all
behind firewalls. But, the client applications can make programming
mistakes. I would love the ability to lock them down to a certain number
of rows, just in case someone typos and puts in 1000 instead of 100, or
the like.

Admittedly, testing and QA should catch these things, but sometimes it's
nice to put in a few safeguards to stop the obvious mistakes from
occurring.
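
One sketch of how part of that can be done today: the "invariants" section of a request handler in solrconfig.xml pins a parameter so the client cannot override it at all, which is a blunt but effective cap (handler name and value are placeholders):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
    <lst name="invariants">
      <!-- every request on this handler gets exactly rows=100, whatever the client sends -->
      <int name="rows">100</int>
    </lst>
  </requestHandler>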

-Todd Feak

-Original Message-
From: Matthias Epheser [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 17, 2008 9:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr security

Ryan McKinley schrieb:
> however I have found that in any site where
> stability/load and uptime are a serious concern, this is better handled
> in a tier in front of java -- typically the loadbalancer / haproxy /
> whatever -- and managed by people more cautious than me.

Full ack. What do you think about the only Solr-related thing "left",
the parameter filtering/blocking (e.g. rows<1000)? Is this suitable to do
in a Filter delivered by Solr? Of course as an optional alternative.

> 
> ryan
> 
> 




RE: maxCodeLen in the doublemetaphone solr analyzer

2008-11-13 Thread Feak, Todd
There's a patch in to do that as a separate filter. See
https://issues.apache.org/jira/browse/SOLR-813

You could just take the patch. It's the full filter and factory.

-Todd Feak

-Original Message-
From: Brian Whitman [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 13, 2008 12:31 PM
To: solr-user@lucene.apache.org
Subject: maxCodeLen in the doublemetaphone solr analyzer

I want to change the maxCodeLen param that is in Solr 1.3's
doublemetaphone
plugin. Doc is here:
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/language/DoubleMetaphone.html
Is this something I can do in solrconfig or do I need to change it and
recompile?


RE: solr 1.3 Modification field in schema.xml

2008-11-13 Thread Feak, Todd
I believe (someone correct me if I'm wrong) that the only fields you
need to store are those fields which you wish returned from the query.
In other words, if you will never put the field on the list of fields
(fl) to return, there is no need to store it.

It would be advantageous not to store more than you have to. It reduces
disk access, index size, memory usage, etc. However, you have to balance
this against future needs. If re-indexing is costly just to start
storing 1 more field, it may be worth it to just leave it in.
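
In schema.xml terms (field names here are made up), the difference is just the stored flag, and flipping it only affects documents indexed after the change:

  <!-- searchable and returnable in results -->
  <field name="title" type="text" indexed="true" stored="true"/>
  <!-- searchable only; never listed in fl, so there is no need to store it -->
  <field name="body_text" type="text" indexed="true" stored="false"/>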

-Todd Feak

-Original Message-
From: sunnyfr [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 13, 2008 9:13 AM
To: solr-user@lucene.apache.org
Subject: solr 1.3 Modification field in schema.xml


Hi everybody,

I don't really get when I have to re-index data or not.
I did a full import but I realised I stored too many fields which I don't
need.

So I have to change some indexed fields which are stored to not stored.
And I don't know if I have to re-index my data or not, and in which cases
I really do have to re-index.

Another question: I would like to know which fields must be stored. I thought
it was fields that use functions for boosting, but I just tried to boost one
field that is indexed but not stored and it worked.

Thanks a lot for shedding some light on my questions,

-- 
View this message in context:
http://www.nabble.com/solr-1.3--Modification-field-in-schema.xml-tp20483691p20483691.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: NIO not working yet

2008-11-12 Thread Feak, Todd
Is support for setting the FSDirectory this way built into 1.3.0
release? Or is it necessary to grab a trunk build?

-Todd Feak

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Wednesday, November 12, 2008 11:59 AM
To: solr-user@lucene.apache.org
Subject: NIO not working yet

NIO support in the latest Solr development versions does not work yet
(I previously advised that some people with possible lock contention
problems try it out).  We'll let you know when it's fixed, but in the
meantime you can always set the system property
"org.apache.lucene.FSDirectory.class" to
"org.apache.lucene.store.NIOFSDirectory" to try it out.

for example:

java
-Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.NIOFSDirec
tory
  -jar start.jar

-Yonik



RE: Throughput Optimization

2008-11-05 Thread Feak, Todd
Yonik said something about the FastLRUCache giving the most gain for
high hit-rates and the LRUCache being faster for low hit-rates. It's in
his Nov 1 comment on SOLR-667. I'm not sure if anything changed since
then, as it's an active issue, but you may want to try the LRUCache for
your query cache.
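
Concretely, that would just be a class swap in solrconfig.xml; the sizes below are placeholders (the filterCache size mirrors the 700,000 you mentioned), and FastLRUCache is only available with the SOLR-667 patch applied:

  <filterCache      class="solr.FastLRUCache" size="700000" initialSize="350000" autowarmCount="100000"/>
  <queryResultCache class="solr.LRUCache"     size="20000"  initialSize="10000"  autowarmCount="0"/>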

It sounds like you are memory bound already, but you may want to
investigate the tradeoffs of your filter cache vs. document cache. High
document hit-rate was a big performance boost for us, as document
garbage collection is a lot of overhead. I believe that would show up as
CPU usage though, so it may not be your bottleneck.

This also brings up an interesting question. 3% hit rate on your query
cache seems low to me. Are you sure your load test is mimicking
realistic query patterns from your user base? I realize this probably
isn't part of your bottleneck, just curious.

-Todd Feak

-Original Message-
From: wojtekpia [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 05, 2008 11:08 AM
To: solr-user@lucene.apache.org
Subject: RE: Throughput Optimization


My documentCache hit rate is ~.7, and my queryCache is ~.03. I'm using
FastLRUCache on all 3 of the caches.


Feak, Todd wrote:
> 
> What are your other cache hit rates looking like?
> Which caches are you using the FastLRUCache on?
> 
> -Todd Feak
> 
> -Original Message-
> From: wojtekpia [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, November 05, 2008 8:15 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Throughput Optimization
> 
> 
> Yes, I am seeing evictions. I've tried setting my filterCache higher,
> but
> then I start getting Out Of Memory exceptions. My filterCache hit
ratio
> is >
> .99. It looks like I've hit a RAM bound here.
> 
> I ran a test without faceting. The response times / throughput were
both
> significantly higher, there were no evictions from the filter cache,
but
> I
> still wasn't getting > 50% CPU utilization. Any thoughts on what
> physical
> bound I've hit in this case?
> 
> 
> 
> Erik Hatcher wrote:
>> 
>> One quick question are you seeing any evictions from your  
>> filterCache?  If so, it isn't set large enough to handle the faceting
> 
>> you're doing.
>> 
>>  Erik
>> 
>> 
>> On Nov 4, 2008, at 8:01 PM, wojtekpia wrote:
>> 
>>>
>>> I've been running load tests over the past week or 2, and I can't  
>>> figure out
>>> my system's bottle neck that prevents me from increasing throughput.
> 
>>> First
>>> I'll describe my Solr setup, then what I've tried to optimize the  
>>> system.
>>>
>>> I have 10 million records and 59 fields (all are indexed, 37 are  
>>> stored, 17
>>> have termVectors, 33 are multi-valued) which takes about 15GB of  
>>> disk space.
>>> Most field values are very short (single word or number), and  
>>> usually about
>>> half the fields have any data at all. I'm running on an 8-core, 64- 
>>> bit, 32GB
>>> RAM Redhat box. I allocate about 24GB of memory to the java process,
> 
>>> and my
>>> filterCache size is 700,000. I'm using a version of Solr between 1.3
> 
>>> and the
>>> current trunk (including the latest SOLR-667 (FastLRUCache) patch),

>>> and
>>> Tomcat 6.0.
>>>
>>> I'm running a ramp-test, increasing the number of users every few  
>>> minutes. I
>>> measure the maximum number of requests that Solr can handle per  
>>> second with
>>> a fixed response time, and call that my throughput. I'd like to see

>>> a single
>>> physical resource be maxed out at some point during my test so I  
>>> know it is
>>> my bottle neck. I generated random queries for my dataset  
>>> representing a
>>> more or less realistic scenario. The queries include faceting by up

>>> to 6
>>> fields, and quering by up to 8 fields.
>>>
>>> I ran a baseline on the un-optimized setup, and saw peak CPU usage  
>>> of about
>>> 50%, IO usage around 5%, and negligible network traffic.  
>>> Interestingly, the
>>> CPU peaked when I had 8 concurrent users, and actually dropped down

>>> to about
>>> 40% when I increased the users beyond 8. Is that because I have 8  
>>> cores?
>>>
>>> I changed a few settings and observed the effect on throughput:
>>>
>>> 1. Increased filterCache size, and throughput increased by about  
>>> 50%, but it
>>> seems to peak.
>>> 2. Put

RE: Throughput Optimization

2008-11-05 Thread Feak, Todd
What are your other cache hit rates looking like?
Which caches are you using the FastLRUCache on?

-Todd Feak

-Original Message-
From: wojtekpia [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 05, 2008 8:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Throughput Optimization


Yes, I am seeing evictions. I've tried setting my filterCache higher,
but
then I start getting Out Of Memory exceptions. My filterCache hit ratio
is >
.99. It looks like I've hit a RAM bound here.

I ran a test without faceting. The response times / throughput were both
significantly higher, there were no evictions from the filter cache, but
I
still wasn't getting > 50% CPU utilization. Any thoughts on what
physical
bound I've hit in this case?



Erik Hatcher wrote:
> 
> One quick question are you seeing any evictions from your  
> filterCache?  If so, it isn't set large enough to handle the faceting

> you're doing.
> 
>   Erik
> 
> 
> On Nov 4, 2008, at 8:01 PM, wojtekpia wrote:
> 
>>
>> I've been running load tests over the past week or 2, and I can't  
>> figure out
>> my system's bottle neck that prevents me from increasing throughput.

>> First
>> I'll describe my Solr setup, then what I've tried to optimize the  
>> system.
>>
>> I have 10 million records and 59 fields (all are indexed, 37 are  
>> stored, 17
>> have termVectors, 33 are multi-valued) which takes about 15GB of  
>> disk space.
>> Most field values are very short (single word or number), and  
>> usually about
>> half the fields have any data at all. I'm running on an 8-core, 64- 
>> bit, 32GB
>> RAM Redhat box. I allocate about 24GB of memory to the java process,

>> and my
>> filterCache size is 700,000. I'm using a version of Solr between 1.3

>> and the
>> current trunk (including the latest SOLR-667 (FastLRUCache) patch),  
>> and
>> Tomcat 6.0.
>>
>> I'm running a ramp-test, increasing the number of users every few  
>> minutes. I
>> measure the maximum number of requests that Solr can handle per  
>> second with
>> a fixed response time, and call that my throughput. I'd like to see  
>> a single
>> physical resource be maxed out at some point during my test so I  
>> know it is
>> my bottle neck. I generated random queries for my dataset  
>> representing a
>> more or less realistic scenario. The queries include faceting by up  
>> to 6
>> fields, and quering by up to 8 fields.
>>
>> I ran a baseline on the un-optimized setup, and saw peak CPU usage  
>> of about
>> 50%, IO usage around 5%, and negligible network traffic.  
>> Interestingly, the
>> CPU peaked when I had 8 concurrent users, and actually dropped down  
>> to about
>> 40% when I increased the users beyond 8. Is that because I have 8  
>> cores?
>>
>> I changed a few settings and observed the effect on throughput:
>>
>> 1. Increased filterCache size, and throughput increased by about  
>> 50%, but it
>> seems to peak.
>> 2. Put the entire index on a RAM disk, and significantly reduced the

>> average
>> response time, but my throughput didn't change (i.e. even though my  
>> response
>> time was 10X faster, the maximum number of requests I could make per

>> second
>> didn't increase). This makes no sense to me, unless there is another

>> bottle
>> neck somewhere.
>> 3. Reduced the number of records in my index. The throughput  
>> increased, but
>> the shape of all my graphs stayed the same, and my CPU usage was  
>> identical.
>>
>> I have a few questions:
>> 1. Can I get more than 50% CPU utilization?
>> 2. Why does CPU utilization fall when I make more than 8 concurrent
>> requests?
>> 3. Is there an obvious bottleneck that I'm missing?
>> 4. Does Tomcat have any settings that affect Solr performance?
>>
>> Any input is greatly appreciated.
>>
>> -- 
>> View this message in context:
>>
http://www.nabble.com/Throughput-Optimization-tp20335132p20335132.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context:
http://www.nabble.com/Throughput-Optimization-tp20335132p20343425.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Throughput Optimization

2008-11-05 Thread Feak, Todd
If you are seeing < 90% CPU usage and are not IO (File or Network)
bound, then you are most probably bound by lock contention. If your CPU
usage goes down as you throw more threads at the box, that's an even
bigger indication that that is the issue.

A good profiling tool should help you locate this. I'm not endorsing it
in any way, but I've used YourKit locally and have been able to see where
the actual contention was coming from. That led to my interest in the
SOLR-667 cache fixes which provided enormous benefit.

-Todd


-Original Message-
From: wojtekpia [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 05, 2008 8:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Throughput Optimization


Yes, I am seeing evictions. I've tried setting my filterCache higher,
but
then I start getting Out Of Memory exceptions. My filterCache hit ratio
is >
.99. It looks like I've hit a RAM bound here.

I ran a test without faceting. The response times / throughput were both
significantly higher, there were no evictions from the filter cache, but
I
still wasn't getting > 50% CPU utilization. Any thoughts on what
physical
bound I've hit in this case?



Erik Hatcher wrote:
> 
> One quick question are you seeing any evictions from your  
> filterCache?  If so, it isn't set large enough to handle the faceting

> you're doing.
> 
>   Erik
> 
> 
> On Nov 4, 2008, at 8:01 PM, wojtekpia wrote:
> 
>>
>> I've been running load tests over the past week or 2, and I can't  
>> figure out
>> my system's bottle neck that prevents me from increasing throughput.

>> First
>> I'll describe my Solr setup, then what I've tried to optimize the  
>> system.
>>
>> I have 10 million records and 59 fields (all are indexed, 37 are  
>> stored, 17
>> have termVectors, 33 are multi-valued) which takes about 15GB of  
>> disk space.
>> Most field values are very short (single word or number), and  
>> usually about
>> half the fields have any data at all. I'm running on an 8-core, 64- 
>> bit, 32GB
>> RAM Redhat box. I allocate about 24GB of memory to the java process,

>> and my
>> filterCache size is 700,000. I'm using a version of Solr between 1.3

>> and the
>> current trunk (including the latest SOLR-667 (FastLRUCache) patch),  
>> and
>> Tomcat 6.0.
>>
>> I'm running a ramp-test, increasing the number of users every few  
>> minutes. I
>> measure the maximum number of requests that Solr can handle per  
>> second with
>> a fixed response time, and call that my throughput. I'd like to see  
>> a single
>> physical resource be maxed out at some point during my test so I  
>> know it is
>> my bottle neck. I generated random queries for my dataset  
>> representing a
>> more or less realistic scenario. The queries include faceting by up  
>> to 6
>> fields, and quering by up to 8 fields.
>>
>> I ran a baseline on the un-optimized setup, and saw peak CPU usage  
>> of about
>> 50%, IO usage around 5%, and negligible network traffic.  
>> Interestingly, the
>> CPU peaked when I had 8 concurrent users, and actually dropped down  
>> to about
>> 40% when I increased the users beyond 8. Is that because I have 8  
>> cores?
>>
>> I changed a few settings and observed the effect on throughput:
>>
>> 1. Increased filterCache size, and throughput increased by about  
>> 50%, but it
>> seems to peak.
>> 2. Put the entire index on a RAM disk, and significantly reduced the

>> average
>> response time, but my throughput didn't change (i.e. even though my  
>> response
>> time was 10X faster, the maximum number of requests I could make per

>> second
>> didn't increase). This makes no sense to me, unless there is another

>> bottle
>> neck somewhere.
>> 3. Reduced the number of records in my index. The throughput  
>> increased, but
>> the shape of all my graphs stayed the same, and my CPU usage was  
>> identical.
>>
>> I have a few questions:
>> 1. Can I get more than 50% CPU utilization?
>> 2. Why does CPU utilization fall when I make more than 8 concurrent
>> requests?
>> 3. Is there an obvious bottleneck that I'm missing?
>> 4. Does Tomcat have any settings that affect Solr performance?
>>
>> Any input is greatly appreciated.
>>
>> -- 
>> View this message in context:
>>
http://www.nabble.com/Throughput-Optimization-tp20335132p20335132.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context:
http://www.nabble.com/Throughput-Optimization-tp20335132p20343425.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: SOLR Performance

2008-11-04 Thread Feak, Todd
Most desktops nowadays have at least a dual-core and 1GB, so you may be
able to get a semi-realistic feel for performance on a local desktop. If
you have access to something meaty in a desktop, you may not have to
spend a dime to find out what it's going to take in a server.

-T

-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 03, 2008 4:25 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR Performance

If you never execute any queries, a gig should be more than enough.

Of course, I've never played around with a .8 billion doc corpus on  
one machine.

-Mike

On 3-Nov-08, at 2:16 PM, Alok Dhir wrote:

> in terms of RAM -- how to size that on the indexer?
>
> ---
> Alok K. Dhir
> Symplicity Corporation
> www.symplicity.com
> (703) 351-0200 x 8080
> [EMAIL PROTECTED]
>
> On Nov 3, 2008, at 4:07 PM, Walter Underwood wrote:
>
>> The indexing box can be much smaller, especially in terms of CPU.
>> It just needs one fast thread and enough disk.
>>
>> wunder
>>
>> On 11/3/08 2:58 PM, "Alok Dhir" <[EMAIL PROTECTED]> wrote:
>>
>>> I was afraid of that.  Was hoping not to need another big fat box  
>>> like
>>> this one...
>>>
>>> ---
>>> Alok K. Dhir
>>> Symplicity Corporation
>>> www.symplicity.com
>>> (703) 351-0200 x 8080
>>> [EMAIL PROTECTED]
>>>
>>> On Nov 3, 2008, at 4:53 PM, Feak, Todd wrote:
>>>
>>>> I believe this is one of the reasons that a master/slave  
>>>> configuration
>>>> comes in handy. Commits to the Master don't slow down queries on  
>>>> the
>>>> Slave.
>>>>
>>>> -Todd
>>>>
>>>> -Original Message-
>>>> From: Alok Dhir [mailto:[EMAIL PROTECTED]
>>>> Sent: Monday, November 03, 2008 1:47 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: SOLR Performance
>>>>
>>>> We've moved past this issue by reducing date precision -- thanks to
>>>> all for the help.  Now we're at another problem.
>>>>
>>>> There is relatively constant updating of the index -- new log  
>>>> entries
>>>> are pumped in from several applications continuously.  Obviously,  
>>>> new
>>>> entries do not appear in searches until after a commit occurs.
>>>>
>>>> The problem is, issuing a commit causes searches to come to a
>>>> screeching halt for up to 2 minutes.  We're up to around 80M docs.
>>>> Index size is 27G.  The number of docs will soon be 800M, which
>>>> doesn't bode well for these "pauses" in search performance.
>>>>
>>>> I'd appreciate any suggestions.
>>>>
>>>> ---
>>>> Alok K. Dhir
>>>> Symplicity Corporation
>>>> www.symplicity.com
>>>> (703) 351-0200 x 8080
>>>> [EMAIL PROTECTED]
>>>>
>>>> On Oct 29, 2008, at 4:30 PM, Alok Dhir wrote:
>>>>
>>>>> Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 core  
>>>>> machine.
>>>>>
>>>>> Fairly simple schema -- no large text fields, standard request
>>>>> handler.  4 small facet fields.
>>>>>
>>>>> The index is an event log -- a primary search/retrieval  
>>>>> requirement
>>>>> is date range queries.
>>>>>
>>>>> A simple query without a date range subquery is ridiculously  
>>>>> fast -
>>>>> 2ms.  The same query with a date range takes up to 30s (30,000ms).
>>>>>
>>>>> Concrete example, this query just look 18s:
>>>>>
>>>>> instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z TO 2008-10-30T03:59:59Z] AND label_facet:"Added to Position"
>>>>>
>>>>> The exact same query without the date range took 2ms.
>>>>>
>>>>> I saw a thread from Apr 2008 which explains the problem being  
>>>>> due to
>>>>> too much precision on the DateField type, and the range expansion
>>>>> leading to far too many elements being checked.  Proposed solution
>>>>> appears to be a hack where you index date fields as strings and
>>>>> hacking together date functions to generate proper queries/format
>>>>> results.
>>>>>
>>>>> Does this remain the recommended solution to this issue?
>>>>>
>>>>> Thanks
>>>>>
>>>>> ---
>>>>> Alok K. Dhir
>>>>> Symplicity Corporation
>>>>> www.symplicity.com
>>>>> (703) 351-0200 x 8080
>>>>> [EMAIL PROTECTED]
>>>>>
>>>>
>>>>
>>>
>>
>




RE: SOLR Performance

2008-11-03 Thread Feak, Todd
I believe this is one of the reasons that a master/slave configuration
comes in handy. Commits to the Master don't slow down queries on the
Slave.

-Todd
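
For reference, the script-based replication that ships with Solr 1.3 is what
makes this split work: the master snapshots its index after each commit and the
slaves pull the snapshot with snappuller/snapinstaller on a cron. A sketch of
the master-side hook (the exe path assumes the stock bin/snapshooter from the
distribution; adjust dir and paths to your install):

--
<!-- solrconfig.xml on the master: snapshot the index after every commit -->
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">solr/bin/snapshooter</str>
  <str name="dir">.</str>
  <bool name="wait">true</bool>
</listener>
---

Queries are then served from the slave while the master absorbs the indexing
and commit cost.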

-Original Message-
From: Alok Dhir [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 03, 2008 1:47 PM
To: solr-user@lucene.apache.org
Subject: SOLR Performance

We've moved past this issue by reducing date precision -- thanks to  
all for the help.  Now we're at another problem.

There is relatively constant updating of the index -- new log entries  
are pumped in from several applications continuously.  Obviously, new  
entries do not appear in searches until after a commit occurs.

The problem is, issuing a commit causes searches to come to a  
screeching halt for up to 2 minutes.  We're up to around 80M docs.   
Index size is 27G.  The number of docs will soon be 800M, which  
doesn't bode well for these "pauses" in search performance.

I'd appreciate any suggestions.

---
Alok K. Dhir
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8080
[EMAIL PROTECTED]

On Oct 29, 2008, at 4:30 PM, Alok Dhir wrote:

> Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 core machine.
>
> Fairly simple schema -- no large text fields, standard request  
> handler.  4 small facet fields.
>
> The index is an event log -- a primary search/retrieval requirement  
> is date range queries.
>
> A simple query without a date range subquery is ridiculously fast -  
> 2ms.  The same query with a date range takes up to 30s (30,000ms).
>
> Concrete example, this query just took 18s:
>
>   instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z TO
> 2008-10-30T03:59:59Z] AND label_facet:"Added to Position"
>
> The exact same query without the date range took 2ms.
>
> I saw a thread from Apr 2008 which explains the problem being due to  
> too much precision on the DateField type, and the range expansion  
> leading to far too many elements being checked.  Proposed solution  
> appears to be a hack where you index date fields as strings and  
> hacking together date functions to generate proper queries/format  
> results.
>
> Does this remain the recommended solution to this issue?
>
> Thanks
>
> ---
> Alok K. Dhir
> Symplicity Corporation
> www.symplicity.com
> (703) 351-0200 x 8080
> [EMAIL PROTECTED]
>




RE: Custom sort (score + custom value)

2008-11-03 Thread Feak, Todd
Have you looked into the "bf" and "bq" arguments on the
DisMaxRequestHandler?

http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=(dismax)#head-6862070cf279d9a09bdab971309135c7aea22fb3

-Todd
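
For reference, a sketch of how that can look in solrconfig.xml (the handler
name, qf fields and the myrank field are illustrative; bf adds the function
value to the score rather than multiplying it):

--
<requestHandler name="ranked" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2 description</str>
    <str name="bf">log(myrank)</str>
  </lst>
</requestHandler>
---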

-Original Message-
From: George [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 03, 2008 9:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Custom sort (score + custom value)

Ok Yonik, thank you.

I've tried to execute the following query: "{!boost b=log(myrank)
defType=dismax}q" and it works great.

Do you know if I can do the same (combine a DisjunctionMaxQuery with a
BoostedQuery) in solrconfig.xml?

George

On Sun, Nov 2, 2008 at 3:01 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On Sun, Nov 2, 2008 at 5:09 AM, George <[EMAIL PROTECTED]> wrote:
> > I want to implement a custom sort in Solr based on a combination of
> > relevance (Solr gives me it yet => score) and a custom value I've
> calculated
> > previously for each document. I see two options:
> >
> > 1. Use a function query (I'm using a DisMaxRequestHandler).
> > 2. Create a component that set SortSpec with a sort that has a
custom
> > ComparatorSource (similar to QueryElevationComponent).
> >
> > The first option has the problem: While the relevance value changes
for
> > every query, my custom value is constant for each doc.
>
> Yes, that can be an issue when adding unrelated scores.
> Multiplying them might give you better results:
>
> http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
>
> -Yonik
>


RE: Performance Lucene / Solr

2008-10-30 Thread Feak, Todd
I realize you said caching won't help because the searches are
different, but what about Document caching? Is every document returned
different? What's your hit rate on the Document cache? Can you throw
memory at the problem by increasing Document cache size?

I ask all this, as the Document cache was the biggest win for my
application when it came to increasing performance. Hit rates of 50%
resulted in 30% GC time. Hit rates > 95% had GC rates below 2%.

-Todd
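
For anyone tuning this, the documentCache is configured in solrconfig.xml; a
sketch (the sizes are illustrative, not a recommendation, and they have to fit
in your heap):

--
<documentCache class="solr.LRUCache"
               size="100000"
               initialSize="100000"
               autowarmCount="0"/>
---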

-Original Message-
From: Kraus, Ralf | pixelhouse GmbH [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 30, 2008 6:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Performance Lucene / Solr

Grant Ingersoll schrieb:
> Have you gone through 
> http://wiki.apache.org/solr/SolrPerformanceFactors ?
>
> Can you explain a little more about your testcase, maybe even share 
> code?  I only know a little PHP, but maybe someone else who is better 
> versed might spot something.
I just rewrote my JSP script to use solrj instead;
performance is much, much better now!

Greets -Ralf-



RE: date range query performance

2008-10-29 Thread Feak, Todd
It strikes me that removing just the seconds could very well reduce
overhead to 1/60 of original. 30 second query turns into 500ms query.
Just a swag though.

-Todd
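
Dropping the seconds has to happen at index time for this to pay off, since it
is the number of distinct indexed date values that the range query has to walk.
A minimal SolrJ sketch of that, assuming the timestamp goes into a field named
dt (field names and the chosen granularity are illustrative):

--
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RoundedDateIndexer {

    // Zero out the seconds and milliseconds so the dt field holds at most one
    // distinct value per minute (clear MINUTE/HOUR_OF_DAY too for coarser rounding).
    static Date truncateToMinute(Date d) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.setTime(d);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTime();
    }

    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "event-1");                      // illustrative id
        doc.addField("dt", truncateToMinute(new Date()));   // rounded timestamp
        server.add(doc);
        server.commit();
    }
}
---

Query ranges should then use the same rounded endpoints (e.g. :00 seconds) so
they line up with what was indexed.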

-Original Message-
From: Alok Dhir [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 29, 2008 1:48 PM
To: solr-user@lucene.apache.org
Subject: Re: date range query performance

Well, no - we don't care so much about the seconds, but hours &  
minutes are indeed crucial.

---
Alok K. Dhir
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8080
[EMAIL PROTECTED]

On Oct 29, 2008, at 4:41 PM, Chris Harris wrote:

> Do you need to search down to the minutes and seconds level? If  
> searching by
> date provides sufficient granularity, for instance, you can  
> normalize all
> the time-of-day portions of the timestamps to midnight while  
> indexing. (So
> index any event happening on Oct 01, 2008 as 2008-10-01T00:00:00Z.)  
> That
> would give Solr many fewer unique timestamp values to go through.
>
> On Wed, Oct 29, 2008 at 1:30 PM, Alok Dhir <[EMAIL PROTECTED]>  
> wrote:
>
>> Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 core machine.
>>
>> Fairly simple schema -- no large text fields, standard request  
>> handler.  4
>> small facet fields.
>>
>> The index is an event log -- a primary search/retrieval requirement  
>> is date
>> range queries.
>>
>> A simple query without a date range subquery is ridiculously fast -  
>> 2ms.
>> The same query with a date range takes up to 30s (30,000ms).
>>
>> Concrete example, this query just took 18s:
>>
>>   instance:client\-csm.symplicity.com AND dt: 
>> [2008-10-01T04:00:00Z TO
>> 2008-10-30T03:59:59Z] AND label_facet:"Added to Position"
>>
>> The exact same query without the date range took 2ms.
>>
>> I saw a thread from Apr 2008 which explains the problem being due  
>> to too
>> much precision on the DateField type, and the range expansion  
>> leading to far
>> too many elements being checked.  Proposed solution appears to be a  
>> hack
>> where you index date fields as strings and hacking together date  
>> functions
>> to generate proper queries/format results.
>>
>> Does this remain the recommended solution to this issue?
>>
>> Thanks
>>
>> ---
>> Alok K. Dhir
>> Symplicity Corporation
>> www.symplicity.com
>> (703) 351-0200 x 8080
>> [EMAIL PROTECTED]
>>
>>




RE: exceeded limit of maxWarmingSearchers

2008-10-29 Thread Feak, Todd
Have you looked at how long your warm up is taking? 

If it takes longer to warm up a searcher than it does for you to do
an update, you will be behind the curve and eventually run into this no
matter how big that number is.
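
The knobs involved live in solrconfig.xml; a sketch with illustrative values --
the idea is to keep warm-up cheap enough that a new searcher is ready well
before the next commit arrives, rather than raising the limit:

--
<maxWarmingSearchers>2</maxWarmingSearchers>
<filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
---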

-Original Message-
From: news [mailto:[EMAIL PROTECTED] On Behalf Of Jon Drukman
Sent: Wednesday, October 29, 2008 11:56 AM
To: solr-user@lucene.apache.org
Subject: exceeded limit of maxWarmingSearchers

I am getting this error quite frequently on my Solr installation:

SEVERE: org.apache.solr.common.SolrException: Error opening new 
searcher. exceeded limit of maxWarmingSearchers=8, try again later.


I've done some googling but the common explanation of it being related 
to autocommit doesn't apply.

Our server is not even in public use yet, it's serving maybe one query 
every second, or less.  I don't understand what could be causing this.

We do a commit on every update, but updates are very infrequent.  One 
every few minutes, and it's a very small update as well.

-jsd-




RE: Question about textTight

2008-10-28 Thread Feak, Todd
You may want to take a very close look at what the WordDelimiterFilter
is doing. I believe the underscore is dropped entirely during indexing
AND searching as it's not alphanumeric.

Wiki doco here
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=(tokenizer)#head-1c9b83870ca7890cd73b193cefed83c283339089

The admin analysis page and query debug will help a lot to see what's
going on.

-Todd
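
One way around the underscore handling, as a sketch: keep a second copy of the
filename in a type that only whitespace-tokenizes and lowercases, so the
underscores survive intact and a prefix query such as filename_raw:stm0810m_*
can match (wildcard/prefix terms are not run through the analyzer, so lowercase
the prefix yourself). Names here are illustrative:

--
<fieldType name="filename_raw" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="filename_raw" type="filename_raw" indexed="true" stored="true"/>
<copyField source="filename" dest="filename_raw"/>
---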

-Original Message-
From: Stephen Weiss [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 27, 2008 10:32 PM
To: solr-user@lucene.apache.org
Subject: Question about textTight

Hi,

So I've been using the textTight field to hold filenames, and I've run  
into a weird problem.  Basically, people want to search by part of a  
filename (say, the filename is stm0810m_ws_001ftws and they want to  
find everything starting with stm0810m_ (stm0810m_*).  I'm hoping  
someone might have done this before (I bet someone has).

Lots of things work - you can search for stm0810m_ws_001ftws and get a  
result, or (stm 0810 m*), or various other combinations.  What does  
not work, is searching for (stm0810m_*) or (stm 0810 m_*) or anything  
like that - a problem, because often they don't want things with ma_  
or mx_, but just m_.  It's almost like underscores just break  
everything, escaping them does nothing.

Here's the field definition (it should be what came with my solr):

  (textTight fieldType definition not preserved in the list archive)

and usage:

  (field and copyField declarations not preserved in the list archive)
Now, I thought textTight would be good because it's the one best  
suited for SKU's, but I guess I'm wrong.  What should I be using for  
this?  Would changing any of these "generateWordParts" or  
"catenateAll" options help?  I can't seem to find any documentation so  
I'm really not sure what it would do, but reindexing this whole thing  
will take quite some time so I'd rather know what will actually work  
before I just start changing things.

Thanks so much for any insight!

--
Steve



RE: One document inserted but nothing showing up ? SOLR 1.3

2008-10-23 Thread Feak, Todd
Unless "q=ALL" is a special query I don't know about, the only reason you would 
get results is if "ALL" showed up in the default field of the single document 
that was inserted/updated.

You could try a query of "*:*" instead. Don't forget to URL encode if you are 
doing this via URL.

-Todd
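
For reference, the URL-encoded form of the match-all query against the same
core would look like this (host and port as in the original message):

  :8180/solr/video/select/?q=*%3A*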


-Original Message-
From: sunnyfr [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 23, 2008 9:17 AM
To: solr-user@lucene.apache.org
Subject: One document inserted but nothing showing up ? SOLR 1.3


Hi 

Can somebody help me ?
How can I see all my documents, I just did a full import :

Indexing completed. Added/Updated: 1 documents. Deleted 0 documents.


and when I do :8180/solr/video/select/?q=ALL, I get no result?

  (response: status 0, QTime 0, echoed q=ALL; no documents returned)

Thanks a lot,

-- 
View this message in context: 
http://www.nabble.com/One-document-inserted-but-nothing-showing-up---SOLR-1.3-tp20134357p20134357.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Re[4]: Question about copyField

2008-10-22 Thread Feak, Todd
My bad. I misunderstood what you wanted. 

The example I gave was for the searching side of things. Not the data
representation in the document.

-Todd

-Original Message-
From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 22, 2008 11:14 AM
To: Feak, Todd
Subject: Re[4]: Question about copyField



FT> I would suggest doing this in your schema, then starting up Solr and
FT> using the analysis admin page to see if it will index and search the
way
FT> you want. That way you don't have to pay the cost of actually
indexing
FT> the data to find out.

Thanks. I did it exactly like you said.

I created a fieldType "ex" (short for experiment), defined the
corresponding field and tried it on the analysis page. Here is what
I got (I uploaded the page, so you can see it): 

http://tut-i-tam.com.ua/static/analysis.jsp.htm

I want the final token "samsung spinpoint p spn hard drive gb ata" to
be the actual "ex" value. So I expect a response like this:



 samsung spinpoint p spn hard drive gb
ata
 SP2514N
 Samsung SpinPoint12 P120 SP2514N -
hard drive - 250 GB - ATA-133
 


But when I search for this doc, I get this:



 Samsung SpinPoint12 P120 SP2514N - hard
drive - 250 GB - ATA-133
 SP2514N
 Samsung SpinPoint12 P120 SP2514N -
hard drive - 250 GB - ATA-133
 


As you can see, the "description" and "ex" fields are identical.
The result of the filter chain wasn't actually stored in the "ex" field :(

Anyway, thank you :)

FT> -Todd

FT> -Original Message-
FT> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT> Sent: Wednesday, October 22, 2008 9:24 AM
FT> To: Feak, Todd
FT> Subject: Re[2]: Question about copyField


FT> Thanks for reply. I want to make your point more exact, cause I'm
not
FT> sure that I correctly understood you :)

FT> As far as I know (correct me please, if I wrong) type defines the
way
FT> in which the field is indexed and queried. But I don't want to index
FT> or query "suggestion" field in different way, I want "suggestion"
field
FT> store different value (like in example I wrote in first mail). 

FT> So you are saying that I can tell to slor (using filedType) how solr
FT> should process string before saving it? Yes?

FT>> The filters and tokenizer that are applied to the copy field are
FT>> determined by it's type in the schema. Simply create a new field
FT> type in
FT>> your schema with the filters you would like, and use that type for
FT> your
FT>> copy field. So, the field description would have it's old type, but
FT> the
FT>> field suggestion would get a new type.

FT>> -Todd Feak

FT>> -Original Message-
FT>> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT>> Sent: Wednesday, October 22, 2008 8:28 AM
FT>> To: solr-user@lucene.apache.org
FT>> Subject: Question about copyField


FT>> Hello.

FT>> I have field "description" in my schema. And I want make a filed
FT>> "suggestion" with the same content. So I added following line to my
FT>> schema.xml:

FT>>

FT>> But I also want to modify "description" string before copying it to
FT>> "suggestion" field. I want to remove all comas, dots and slashes.
FT> Here
FT>> is an example of such transformation:

FT>> "TvPL/st, SAMSUNG, SML200"  => "TvPL st SAMSUNG SML200"

FT>> And so as result I want to have such doc:

FT>> 
FT>>  8asydauf9nbcngfaad
FT>>  TvPL/st, SAMSUNG, SML200
FT>>  TvPL st SAMSUNG SML200
FT>> 

FT>> I think it would be nice to use solr.PatternReplaceFilterFactory
for
FT>> this purpose. So the question is: Can I use solr filters for
FT>> processing "description" string before copying it to "suggestion"
FT>> field?

FT>> Thank you for your attention.







-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey mailto:[EMAIL PROTECTED]




RE: Re[2]: Question about copyField

2008-10-22 Thread Feak, Todd
Yes, using fieldType, you can have Solr run the PatternReplaceFilter for
you.

So, for example, you can declare something like this:
--
<fieldType name="..." class="solr.TextField">
...
  <analyzer>
    <tokenizer class="..."/>
    ... Put the PatternReplaceFilter in here. At least for indexing, maybe
        for query as well ...
  </analyzer>
...
</fieldType>
---

I would suggest doing this in your schema, then starting up Solr and
using the analysis admin page to see if it will index and search the way
you want. That way you don't have to pay the cost of actually indexing
the data to find out.

-Todd
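
Filling in the skeleton above for the case described in this thread, a sketch
(type/field names and the exact regex are illustrative; the pattern strips
commas, dots and slashes from each token):

--
<fieldType name="suggestion_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[,./]" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="suggestion" type="suggestion_text" indexed="true" stored="true"/>
<copyField source="description" dest="suggestion"/>
---

Note the caveat that comes up later in this thread: analysis only changes what
gets indexed and searched, not the stored value, so the suggestion field will
still return the raw copied text in responses.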

-Original Message-
From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 22, 2008 9:24 AM
To: Feak, Todd
Subject: Re[2]: Question about copyField


Thanks for the reply. I want to make your point more precise, because I'm not
sure that I understood you correctly :)

As far as I know (please correct me if I'm wrong), the type defines the way
in which the field is indexed and queried. But I don't want to index
or query the "suggestion" field in a different way; I want the "suggestion"
field to store a different value (like in the example I wrote in my first
mail).

So you are saying that I can tell Solr (using fieldType) how it
should process the string before saving it? Yes?

FT> The filters and tokenizer that are applied to the copy field are
FT> determined by it's type in the schema. Simply create a new field
type in
FT> your schema with the filters you would like, and use that type for
your
FT> copy field. So, the field description would have it's old type, but
the
FT> field suggestion would get a new type.

FT> -Todd Feak

FT> -Original Message-
FT> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT> Sent: Wednesday, October 22, 2008 8:28 AM
FT> To: solr-user@lucene.apache.org
FT> Subject: Question about copyField


FT> Hello.

FT> I have field "description" in my schema. And I want make a filed
FT> "suggestion" with the same content. So I added following line to my
FT> schema.xml:

FT>

FT> But I also want to modify "description" string before copying it to
FT> "suggestion" field. I want to remove all comas, dots and slashes.
Here
FT> is an example of such transformation:

FT> "TvPL/st, SAMSUNG, SML200"  => "TvPL st SAMSUNG SML200"

FT> And so as result I want to have such doc:

FT> 
FT>  8asydauf9nbcngfaad
FT>  TvPL/st, SAMSUNG, SML200
FT>  TvPL st SAMSUNG SML200
FT> 

FT> I think it would be nice to use solr.PatternReplaceFilterFactory for
FT> this purpose. So the question is: Can I use solr filters for
FT> processing "description" string before copying it to "suggestion"
FT> field?

FT> Thank you for your attention.




-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey mailto:[EMAIL PROTECTED]




RE: Question about copyField

2008-10-22 Thread Feak, Todd
The filters and tokenizer that are applied to the copy field are
determined by its type in the schema. Simply create a new field type in
your schema with the filters you would like, and use that type for your
copy field. So, the field description would have its old type, but the
field suggestion would get a new type.

-Todd Feak

-Original Message-
From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 22, 2008 8:28 AM
To: solr-user@lucene.apache.org
Subject: Question about copyField


Hello.

I have a field "description" in my schema. And I want to make a field
"suggestion" with the same content. So I added the following line to my
schema.xml:

   <copyField source="description" dest="suggestion"/>

But I also want to modify the "description" string before copying it to
the "suggestion" field. I want to remove all commas, dots and slashes. Here
is an example of such a transformation:

"TvPL/st, SAMSUNG, SML200"  => "TvPL st SAMSUNG SML200"

And so, as a result, I want to have a doc like this:


 8asydauf9nbcngfaad
 TvPL/st, SAMSUNG, SML200
 TvPL st SAMSUNG SML200


I think it would be nice to use solr.PatternReplaceFilterFactory for
this purpose. So the question is: can I use Solr filters to process
the "description" string before copying it to the "suggestion"
field?

Thank you for your attention.

-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey




RE: solr1.3 / tomcat55 / MySql but character_set_client && character_set_connection LATIN1

2008-10-21 Thread Feak, Todd
Any chance this is a MySql server configuration issue?

http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html

-Todd
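
If the server variables can't be changed, forcing UTF-8 on the connection from
the client side is often enough; with MySQL Connector/J that is done on the
JDBC URL (host and database name are illustrative):

  jdbc:mysql://localhost:3306/mydb?useUnicode=true&characterEncoding=UTF-8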

-Original Message-
From: sunnyfr [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 21, 2008 1:09 PM
To: solr-user@lucene.apache.org
Subject: Re: solr1.3 / tomcat55 / MySql but character_set_client &&
character_set_connection LATIN1


Any idea,? What can I do?


sunnyfr wrote:
> 
> Hi,
> 
> How can I manage that?
> | character_set_client     | latin1                                                |
> | character_set_connection | latin1                                                |
> | character_set_database   | utf8                                                  |
> | character_set_filesystem | binary                                                |
> | character_set_results    | latin1                                                |
> | character_set_server     | utf8                                                  |
> | character_set_system     | utf8                                                  |
> | character_sets_dir       | /usr/local/mysql-5.0.51b-sphinx/share/mysql/charsets/ |
> | collation_connection     | latin1_swedish_ci                                     |
> | collation_database       | utf8_general_ci                                       |
> | collation_server         | utf8_general_ci                                       |
> 
> Thanks a lot,
> 
> 

-- 
View this message in context:
http://www.nabble.com/solr1.3---tomcat55---MySql-but-character_set_clientcharacter_set_connection---LATIN1-tp20090455p20098329.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Problem implementing a BinaryQueryResponseWriter

2008-10-21 Thread Feak, Todd
I do have that in my config. Its existence doesn't seem to affect this
particular issue. I've tried it with and without.

-Todd

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 20, 2008 4:36 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem implementing a BinaryQueryResponseWriter

do you have handleSelect set to true in solrconfig?

   
...

if not, it would use a Servlet that is now deprecated



On Oct 20, 2008, at 4:52 PM, Feak, Todd wrote:

> I found out what's going on.
>
> My test queries from existing Solr (not 1.3.0) that I am using have  
> *2*
> "select" in the URL. http://host:port/select/select?q=foo . Not sure
> why, but that's a separate issue. The result is that it is following a
> codepath that bypasses this decision point, and it falls back on
> something that assumes it will *not* be a BinaryQueryResponseWriter,
> even though it does correctly locate and use my new writer.
>
> The solution was to map /select/select to a new handler.
>
> Not sure if this raises another issue or not, but for me it solves the
> problem. Thanks for the help.
>
> -Todd
>
> -Original Message-
> From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
> Sent: Monday, October 20, 2008 1:09 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem implementing a BinaryQueryResponseWriter
>
> I'd start by having a look at SolrDispatchFilter and put in a debug
> breakpoint at:
>
>   QueryResponseWriter responseWriter =
> core.getQueryResponseWriter(solrReq);
>
> response.setContentType(responseWriter.getContentType(solrReq,
> solrRsp));
>   if (Method.HEAD != reqMethod) {
> if (responseWriter instanceof
> BinaryQueryResponseWriter) {
>   BinaryQueryResponseWriter binWriter =
> (BinaryQueryResponseWriter) responseWriter;
>   binWriter.write(response.getOutputStream(),
> solrReq, solrRsp);
> } else {
>   PrintWriter out = response.getWriter();
>   responseWriter.write(out, solrReq, solrRsp);
>
> }
>
>
> On Oct 20, 2008, at 3:59 PM, Feak, Todd wrote:
>
>> Yes.
>>
>> I've gotten it to the point where my class is called, but the wrong
>> method on it is called.
>>
>> -Todd
>>
>> -Original Message-
>> From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED]
>> Sent: Monday, October 20, 2008 12:19 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Problem implementing a BinaryQueryResponseWriter
>>
>> Hi Todd,
>>
>> Did you add your response writer in solrconfig.xml?
>>
>> > class="org.apache.solr.request.XMLResponseWriter" default="true"/>
>>
>> On Mon, Oct 20, 2008 at 9:35 PM, Feak, Todd <[EMAIL PROTECTED]>
>> wrote:
>>
>>> I switched from dev group for this specific question, in case other
>>> users have similar issue.
>>>
>>>
>>>
>>> I'm implementing my own BinaryQueryResponseWriter. I've implemented
>> the
>>> interface and successfully plugged it into the Solr configuration.
>>> However, the application always calls the Writer method on the
>> interface
>>> instead of the OutputStream method.
>>>
>>>
>>>
>>> So, how does Solr determine *which* one to call? Is there a setting
>>> somewhere I am missing maybe?
>>>
>>>
>>>
>>> For troubleshooting purposes, I am using 1.3.0 release version. If I
>> try
>>> using the BinaryResponseWriter (javabin) as the wt, I get the
>> exception
>>> indicating that Solr is doing the same thing with that writer as
>>> well.
>>> This leads me to believe I am somehow misconfigured, OR this isn't
>>> supported with 1.3.0 release.
>>>
>>>
>>>
>>> -Todd
>>>
>>>
>>
>>
>> -- 
>> Regards,
>> Shalin Shekhar Mangar.
>
> --
> Grant Ingersoll
> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
> http://www.lucenebootcamp.com
>
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
>
>
>
>
>
>




RE: Problem implementing a BinaryQueryResponseWriter

2008-10-20 Thread Feak, Todd
I found out what's going on. 

My test queries from existing Solr (not 1.3.0) that I am using have *2*
"select" in the URL. http://host:port/select/select?q=foo . Not sure
why, but that's a separate issue. The result is that it is following a
codepath that bypasses this decision point, and it falls back on
something that assumes it will *not* be a BinaryQueryResponseWriter,
even though it does correctly locate and use my new writer.

The solution was to map /select/select to a new handler.

Not sure if this raises another issue or not, but for me it solves the
problem. Thanks for the help.

-Todd
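
One way to express the mapping described above in solrconfig.xml, assuming the
standard SearchHandler (a sketch; the defaults are illustrative):

--
<requestHandler name="/select/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
</requestHandler>
---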

-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 20, 2008 1:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem implementing a BinaryQueryResponseWriter

I'd start by having a look at SolrDispatchFilter and put in a debug  
breakpoint at:

    QueryResponseWriter responseWriter = core.getQueryResponseWriter(solrReq);

    response.setContentType(responseWriter.getContentType(solrReq, solrRsp));
    if (Method.HEAD != reqMethod) {
      if (responseWriter instanceof BinaryQueryResponseWriter) {
        BinaryQueryResponseWriter binWriter = (BinaryQueryResponseWriter) responseWriter;
        binWriter.write(response.getOutputStream(), solrReq, solrRsp);
      } else {
        PrintWriter out = response.getWriter();
        responseWriter.write(out, solrReq, solrRsp);
      }
    }


On Oct 20, 2008, at 3:59 PM, Feak, Todd wrote:

> Yes.
>
> I've gotten it to the point where my class is called, but the wrong
> method on it is called.
>
> -Todd
>
> -Original Message-
> From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED]
> Sent: Monday, October 20, 2008 12:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem implementing a BinaryQueryResponseWriter
>
> Hi Todd,
>
> Did you add your response writer in solrconfig.xml?
>
>  class="org.apache.solr.request.XMLResponseWriter" default="true"/>
>
> On Mon, Oct 20, 2008 at 9:35 PM, Feak, Todd <[EMAIL PROTECTED]>
> wrote:
>
>> I switched from dev group for this specific question, in case other
>> users have similar issue.
>>
>>
>>
>> I'm implementing my own BinaryQueryResponseWriter. I've implemented
> the
>> interface and successfully plugged it into the Solr configuration.
>> However, the application always calls the Writer method on the
> interface
>> instead of the OutputStream method.
>>
>>
>>
>> So, how does Solr determine *which* one to call? Is there a setting
>> somewhere I am missing maybe?
>>
>>
>>
>> For troubleshooting purposes, I am using 1.3.0 release version. If I
> try
>> using the BinaryResponseWriter (javabin) as the wt, I get the
> exception
>> indicating that Solr is doing the same thing with that writer as  
>> well.
>> This leads me to believe I am somehow misconfigured, OR this isn't
>> supported with 1.3.0 release.
>>
>>
>>
>> -Todd
>>
>>
>
>
> -- 
> Regards,
> Shalin Shekhar Mangar.

--
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












RE: Problem implementing a BinaryQueryResponseWriter

2008-10-20 Thread Feak, Todd
Yes. 

I've gotten it to the point where my class is called, but the wrong
method on it is called.

-Todd

-Original Message-
From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 20, 2008 12:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem implementing a BinaryQueryResponseWriter

Hi Todd,

Did you add your response writer in solrconfig.xml?



On Mon, Oct 20, 2008 at 9:35 PM, Feak, Todd <[EMAIL PROTECTED]>
wrote:

> I switched from dev group for this specific question, in case other
> users have similar issue.
>
>
>
> I'm implementing my own BinaryQueryResponseWriter. I've implemented
the
> interface and successfully plugged it into the Solr configuration.
> However, the application always calls the Writer method on the
interface
> instead of the OutputStream method.
>
>
>
> So, how does Solr determine *which* one to call? Is there a setting
> somewhere I am missing maybe?
>
>
>
> For troubleshooting purposes, I am using 1.3.0 release version. If I
try
> using the BinaryResponseWriter (javabin) as the wt, I get the
exception
> indicating that Solr is doing the same thing with that writer as well.
> This leads me to believe I am somehow misconfigured, OR this isn't
> supported with 1.3.0 release.
>
>
>
> -Todd
>
>


-- 
Regards,
Shalin Shekhar Mangar.


RE: Japanese language doesn't seem to work on Solr 1.3

2008-10-20 Thread Feak, Todd
I would look real closely at the data between MySQL and Solr. I don't
know how it got from the database to the index, but I would try and get
a debugger running and look at the actual data as it's moving along.

Possible suspects include, JDBC driver, JDBC driver settings, HTTP
client (whatever sends the data to Solr).

Also, you could play around with the Admin analysis page to make sure
it's not cropping up in one of the Tokenizers or Analyzers. But I saw
you are using CJK, which most probably doesn't have this issue.

-Todd

-Original Message-
From: sunnyfr [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 20, 2008 9:40 AM
To: solr-user@lucene.apache.org
Subject: RE: Japanese language doesn't seem to work on Solr 1.3


So maybe when I import my data from MySQL I lose it somewhere?


sunnyfr wrote:
> 
> I did recreate my index. I'm using MySQL, and when I query it for Japanese
> videos I get results correctly.
> And yes, I did try to reindex the data; it takes one minute, so that's not a
> problem. But now I don't know what else I can do?
> 
> 

-- 
View this message in context:
http://www.nabble.com/Japan-language-seems-to-don%27t-work-on-solr-1.3-tp20070938p20073767.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Japanese language doesn't seem to work on Solr 1.3

2008-10-20 Thread Feak, Todd
That looks like the data in the index is incorrectly encoded. 

If the inserts into your index came in via HTTP GET and your Tomcat wasn't 
configured for UTF-8 at the time, I could see it going into the index 
corrupted. But I'm not sure if that's even possible (depends on Update)

Is it hard to re-create your index after that configuration change? If it's a 
quick thing to do, it may be worth doing again to eliminate that as a possibility.

-Todd Feak

-Original Message-
From: sunnyfr [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 20, 2008 9:05 AM
To: solr-user@lucene.apache.org
Subject: RE: Japanese language doesn't seem to work on Solr 1.3


Hi Todd, 

It definitely works better now; it was the server.xml file, sorry, I should
have checked. But I still have an odd problem: it's like it doesn't encode
it the right way.
Because if I search directly in the URL
... :8180/solr/video/select/?q=豐田真奈美
My result is :

  (response: status 0, QTime 0, echoed q=豐田真奈美; no documents returned)

And if I look for :
:8180/solr/video/select/?q=ALL
My result is :


0
0

ALL




2006-10-10T05:29:32Z

All Japan Women's Pro-wrestling
WWWA Champion Title Match
豐田真奈美 VS 井上京子


813343
JA
40

Toyota Manami VS Inoue Kyoko

1421
false
false
2008-10-20T15:57:27.197Z
Toyota Manami VS Inoue Kyoko


This : 豐田真奈美 VS 井上京子  Should be  豐田真奈美 
Any idea?

Thanks a lot :)

-- 
View this message in context: 
http://www.nabble.com/Japan-language-seems-to-don%27t-work-on-solr-1.3-tp20070938p20073108.html
Sent from the Solr - User mailing list archive at Nabble.com.




Problem implementing a BinaryQueryResponseWriter

2008-10-20 Thread Feak, Todd
I switched from dev group for this specific question, in case other
users have similar issue.

 

I'm implementing my own BinaryQueryResponseWriter. I've implemented the
interface and successfully plugged it into the Solr configuration.
However, the application always calls the Writer method on the interface
instead of the OutputStream method.

 

So, how does Solr determine *which* one to call? Is there a setting
somewhere I am missing maybe? 

 

For troubleshooting purposes, I am using 1.3.0 release version. If I try
using the BinaryResponseWriter (javabin) as the wt, I get the exception
indicating that Solr is doing the same thing with that writer as well.
This leads me to believe I am somehow misconfigured, OR this isn't
supported with 1.3.0 release.

 

-Todd



RE: Japanese language doesn't seem to work on Solr 1.3

2008-10-20 Thread Feak, Todd
Two potential issues I see there.

1. Shouldn't your query string on the URL be encoded? 

2. Are you using Tomcat, and did you set it up to use UTF-8 encoding? If not, 
your connector node in Tomcat needs to have the URIEncoding set to UTF-8. 
Documentation here 
http://struts.apache.org/2.0.11.2/docs/how-to-support-utf-8-uriencoding-with-tomcat.html

-Todd Feak
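
For reference, the connector change is made in Tomcat's conf/server.xml; a
sketch, keeping whatever other attributes your existing Connector already has:

--
<Connector port="8180" maxThreads="150" enableLookups="false"
           connectionTimeout="20000" URIEncoding="UTF-8"/>
---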

-Original Message-
From: sunnyfr [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 20, 2008 8:06 AM
To: solr-user@lucene.apache.org
Subject: Japanese language doesn't seem to work on Solr 1.3


Hi, I don't get what I'm doing wrong, but when I request:
.com:8180/solr/video/select/?q=初恋+-+村下孝蔵&version=2.2&start=0&rows=10&indent=on

my result is :


0
0

on
0
初恋 - 村下孝蔵
10
2.2




2006-09-05T11:20:52Z
612530
JA
150

PUSHIM, RHYMESTER, MABOROSHI, May J.

21049
false
false
2008-10-20T14:58:30.799Z


My schema is :

  (schema fieldType and field definitions not preserved in the list archive)


-- 
View this message in context: 
http://www.nabble.com/Japonish-language-seems-to-don%27t-work-on-solr-1.3-tp20070938p20070938.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Lucene 2.4 released

2008-10-15 Thread Feak, Todd
The current Subversion trunk has the new Lucene 2.4.0 libraries
committed. So, it's definitely under way.

-Todd

-Original Message-
From: Julio Castillo [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 15, 2008 9:48 AM
To: solr-user@lucene.apache.org
Subject: Lucene 2.4 released

Any ideas when solr 1.3 can be patched to use the official release of
Lucene
(rather than a Lucene snapshot)?

Should I submit a JIRA request?

thanks

Julio Castillo
Edgenuity Inc.




RE: Practical number of Solr instances per machine

2008-10-14 Thread Feak, Todd
Sorry Yonik, I hope this didn't come off as criticism. 

Far from it. We are very happy with the performance we are getting. I
just happen to be the performance junkie trying to get every little bit
out.

That being said, I'm happy to hear it's going to get even better!

-Todd


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Tuesday, October 14, 2008 1:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Practical number of Solr instances per machine

On Tue, Oct 14, 2008 at 4:29 PM, Feak, Todd <[EMAIL PROTECTED]>
wrote:
> In our load testing, the limit for utilizing all of the processor time
> on a box was locking (synchronize, mutex, monitor, pick one). There
were
> a couple of locking points that we saw.
>
> 1. Lucene's locking on the index for simultaneous read/write
protection.
> 2. Solr's locking on the LRUCaches for update protection.

Luckily, both of these are very close to being improved:

1.  Lucene 2.4 has NIO support (lockless) except for Windows, and
there is already a Solr patch to add support for that.

2.  Solr already has a patch (soon to be committed) for an LRUCache
based on ConcurrentHashMap that should work better with multiple CPUs.

-Yonik



RE: Practical number of Solr instances per machine

2008-10-14 Thread Feak, Todd
In our load testing, the limit for utilizing all of the processor time
on a box was locking (synchronize, mutex, monitor, pick one). There were
a couple of locking points that we saw.

1. Lucene's locking on the index for simultaneous read/write protection.
2. Solr's locking on the LRUCaches for update protection.

If you've gotten Solr configured to the point where *most* of your work
is done in memory, then multiple instances of Solr would essentially
distribute this locking and create less contention enabling you to
utilize more of the CPU. This assumes that the creation of another JVM
won't hinder your in memory caching.

Please note, this was only for *our* Solr configuration. It doesn't
necessarily reflect anyone else's configuration. It does, however,
provide at least one scenario where multiple instances could increase
performance.

-Todd



-Original Message-
From: Phillip Farber [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 14, 2008 12:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Practical number of Solr instances per machine

Otis, you have a good memory :-)  I guess the main thing that prompted 
my question was Mike Klaas' statement that he runs 2 instances per 
machine to "squeeze" performance out of the box.  That raised the 
question in my mind as to just how this could benefit performance over a 
single instance in one box.

Phil



Otis Gospodnetic wrote:
> Hi,
> 
> Did you not ask this question a while back?  I may be mixing things...
(hah, no, just checked)
> In short, it depends on a number of factors, such as index sizes,
query rates, complexity of queries, amount of RAM, your target query
latency, etc. etc.  So there is no super clear cut answer.  If you have
some concrete numbers, that will be easier to answer :)
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
>> From: Phillip Farber <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, October 8, 2008 5:34:58 PM
>> Subject: Practical number of Solr instances per machine
>>
>>
>> Hello everyone,
>>
>> What is the generally accepted number of solr instances it makes
sense 
>> to run on a single machine given solr/lucene threading? Servers now 
>> commonly have 4 or 8 cpus.  Obviously the more instances you run the 
>> bigger your JVM heap needs to be and that takes away from OS cache.
Is 
>> the  sweet spot just one instance per machine?  What is the right way
to 
>> think about this issue?
>>
>> Thanks,
>>
>> Phil
>