Re: improve performance after commit

2007-03-07 Thread Kaan Erdener

On Mar 7, 2007, at 11:34 AM, Chris Hostetter wrote:


: back in just now. Here's an example trying to warm using a sort on
: field name "subject". I tried query of
: "allMessageContent:trying;subject+asc" as well as
: "allMessageContent:trying;subject" (without "+asc") - either way

when expressing params in XML (either as init params for a request
handler, or in a QuerySenderListener) the params don't need to be URL
escaped ... they just need to be XML escaped, try something like...


 
   
 
   <lst>
     <str name="q">allMessageContent:test; subject asc</str>
     <str name="start">0</str>
     <str name="rows">10</str>
   </lst>


-Hoss


Thanks to you and Ryan for that suggestion, that was indeed the  
problem. Using a warming query of "allMessageContent:trying;subject  
asc" (without my hand-escaped whitespace) worked great.
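To illustrate the distinction Hoss describes, here is a minimal Python sketch comparing the two kinds of escaping applied to the same warming query. It uses only the standard library; the second query string (with an ampersand) is a made-up example, not one from this thread.

```python
from urllib.parse import quote_plus
from xml.sax.saxutils import escape

# Warming query from the thread, in the "query;sort" form used above.
query = "allMessageContent:trying;subject asc"

# URL escaping: appropriate when the query travels in a request URL.
url_form = quote_plus(query)

# XML escaping: appropriate inside a <str> element in solrconfig.xml.
# Only &, < and > are rewritten, so this query passes through unchanged.
xml_form = escape(query)

# Hypothetical query containing an ampersand, to show a case where
# XML escaping does change the text.
xml_with_amp = escape("subject:R&D;date asc")

print(url_form)
print(xml_form)
print(xml_with_amp)
```

The URL form turns the space into "+", which is why a hand-escaped "subject+asc" placed in the XML was read literally rather than as "subject asc".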


In the end, this is what I've got in my solrconfig.xml, and the  
overall query performance is now consistently fast, even after  
post'ing a commit message.



  
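<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst> <str name="q">text:trying;date asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
    <lst> <str name="q">text:trying;refId asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
    <lst> <str name="q">text:trying;subject asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
    <lst> <str name="q">text:trying;name asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
  </arr>
</listener>

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst> <str name="q">text:trying;date asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
    <lst> <str name="q">text:trying;refId asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
    <lst> <str name="q">text:trying;subject asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
    <lst> <str name="q">text:trying;name asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>
  </arr>
</listener>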

Thanks again,
Kaan


Confidentiality Notice: This e-mail message (including any attached or
embedded documents) is intended for the exclusive and confidential use of the
individual or entity to which this message is addressed, and unless otherwise
expressly indicated, is confidential and privileged information of Rackspace
Managed Hosting. Any dissemination, distribution or copying of the enclosed
material is prohibited. If you receive this transmission in error, please
notify us immediately by e-mail at [EMAIL PROTECTED], and delete the
original message. Your cooperation is appreciated.



Re: improve performance after commit

2007-03-06 Thread Kaan Erdener


On Mar 6, 2007, at 9:50 PM, Yonik Seeley wrote:


On 3/6/07, Kaan Erdener <[EMAIL PROTECTED]> wrote:

From what I can see in the logs, these are both invoked after the
commit. However, the query times after a commit are still slow
(around 20 seconds).


Your warming script didn't do any sorts.
Why don't you also show the part of the log with the slow query...
that would make it much easier for people to help.

-Yonik


Right, I had some initially, but Solr threw exceptions. I put them  
back in just now. Here's an example trying to warm using a sort on  
field name "subject". I tried query of  
"allMessageContent:trying;subject+asc" as well as  
"allMessageContent:trying;subject" (without "+asc") - either way  
throws an exception. Both are shown below. The generated exception  
isn't clear (not to me, anyway), and I didn't find any examples of  
this elsewhere for reference. What's the correct way to specify a  
sort and direction when setting up a listener?


thanks,
Kaan

 <lst> <str name="q">allMessageContent:test;subject+asc</str> <str name="start">0</str> <str name="rows">50</str> </lst>


Mar 6, 2007 11:47:27 PM org.apache.solr.core.SolrCore execute
INFO: rows=50&start=0&q=allMessageContent:test;subject%2Basc 0 4
Mar 6, 2007 11:47:27 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Mar 6, 2007 11:47:27 PM org.apache.solr.core.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:189)
        at org.apache.solr.request.StandardRequestHandler.handleRequest(StandardRequestHandler.java:115)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:595)
        at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:51)
        at org.apache.solr.core.SolrCore$3.call(SolrCore.java:451)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
        at java.util.concurrent.FutureTask.run(FutureTask.java:123)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
        at java.lang.Thread.run(Thread.java:595)

 <lst> <str name="q">allMessageContent:test;subject</str> <str name="start">0</str> <str name="rows">50</str> </lst>


Mar 6, 2007 11:51:57 PM org.apache.solr.core.SolrCore execute
INFO: rows=50&start=0&q=allMessageContent:test;subject 0 2
Mar 6, 2007 11:51:57 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Mar 6, 2007 11:51:57 PM org.apache.solr.core.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:189)
        at org.apache.solr.request.StandardRequestHandler.handleRequest(StandardRequestHandler.java:115)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:595)
        at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:51)
        at org.apache.solr.core.SolrCore$3.call(SolrCore.java:451)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
        at java.util.concurrent.FutureTask.run(FutureTask.java:123)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
        at java.lang.Thread.run(Thread.java:595)






Re: improve performance after commit

2007-03-06 Thread Kaan Erdener


On Mar 6, 2007, at 1:55 PM, Yonik Seeley wrote:


On 3/6/07, Kaan Erdener <[EMAIL PROTECTED]> wrote:

I'm looking for some tips / suggestions around reducing the query
time for Solr after I've post'ed a commit request. My Lucene index
contains around 2,000,000 documents, and I have a job that
periodically removes arbitrary documents from Lucene and replaces
them with fresh copies from a database. Whenever that cycle occurs, I
send a commit to Solr to expose the updates. The problem is that
immediately after the commit, a Solr query that previously took
5-20ms now takes 20-25 seconds. Ouch.


If this is a normal query (no faceting) then most likely the time is spent
populating a lucene FieldCache entry used for sorting results.
Put a static warming entry in solrconfig.xml that queries for a small
number of documents and sorts that query by all the fields you
commonly sort by.

-Yonik
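A minimal Python sketch of the advice above: it generates a QuerySenderListener entry with one small query per sort field, using the "query;sort" syntax from this thread. The helper name `warming_listener` is hypothetical, and the field names and base query are the ones that appear elsewhere in the thread; treat the output as a starting point to paste into solrconfig.xml, not a definitive configuration.

```python
import xml.etree.ElementTree as ET


def warming_listener(event, base_query, sort_fields, rows=50):
    """Build a <listener> element with one warming query per sort field.

    Hypothetical helper: element and attribute names follow the
    QuerySenderListener layout used in solrconfig.xml.
    """
    listener = ET.Element(
        "listener", {"event": event, "class": "solr.QuerySenderListener"}
    )
    queries = ET.SubElement(listener, "arr", {"name": "queries"})
    for field in sort_fields:
        lst = ET.SubElement(queries, "lst")
        params = (
            ("q", f"{base_query};{field} asc"),  # query;sort form from the thread
            ("start", "0"),
            ("rows", str(rows)),
        )
        for name, value in params:
            ET.SubElement(lst, "str", {"name": name}).text = value
    return listener


listener = warming_listener(
    "newSearcher", "text:trying", ["date", "refId", "subject", "name"]
)
if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
    ET.indent(listener)
print(ET.tostring(listener, encoding="unicode"))
```

ElementTree applies XML escaping to the text it writes, so the query strings can be passed in unescaped.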


I'm not exactly sure this is what you meant, but I did some more  
research and it looks close. I added the following to my solrconfig.xml:



  
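<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst> <str name="q">allMessageContent:test</str> <str name="start">0</str> <str name="rows">10</str> </lst>
  </arr>
</listener>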

and also:


  
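<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst> <str name="q">allMessageContent:trying</str> <str name="start">0</str> <str name="rows">10</str> </lst>
  </arr>
</listener>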

From what I can see in the logs, these are both invoked after the  
commit. However, the query times after a commit are still slow  
(around 20 seconds). I'm guessing I didn't set up the warming
correctly? I had some sorting parameters in there, but the syntax was
wrong and produced errors on startup, so I took them out for now.


Mar 6, 2007 4:51:52 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Mar 6, 2007 4:51:52 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
        documentCache{lookups=10,hits=0,hitratio=0.00,inserts=20,evictions=0,size=20,cumulative_lookups=120,cumulative_hits=68,cumulative_hitratio=0.56,cumulative_inserts=52,cumulative_evictions=0}
Mar 6, 2007 4:51:52 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
        documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=120,cumulative_hits=68,cumulative_hitratio=0.56,cumulative_inserts=52,cumulative_evictions=0}
Mar 6, 2007 4:51:52 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to [EMAIL PROTECTED] main
Mar 6, 2007 4:51:52 PM org.apache.solr.core.SolrCore execute
INFO: rows=10&start=0&q=allMessageContent:trying 0 410
Mar 6, 2007 4:51:52 PM org.apache.solr.core.QuerySenderListener newSearcher







improve performance after commit

2007-03-06 Thread Kaan Erdener

hello,

I'm looking for some tips / suggestions around reducing the query  
time for Solr after I've post'ed a commit request. My Lucene index  
contains around 2,000,000 documents, and I have a job that  
periodically removes arbitrary documents from Lucene and replaces
them with fresh copies from a database. Whenever that cycle occurs, I  
send a commit to Solr to expose the updates. The problem is that  
immediately after the commit, a Solr query that previously took  
5-20ms now takes 20-25 seconds. Ouch.


I know that commit can be expensive, although I don't know by how  
much, or what I might do to mitigate the expense. I haven't found much
documentation on this topic. I've also tried different cache settings
(basically using high values for cache and auto-warm sizes) but that  
doesn't seem to make much of a difference.


I'll keep investigating on my own, but if anyone has any suggestions  
or additional info, I would greatly appreciate it.


thanks,
Kaan





Re: changes in Lucene not visible through Solr

2006-11-28 Thread Kaan Erdener
I'm glad I asked. I probably wouldn't have discovered that on my  
own... :)


This worked great:
curl http://localhost:8983/solr/update --data-binary '<commit/>'
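
For anyone driving this from a script rather than curl, here is a rough Python equivalent using only the standard library. The function names are hypothetical, and the `text/xml` content type is an assumption (the curl command above does not set one explicitly); the URL is the one from the curl example.

```python
import urllib.request

# Update URL taken from the curl example above.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_commit_request(update_url=UPDATE_URL):
    """Build (but do not send) a POST carrying a <commit/> message."""
    return urllib.request.Request(
        update_url,
        data=b"<commit/>",
        # Assumption: send the body as XML rather than curl's default type.
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def send_commit(update_url=UPDATE_URL, timeout=30):
    """Send the commit and return (HTTP status, response body)."""
    request = build_commit_request(update_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", "replace")


# Usage, with a Solr instance listening on the URL above:
#     status, body = send_commit()
```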

thanks,
Kaan

On Nov 29, 2006, at 12:31 AM, Mike Klaas wrote:


On 11/28/06, Kaan Erdener <[EMAIL PROTECTED]> wrote:


I thought it might be a caching issue, but I have all of the cache
options disabled in solrconfig.xml and the problem persists. I also
ran Lucene optimization while Solr was running, but again no fix. If
anyone has any suggestions for configuring / poking Solr somehow so
that it will see new changes in Lucene, please let me know.


Changes to the lucene index are not visible until you perform
'<commit/>'.  This is true regardless of whether you are modifying the
index directly or through solr's xml interface.

regards,
-Mike




changes in Lucene not visible through Solr

2006-11-28 Thread Kaan Erdener

hello,

I'm pulling data into Lucene several times an hour, approaching a  
total document count of  ~2 million. Sometimes I pull in brand new  
data, other times I replace an existing document with an updated  
copy. The number of documents that I update in Lucene will pretty  
much never be more than a thousand or so.


I have a Solr interface exposed to another part of our system, and  
it's basically sitting on top of Lucene as a read-only view into the  
index. I can perform updates and optimizations in Lucene and Solr  
will keep searching just fine, but I've discovered that changes in  
Lucene are not visible through Solr.


For example, say there is no matching document in Lucene for id=1000,  
so if I query Solr using id:1000, I will correctly find 0 matches.  
But then I import new data into Lucene, pulling in a new document  
where id=1000. At that point, the query for id:1000 should find one  
match, but it doesn't (0 still). If I bounce Solr, I can see the  
results just fine.


I thought it might be a caching issue, but I have all of the cache  
options disabled in solrconfig.xml and the problem persists. I also
ran Lucene optimization while Solr was running, but again no fix. If  
anyone has any suggestions for configuring / poking Solr somehow so  
that it will see new changes in Lucene, please let me know.


cheers,
Kaan