Re: JVM random crashes
It will probably turn out to be a hardware problem: a bad RAM chip. I removed it, and today I will test Solr again to make sure everything is fine.

On 3/5/07, Bill Au [EMAIL PROTECTED] wrote:
Seems like this may be a JVM bug:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6500147
http://forum.java.sun.com/thread.jspa?threadID=659990&messageID=3876052
Have you tried using a different garbage collector?
Bill

On 3/3/07, Jed Reynolds [EMAIL PROTECTED] wrote:
Yonik Seeley wrote:
On 3/3/07, Dimitar Ouzounov [EMAIL PROTECTED] wrote:
But what hardware problem could it be? Tomorrow I'll make sure that the memory is fine, but nothing else comes to my mind.

Memory, motherboard, etc. Try http://www.memtest86.com/ to test this.

It may be OS-related - probably a buggy version of some library. But which library?

Yep, we've seen that in the past. I'd recommend going with OS versions that vendors test with. The commercial RHEL, or the free clone of it, http://www.centos.org/, would be my recommendation.

I'm running a lot of CentOS 4.4 myself, on i686 and x86_64 processors. I'm testing out Solr on an i686 with JDK 1.5 and I'm running a production copy of Nutch on x86_64 JDK 1.5, Tomcat 1.5. It's been rock solid.

From trying to install Java in the past on FC5, I read a lot about how you had to be rather careful to make absolutely certain that you had no conflicting gcj libs in your path.

If this is a production box, I'd go with a longer-supported OS than FC6. If the server is only for searching and Apache, I don't think FC6 will give you any noticeable performance boost over CentOS 4.4. FC6's performance enhancements with glibc-hash-binding won't affect a JVM.

Jed
Re: problem with solr.HTMLStripWhitespaceTokenizerFactory
On 3/6/07, mike topper [EMAIL PROTECTED] wrote:
when inserting it it seems like nothing happens, i.e. when I do a query, here is the response for a test description:

<str name="description"><br>hi<br>my<br>name<br>is<br>topper<br>and this <b>&nbsp;blahblah</b> is a <b>test</b></str>

The tag stripping happens during the analysis phase, and affects what gets indexed. For returned field values, you get what you put in.

-Yonik
Re: Dynamic RequestHandler loading
: getRequestHandlers() would be equivalent to:
: getRequestHandlers( SolrRequestHandler.class )
:
: We will need some way to ask what is registered without knowing the
: path it is registered to.

getting instances by class seems like a pretty special-case situation ... i'd rather not add a bunch of methods that really only have one use case which isn't even in the main code base.

Adding a Map<String,SolrRequestHandler> getRequestHandlers() method to the core seems useful enough in the broad case to solve any special needs custom code might have -- find instances by interface, etc...

As long as we change the SolrCore initialization to construct all SolrRequestHandler instances and build up that Map prior to calling the init method on them, it would also solve the "what name am i registered with" question for RequestHandlers without needing to change the interface in a backwards-incompatible way. (Handlers that want to know could get the Map and look for a value they are == to.)

-Hoss
RE: Time after snapshot is visible on the slave
Hi Galo, The snapinstaller actually performs a commit as its last step, so if that didn't work, it's not surprising that running commit separately didn't work, either. I would suggest running the snapinstaller and/or commit scripts with the -V option. This will produce verbose debugging information and allow you to see where they encounter problems. Hope this helps, -Graham
improve performance after commit
hello,

I'm looking for some tips / suggestions around reducing the query time for Solr after I've posted a commit request.

My Lucene index contains around 2,000,000 documents, and I have a job that periodically removes arbitrary documents from Lucene and replaces them with fresh copies from a database. Whenever that cycle occurs, I send a commit to Solr to expose the updates. The problem is that immediately after the commit, a Solr query that previously took 5-20ms now takes 20-25 seconds. Ouch.

I know that commit can be expensive, although I don't know by how much, or what I might do to mitigate the expense. I haven't found much documentation on this topic. I've also tried different cache settings (basically using high values for cache and auto-warm sizes), but that doesn't seem to make much of a difference.

I'll keep investigating on my own, but if anyone has any suggestions or additional info, I would greatly appreciate it.

thanks,
Kaan

Confidentiality Notice: This e-mail message (including any attached or embedded documents) is intended for the exclusive and confidential use of the individual or entity to which this message is addressed, and unless otherwise expressly indicated, is confidential and privileged information of Rackspace Managed Hosting. Any dissemination, distribution or copying of the enclosed material is prohibited. If you receive this transmission in error, please notify us immediately by e-mail at [EMAIL PROTECTED], and delete the original message. Your cooperation is appreciated.
Re: improve performance after commit
On 3/6/07, Kaan Erdener [EMAIL PROTECTED] wrote:
I'm looking for some tips / suggestions around reducing the query time for Solr after I've posted a commit request. My Lucene index contains around 2,000,000 documents, and I have a job that periodically removes arbitrary documents from Lucene and replaces them with fresh copies from a database. Whenever that cycle occurs, I send a commit to Solr to expose the updates. The problem is that immediately after the commit, a Solr query that previously took 5-20ms now takes 20-25 seconds. Ouch.

If this is a normal query (no faceting), then most likely the time is spent populating a Lucene FieldCache entry used for sorting results. Put a static warming entry in solrconfig.xml that queries for a small number of documents and sorts that query by all the fields you commonly sort by.

-Yonik
Saving dynamic field name without dynamic extension
I want to add a suffix to my field names to use the dynamic fields feature. Is there a way to save the field name without the suffix, so users can search by field with the plain field name?

-- View this message in context: http://www.nabble.com/Saving-dynamic-field-name-without-dynamic-extension-tf3358269.html#a9340901 Sent from the Solr - User mailing list archive at Nabble.com.
SQL Update
What is the status of the SQL update? Should all database fields used in sql updates be added to schema.xml before running the sql update?
Re: SQL Update
SOLR-103 is waiting for SOLR-139 to solidify before i post more updates... I have it running successfully, but it requires too many other patches to suggest trying to get it running unless you are up for a bit of work. If you are, i can easily post an update.

About the schema... SOLR-103 uses the ResultSetMetaData to decide what field to push each value into, so you will need to make sure the column names correspond to the fields in schema.xml. If you use SELECT * FROM, your table columns will need the same names as the Solr fields. If you use:

SELECT mysqlfield as mysolrfieldname FROM ...

they don't.

ryan

On 3/6/07, Debra [EMAIL PROTECTED] wrote:
What is the status of the SQL update? Should all database fields used in sql updates be added to schema.xml before running the sql update?
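The column-name point can be illustrated outside Solr. This is only a sketch of the idea, not SOLR-103 itself: it uses Python's sqlite3 module, where cursor.description plays roughly the role that ResultSetMetaData plays in JDBC. The table name items and the helper rows_as_docs are made up for the example; the column names are the ones from the message above.

```python
import sqlite3

# In-memory database with one column named as in the message above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (mysqlfield TEXT)")
conn.execute("INSERT INTO items VALUES ('hello')")

def rows_as_docs(conn, sql):
    """Map each row to a dict keyed by the result-set column names."""
    cur = conn.execute(sql)
    # cursor.description holds one entry per result column; item 0 is its name.
    names = [col[0] for col in cur.description]
    return [dict(zip(names, row)) for row in cur]

# SELECT * exposes the table's own column name ...
plain = rows_as_docs(conn, "SELECT * FROM items")
# ... while an alias changes the name the result-set metadata reports.
aliased = rows_as_docs(conn, "SELECT mysqlfield AS mysolrfieldname FROM items")
```

Here plain is keyed by mysqlfield and aliased by mysolrfieldname, which is why aliasing in the SELECT removes the need to rename table columns.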
Re: Saving dynamic field name without dynamic extension
On 3/6/07, Debra [EMAIL PROTECTED] wrote:
I want to add a suffix to my field names to use the dynamic fields feature. Is there a way to save the field name without the suffix, so users can search by field with the plain field name?

No, and I'm not sure that it would be possible. Solr needs to know the type of a field at all times--not just during indexing.

Why not create a _user suffix, and programmatically add the suffix to user queries before they reach Solr?

-Mike
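A rough client-side sketch of that suggestion follows. It is not part of Solr: the function name add_field_suffix and the regular expression are assumptions made for illustration, and it only handles simple field:value clauses (text inside quoted phrases that happens to look like a field prefix would be rewritten too).

```python
import re

# Hypothetical suffix; it would have to match a dynamic field pattern
# such as "*_user" declared in schema.xml.
SUFFIX = "_user"

# A field name is a word directly followed by ":" that is not itself preceded
# by a word character or a colon, so "http:" in "url:http://..." is left alone.
FIELD_RE = re.compile(r"(?<![\w:])([A-Za-z_]\w*):")

def add_field_suffix(query, suffix=SUFFIX):
    """Append the suffix to every plain field name in a user query."""
    def repl(match):
        name = match.group(1)
        if name.endswith(suffix):
            # Already suffixed: leave it untouched.
            return match.group(0)
        return name + suffix + ":"
    return FIELD_RE.sub(repl, query)

rewritten = add_field_suffix("title:solr AND author:yonik")
```

Users keep typing plain field names, and the application sends the rewritten string to Solr instead of the original one.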
Re: Reindex only records that changed
An additional field in your DB as a flag? 1 - dirty, 0 - clean.

Debra wrote:
Hi all,
This is not a direct Solr issue, but I need it for indexing. Is there a way to check if a database record changed since the last index (without using a special flag field that has to be set anywhere the record is updated)? I would like to re-index only records that changed.
TIA
Debra
Re: improve performance after commit
On Mar 6, 2007, at 1:55 PM, Yonik Seeley wrote:
If this is a normal query (no faceting), then most likely the time is spent populating a Lucene FieldCache entry used for sorting results. Put a static warming entry in solrconfig.xml that queries for a small number of documents and sorts that query by all the fields you commonly sort by.
-Yonik

I'm not exactly sure this is what you meant, but I did some more research and it looks close. I added the following to my solrconfig.xml:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">allMessageContent:test</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>

and also:

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">allMessageContent:trying</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>

From what I can see in the logs, these are both invoked after the commit. However, the query times after a commit are still slow (around 20 seconds). I'm guessing I didn't set up the warming correctly? I had some sorting parameters in there, but the syntax was wrong and produced errors on startup, so I took them out for now.

Mar 6, 2007 4:51:52 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Mar 6, 2007 4:51:52 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
  documentCache {lookups=10,hits=0,hitratio=0.00,inserts=20,evictions=0,size=20,cumulative_lookups=120,cumulative_hits=68,cumulative_hitratio=0.56,cumulative_inserts=52,cumulative_evictions=0}
Mar 6, 2007 4:51:52 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
  documentCache {lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=120,cumulative_hits=68,cumulative_hitratio=0.56,cumulative_inserts=52,cumulative_evictions=0}
Mar 6, 2007 4:51:52 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to [EMAIL PROTECTED] main
Mar 6, 2007 4:51:52 PM org.apache.solr.core.SolrCore execute
INFO: rows=10&start=0&q=allMessageContent:trying 0 410
Mar 6, 2007 4:51:52 PM org.apache.solr.core.QuerySenderListener newSearcher
Re: [2] Saving dynamic field name without dynamic extension
Thank you, your suggestion looks like the way to go...

Mike Klaas wrote:
On 3/6/07, Debra [EMAIL PROTECTED] wrote:
I want to add a suffix to my field names to use the dynamic fields feature. Is there a way to save the field name without the suffix, so users can search by field with the plain field name?

No, and I'm not sure that it would be possible. Solr needs to know the type of a field at all times--not just during indexing.

Why not create a _user suffix, and programmatically add the suffix to user queries before they reach Solr?

-Mike
Re: [2] Highlighting problems with HTML tagged fields
Yonik Seeley wrote:
HTMLStripWhitespaceTokenizerFactory works in two phases... HTMLStripReader removes the HTML and passes the result to WhitespaceTokenizer... at that point, Tokens are generated, but the offsets will correspond to the text after HTML removal, not before. I did it this way so that HTMLStripReader could go before any tokenizer (like StandardTokenizer). Can you open a JIRA bug for this? The fix would be a special version of HTMLStripReader integrated with a WhitespaceTokenizer to keep offsets correct.
-Yonik

Is there a fix for this problem? My Solr build is dated 12/17/2006. HTMLStripWhitespaceTokenizerFactory + highlighting still doesn't work: all the wrong items are highlighted.
Re: [2] Reindex only records that changed
I would like to avoid such a field: the tables may be updated by programs not under my control, and every program that updates these tables would have to add logic for maintaining the field.

Sergey Polzunov-2 wrote:
An additional field in your DB as a flag? 1 - dirty, 0 - clean.
Re: [2] Reindex only records that changed
MySQL has a TIMESTAMP field that can auto-update every time a row changes... i've never used it, but that may be a place to look.

Alternatively, you could add a TRIGGER to automatically dump stuff to a bucket when it changes, and clear the bucket when you index.

On 3/6/07, Debra [EMAIL PROTECTED] wrote:
I would like to avoid such a field in case tables are updated in programs not under my control + any program that updates these tables has to add logic for updating this field.
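The trigger-and-bucket idea might look roughly like the sketch below. To keep it runnable it uses SQLite through Python's sqlite3 module rather than MySQL, whose trigger syntax differs; the table names docs and changed_docs, the trigger names, and the helper take_changed_ids are all hypothetical. Deleted rows are not tracked here and would need a third trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE changed_docs (id INTEGER PRIMARY KEY);  -- the "bucket"

-- Record the id of every inserted or updated row, whichever program wrote it.
CREATE TRIGGER docs_ins AFTER INSERT ON docs
BEGIN
    INSERT OR IGNORE INTO changed_docs (id) VALUES (NEW.id);
END;

CREATE TRIGGER docs_upd AFTER UPDATE ON docs
BEGIN
    INSERT OR IGNORE INTO changed_docs (id) VALUES (NEW.id);
END;
""")

def take_changed_ids(conn):
    """Return the ids changed since the last call and empty the bucket."""
    ids = [row[0] for row in conn.execute("SELECT id FROM changed_docs ORDER BY id")]
    conn.execute("DELETE FROM changed_docs")
    conn.commit()
    return ids

conn.execute("INSERT INTO docs (id, body) VALUES (1, 'a'), (2, 'b')")
first = take_changed_ids(conn)    # both rows are new
conn.execute("UPDATE docs SET body = 'b2' WHERE id = 2")
second = take_changed_ids(conn)   # only row 2 changed since the last run
```

The indexing job would call take_changed_ids at the start of each run and re-index only those ids, so none of the programs writing to the table need to know about the bucket.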
Re: Error with bin/optimize and multiple solr webapps
This issue has been logged as: https://issues.apache.org/jira/browse/SOLR-188

A patch file is included for those who are interested. I've unit tested it in my environment; please validate it for your own environment.

cheers,
j

On 3/5/07, Jeff Rodenburg [EMAIL PROTECTED] wrote:
Thanks Hoss. I'll add an issue in JIRA and attach the patch.

On 3/5/07, Chris Hostetter [EMAIL PROTECTED] wrote:
: This line assumes a single solr installation under Tomcat, whereas the
: multiple webapp scenario runs from a different location (the /solr part).
: I'm sure this applies elsewhere.

good catch ... it looks like all of our scripts assume /solr/update is the correct path to POST commit/optimize messages to.

: I would submit a patch for JIRA, but couldn't find these files under version
: control. Any recommendations?

They live in src/scripts ... a patch would certainly be appreciated.

FYI: there is an evolution underway to allow XML based update messages to be sent to any path (and the fixed path /update is being deprecated), so it would be handy if the entire URL path was configurable (not just the webapp name).

-Hoss
RE: Error with bin/optimize and multiple solr webapps
Apologies in advance if SOLR-187 and SOLR-188 look the same -- they are the same issue. I have been using adjusted scripts locally but hadn't used Jira before and wasn't sure of the process. I decided to figure it out after answering Galo's question this morning... then saw that Jeff had mentioned a similar issue last night. I apologize again for the confusion over the double entry.

Thanks,
-Graham

-Original Message-
From: Jeff Rodenburg [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 06, 2007 4:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Error with bin/optimize and multiple solr webapps

This issue has been logged as: https://issues.apache.org/jira/browse/SOLR-188

A patch file is included for those who are interested. I've unit tested in my environment, please validate it for your own environment.

cheers,
j
Re: Error with bin/optimize and multiple solr webapps
Oops, my bad, I didn't see either 186 or 187 before entering 188. :-)

-- j

On 3/6/07, Graham Stead [EMAIL PROTECTED] wrote:
Apologies in advance if SOLR-187 and SOLR-188 look the same -- they are the same issue. I have been using adjusted scripts locally but hadn't used Jira before and wasn't sure of the process. I decided to figure it out after answering Galo's question this morning... then saw that Jeff had mentioned a similar issue last night. I apologize again for the confusion over the double entry.

Thanks,
-Graham
Re: improve performance after commit
On 3/6/07, Kaan Erdener [EMAIL PROTECTED] wrote: From what I can see in the logs, these are both invoked after the commit. However, the query times after a commit are still slow (around 20 seconds). Your warming script didn't do any sorts. Why don't you also show the part of the log with the slow query... that would make it much easier for people to help. -Yonik
Re: improve performance after commit
<str name="q">allMessageContent:test;subject+asc</str>

There should be a space between subject and asc. Try:

http://host/select?q=allMessageContent:test;subject%20asc

+ is supposed to become a space, but it looks like it is staying +.
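The difference between the two encodings can be checked with Python's standard urllib.parse module. This only illustrates general URL encoding rules; it says nothing definite about how Solr reads solrconfig.xml, though text in a config file is presumably not URL-decoded at all, which would explain the + surviving. The variable names are made up for the example.

```python
from urllib.parse import quote, unquote, unquote_plus

# Query plus sort, in the "q=<query>;<field> <direction>" form used in this thread.
raw_q = "allMessageContent:test;subject asc"

# Percent-encode for use in a URL; ":" and ";" are kept readable.
encoded_q = quote(raw_q, safe=":;")

# Form-style decoding, as typically applied to URL query strings,
# turns "+" into a space; plain percent-decoding leaves it alone.
form_decoded = unquote_plus("subject+asc")
percent_decoded = unquote("subject+asc")
```

So %20 is read as a space under either decoding, while + only becomes a space when the value is decoded as form data.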