Re: Unable to move index file error during replication
Sorry, but which one should I take?? Where exactly?

Noble Paul നോബിള് नोब्ळ् wrote:

  this fix is there in the trunk, you may not need to apply the patch

  On Fri, Mar 27, 2009 at 6:02 AM, sunnyfr johanna...@gmail.com wrote:

    Hi,

    It doesn't seem to work for me. I changed this part below as well; is it ok??

      - List<String> copiedfiles = new ArrayList<String>();
      + Set<String> filesToCopy = new HashSet<String>();

    http://www.nabble.com/file/p22734005/ReplicationHandler.java ReplicationHandler.java

    Thanks a lot,

Noble Paul നോബിള് नोब्ळ् wrote:

  James, thanks. If this is true, the place to fix this is in ReplicationHandler#getFileList(); patch is attached.

  On Wed, Dec 24, 2008 at 4:04 PM, James Grant james.gr...@semantico.com wrote:

    I had the same problem. It turned out that the list of files from the master included duplicates. When the slave completes the download and tries to move the files into the index, it comes across a file that does not exist because it has already been moved, so it backs out the whole operation. My solution for now was to patch the copyIndexFiles method of org.apache.solr.handler.SnapPuller so that it normalises the list before moving the files. This isn't the best solution, since it will still download the duplicated file twice, but it was the easiest and smallest change to make.
    The patch is below.

    Regards,
    James

    --- src/java/org/apache/solr/handler/SnapPuller.java (revision 727347)
    +++ src/java/org/apache/solr/handler/SnapPuller.java (working copy)
    @@ -470,7 +470,7 @@
        */
       private boolean copyIndexFiles(File snapDir, File indexDir) {
         String segmentsFile = null;
    -    List<String> copiedfiles = new ArrayList<String>();
    +    Set<String> filesToCopy = new HashSet<String>();
         for (Map<String, Object> f : filesDownloaded) {
           String fname = (String) f.get(NAME);
           // the segments file must be copied last
    @@ -482,6 +482,10 @@
             segmentsFile = fname;
             continue;
           }
    +      filesToCopy.add(fname);
    +    }
    +    List<String> copiedfiles = new ArrayList<String>();
    +    for (String fname : filesToCopy) {
           if (!copyAFile(snapDir, indexDir, fname, copiedfiles))
             return false;
           copiedfiles.add(fname);
         }

Jaco wrote:

  Hello,

  While testing out the new replication features, I'm running into some strange problem. On the slave, I keep getting an error like this after all files have been copied from the master to the temporary index.x directory:

    SEVERE: Unable to move index file from: D:\Data\solr\Slave\data\index.20081224110855\_21e.tvx to: D:\Data\Solr\Slave\data\index\_21e.tvx

  The replication then stops, the index remains in its original state, so the updates are not available at the slave.

  This is my replication config at the master:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <!-- Replicate on 'optimize'; it can also be 'commit' -->
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml</str>
      </lst>
    </requestHandler>

  This is the replication config at the slave:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://hostnamemaster:8080/solr/Master/replication</str>
        <str name="pollInterval">00:10:00</str>
        <str name="zip">true</str>
      </lst>
    </requestHandler>

  I'm running a Solr nightly build of 21.12.2008 in Tomcat 6 on Windows 2003. Initially I thought there was some problem with disk space, but this is not the case.
  Replication did run fine for the initial version of the index, but after that at some point it didn't work anymore. Any ideas what could be wrong here?

  Thanks very much in advance, bye,

  Jaco.

--
--Noble Paul

Index: src/java/org/apache/solr/handler/ReplicationHandler.java
===================================================================
--- src/java/org/apache/solr/handler/ReplicationHandler.java (revision 729282)
+++ src/java/org/apache/solr/handler/ReplicationHandler.java (working copy)
@@ -268,7 +268,7 @@
     List<Map<String, Object>> result = new ArrayList<Map<String, Object>>();
     try {
       //get all the files in the commit
-      Collection<String> files = commit.getFileNames();
+      Collection<String> files = new HashSet<String>(commit.getFileNames());
       for (String fileName : files) {
         File file = new File(core.getIndexDir(), fileName);
         Map<String, Object> fileMeta = getFileInfo(file);

--
View this message in context: http://www.nabble.com/%22Unable-to-move-index-file%22-error-during-replication-tp21157722p22734005.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
--Noble Paul
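Both patches above come down to the same idea: de-duplicate the file list before anything is moved, so the second move of an already-moved file never happens. A minimal standalone sketch of that idea (the helper name `normalize` and the sample file names are made up for illustration; the real patches use HashSet inside SnapPuller/ReplicationHandler):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class DedupFileList {
    // Collapse duplicate file names while keeping first-seen order,
    // so each index file is moved (and ideally downloaded) only once.
    static List<String> normalize(List<String> downloaded) {
        return new ArrayList<String>(new LinkedHashSet<String>(downloaded));
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("_21e.tvx", "_21e.tvd", "_21e.tvx");
        System.out.println(normalize(files)); // [_21e.tvx, _21e.tvd]
    }
}
```

A LinkedHashSet is used here only to make the sketch deterministic; the order of non-segments files doesn't actually matter to the fix, which is why the committed patch can use a plain HashSet.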
Re: Unable to move index file error during replication
the latest nightly should do fine

On Fri, Mar 27, 2009 at 1:59 PM, sunnyfr johanna...@gmail.com wrote:

  Sorry, but which one should I take?? Where exactly?

  [...]

--
--Noble Paul
Re: Incorrect sort with with function query in query parameters
Asif,

Could it have something to do with the deleted documents in your unoptimized index? Those documents are only marked as deleted; when you run optimize you really remove them completely. It could be that they are getting counted by something, and that messes up the scoring/order.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Asif Rahman a...@newscred.com
To: solr-user@lucene.apache.org
Sent: Thursday, March 26, 2009 10:24:19 PM
Subject: Incorrect sort with with function query in query parameters

Hi all,

I'm having an issue with the order of my results when attempting to sort by a function in my query. Looking at the debug output of the query, the score returned within the result section for any given document does not match the score in the debug output. It turns out that if I optimize the index, then the results are sorted correctly. The scores in the debug output are the correct scores. This behavior only occurs using a recent nightly build of Solr; it works correctly in Solr 1.3.

An example query is:

  http://localhost:8080/solr/core-01/select?qt=standard&fl=*,score&rows=10&q=*:*%20_val_:recip(rord(article_published_at),1,1000,1000)^1&debugQuery=on

I've attached the result to this email. Can anybody shed any light on this problem?

Thanks,
Asif

http://www.nabble.com/file/p22735009/result.xml result.xml
Re: optimization advice?
Steve,

Maybe you can tell us about:

- your hardware
- query rate
- document cache and query cache settings
- your current response times
- any pain points, any slow query patterns
- etc.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Steve Conover scono...@gmail.com
To: solr-user@lucene.apache.org
Sent: Friday, March 27, 2009 1:50:48 AM
Subject: optimization advice?

Hi,

I've looked over the public Solr perf docs and done some searching on this mailing list. Still, I'd like to seek some advice based on my specific situation:

- 2-3 million documents / 5GB index
- each document has 40+ indexed fields, and many multivalue fields
- only primary keys are stored
- very low write frequency
- queries can be sorted by any combination of fields, and are always sorted by at least one field
- query criteria vary from very simple to very complex (the point about queries being that they're not very amenable to being cached)

So far I've set my mergeFactor very low. I haven't paid much attention to caching except for basic query result caching; I don't think many of the cache features really apply well to my problem. Increasing the amount of RAM available to Java (by 1GB) has no effect I can detect.

Ideally I'd like to get response times down to near-instantaneous / 50ms (which is where they were when the index was ~1 million documents). I'd love to hear suggestions; in particular, are there obvious optimization options I've missed?

Regards,
Steve
Re: Initial query performance poor after update / delete
Hi Tom,

> Thanks Otis. After some further testing I've noticed that initial searches are only slow if I include the qt=geo parameter. Searches without this parameter appear to show no slowdown whatsoever after updates, so I'm wondering if the problem is actually a localsolr one.
>
> Can you tell me where I can specify the configuration to set up the parameters for swapping the searchers? Is this within solrconfig.xml? Any light you could shed on this would be really appreciated.

In a single-server environment searchers should be swapped whenever you issue a commit.

> Thanks again,
> Tom
>
> PS. If you wrote a SOLR in Action - I would buy it today!

Careful what you wish for! ;)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: Initial query performance poor after update / delete
Thanks Otis. After some further testing I've noticed that initial searches are only slow if I include the qt=geo parameter. Searches without this parameter appear to show no slowdown whatsoever after updates, so I'm wondering if the problem is actually a localsolr one.

Can you tell me where I can specify the configuration to set up the parameters for swapping the searchers? Is this within solrconfig.xml? Any light you could shed on this would be really appreciated.

Thanks again,
Tom

PS. If you wrote a SOLR in Action - I would buy it today!
Solrj exception posting XML docs
Hello all,

I am currently using Solr 1.3 and its Solrj. I am trying to post XML docs directly through Solrj but I get the following exception:

13:12:09,119 ERROR [STDERR] Mar 27, 2009 1:12:09 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
    at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:194)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:235)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:190)
    at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:92)
    at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.process(SecurityContextEstablishmentValve.java:126)
    at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:70)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:330)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:829)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:601)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:595)
13:12:09,120 ERROR [STDERR] Mar 27, 2009 1:12:09 PM org.apache.solr.core.SolrCore execute
INFO: [downloadable] webapp=/solr path=/update params={wt=javabin&version=2.2} status=500 QTime=2
13:12:09,121 ERROR [STDERR] Mar 27, 2009 1:12:09 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
    at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:194)
    (same stack trace as above, repeated)
Re: Search transparently with Solr with multiple cores, different indexes, common response type
Hello Hoss, Steve,

Thank you very much for your feedback; it has been very helpful and makes me feel more confident now about this architecture. In fact I decided to go for a single shared schema, but keeping multiple indexes (multicore), because those two indexes are very different: one is huge and updated not very often (once a day delta, once a week full), and the other one is not that big and is updated frequently (once per hour, once per day, once per week full). My boss is happy... thus I am happy too :-)

Now I am struggling a bit with Solrj... but that is already in another post of mine :-)

Cheers,
Giovanni

On 3/26/09, Stephen Weiss swe...@stylesight.com wrote:

  I have a very similar setup and that's precisely what we do, except with JSON.

  1) Request comes into PHP
  2) PHP runs the search against several different cores (in a multicore setup) - ours are a little more than slightly different
  3) PHP constructs a new object with the responseHeader and response objects joined together (basically add the record counts together in the header and then concatenate the arrays of documents)
  4) PHP encodes the combined data into JSON and returns it

  It sounds clunky but it all manages to happen very quickly (< 200 ms round trip). The only problem you might hit is with paging, but from the way you describe your situation it doesn't sound like that will be a problem. It's more of an issue if you're trying to make them seamlessly flow into each other, but it sounds like you plan on presenting them separately (as we do).

  --
  Steve

  it could be a custom request handler, but it doesn't have to be -- you could implement it in whatever way is easiest for you (there's no reason why it has to run in the same JVM or on the same physical machine as Solr ... it could be a PHP script on another server if you want)

  -Hoss
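The merge in steps 3-4 above is language-agnostic; a minimal Java sketch of the same idea (the map keys `numFound`/`docs` and the helper `merge` are illustrative stand-ins for the decoded responseHeader/response structures, not a real Solr API):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ResponseMerger {
    // Join two decoded per-core responses: add the hit counts together
    // and concatenate the document lists.
    static Map<String, Object> merge(Map<String, Object> a, Map<String, Object> b) {
        Map<String, Object> merged = new LinkedHashMap<String, Object>();
        merged.put("numFound",
                ((Number) a.get("numFound")).longValue()
                + ((Number) b.get("numFound")).longValue());
        List<Object> docs = new ArrayList<Object>((List<?>) a.get("docs"));
        docs.addAll((List<?>) b.get("docs"));
        merged.put("docs", docs);
        return merged;
    }
}
```

As Steve notes, this simple concatenation breaks down once you need unified paging across cores, because offsets would have to be split between the per-core queries.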
Re: Incorrect sort with with function query in query parameters
Hi Otis,

Any documents marked deleted in this index are just the result of updates to those documents. There are no purely deleted documents. Furthermore, the field that I am ordering by in my function query remains untouched over the updates.

I've read in other posts that the logic used by the debug component to calculate the score is different from what the query component uses. The score shown in the debug output is correct. It seems like the two components are getting two different values for the rord function. I'm particularly concerned by the fact that this only happens in the nightly build.

Any ideas on how to correct this? Unfortunately, it's not feasible for me to only perform searches on optimized indices because we are doing constant updates.

Thanks,
Asif

Otis Gospodnetic wrote:

  Asif,

  Could it have something to do with the deleted documents in your unoptimized index? Those documents are only marked as deleted; when you run optimize you really remove them completely. It could be that they are getting counted by something and that messes up the scoring/order.
Re: Solrj exception posting XML docs
Hello all,

The null pointer exception was caused by wrong XML... Basically my doc was something like this:

  <doc>
    ...
  </doc>

but it had to be wrapped with an <add> as follows:

  <add>
    <doc>
      ...
    </doc>
  </add>

A more useful message would have been nice to have, because I had to look at the source code to understand that the command was missing... Anyway, I posted my own resolution for future reference :-)

Cheers,
Giovanni

On 3/27/09, Giovanni De Stefano giovanni.destef...@gmail.com wrote:

  Hello all,

  I am currently using Solr 1.3 and its Solrj. I am trying to post XML docs directly through Solrj but I get the following exception:

  13:12:09,119 ERROR [STDERR] Mar 27, 2009 1:12:09 PM org.apache.solr.common.SolrException log
  SEVERE: java.lang.NullPointerException
      at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:194)
      at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
  [...]
Clarifying use of lst name=appends within a requestHandler
Hello,

Due to limitations with the way my content is organised and DIH, I have to add "-imgCaption:[* TO *]" to some of my queries. I discovered the name="appends" functionality tucked away inside solrconfig.xml. This looks a very useful feature, and I created a new requestHandler to deal with my problem queries.

I tried adding the following to my alternate requestHandler:

  <lst name="appends"><str name="q">-imgCaption:[* TO *]</str></lst>

which did not work; however

  <lst name="appends"><str name="fq">-imgCaption:[* TO *]</str></lst>

worked fine and is also more efficient. I guess I was caught by the "identify values which should be appended to the list of multi-val params from the query" portion of the comment within solrconfig.xml.

I am now wondering: how do I know which query params are multi-val or not? Is this documented anywhere?

Regards,
Fergus.
Faceting question
I am using the faceting feature and it works - I get back the facet counts - but I need to know which facet.method (enum or fc) is used. Is there a way to turn on the debug info for faceting?

Here's my setup:

- Solr 1.3
- EmbeddedSolrServer
- SolrJ
- Facet fields are indexed as multivalued solr.StrField

Thanks,
Rayan
Solr date parsing issue
Hello,

I am having a problem indexing a date field. In my schema the date field is defined the standard way:

  <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>

I know the Solr format is 1995-12-31T23:59:59Z, but the dates coming from my sources are in the format 2009-04-10T02:02:55+0200. How can I make the conversion? Do I have to extend DateField, or is there any cleaner way to do it?

Thanks in advance!

Giovanni
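One option that avoids extending DateField is to convert the timestamp on the client side before sending it to Solr. A hedged sketch using java.time (java.time is Java 8+; in the Solr 1.3 era, SimpleDateFormat with the pattern yyyy-MM-dd'T'HH:mm:ssZ and a UTC-configured output format would play the same role):

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class SolrDates {
    // Source timestamps look like 2009-04-10T02:02:55+0200; Solr's DateField
    // wants UTC with a trailing 'Z', e.g. 2009-04-10T00:02:55Z.
    static String toSolrDate(String src) {
        OffsetDateTime odt = OffsetDateTime.parse(
                src, DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssZ"));
        return odt.toInstant().toString(); // Instant prints as UTC with 'Z'
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate("2009-04-10T02:02:55+0200")); // 2009-04-10T00:02:55Z
    }
}
```

The same conversion could also live inside the indexing pipeline (e.g. a DIH transformer) rather than in client code; where it goes is a deployment choice.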
Encoding problem
I'm having problems with encoding in responses from search queries. The encoding problem only occurs in the topologyname field; if an instancename has accents, it is returned correctly. In all my configurations I have UTF-8.

  <?xml version="1.0" encoding="UTF-8"?>
  <dataConfig>
    <document name="topologies">
      <entity query="SELECT DISTINCT '3141-' || Sub0.SUBID as id, 'Inventário' as topologyname, 3141 as topologyid, Sub0.SUBID as instancekey, Sub0.NAME as instancename FROM ...">
        <field column="INSTANCEKEY" name="instancekey"/>
        <field column="ID" name="id"/>
        <field column="TOPOLOGYID" name="topologyid"/>
        <field column="INSTANCENAME" name="instancename"/>
        <field column="TOPOLOGYNAME" name="topologyname"/>
        ...

As an example, I can have in the response the following result:

  <doc>
    <long name="instancekey">285</long>
    <str name="instancename">Informática</str>
    <long name="topologyid">3141</long>
    <str name="topologyname">Inventário</str>
  </doc>

Thanks in advance,
Rui Pereira
Re: Faceting question
It would be the enum method... Solr 1.3 doesn't have the fc method for multi-valued fields; that's a 1.4 feature.

-Yonik
http://www.lucidimagination.com

On Fri, Mar 27, 2009 at 10:44 AM, rayandev rayanm...@gmail.com wrote:

  I am using the faceting feature and it works, I get back the facet counts, but I need to know which facet.method (enum or fc) is used. Is there a way to turn on the debug info for faceting?

  [...]
Solr Search Error
Hi All,

I am intermittently getting this exception when I do the search. What could be the reason?

Caused by: org.apache.solr.common.SolrException: 11938
java.lang.ArrayIndexOutOfBoundsException: 11938
    at org.apache.lucene.search.TermScorer.score(TermScorer.java:74)
    at org.apache.lucene.search.TermScorer.score(TermScorer.java:61)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:137)
    at org.apache.lucene.search.Searcher.search(Searcher.java:126)
    at org.apache.lucene.search.Searcher.search(Searcher.java:105)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:966)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
    at java.lang.Thread.run(Thread.java:619)

Thanks.

Karthik
Best way to unit test solr integration
Hello, On our project we have quite a bit of code that generates Solr queries, and I need to create some unit tests to ensure these continue to work. In addition, I need to write tests that exercise indexing and retrieval of certain documents, based on our current schema and the application logic that builds the indexable documents and generates the Solr queries. My question is: what's the best way to unit test our Solr integration? I'd like to be able to spin up an embedded/in-memory Solr, or failing that, start one up as part of my test case setup, fill it with interesting documents, and run some queries, comparing the results to expected results. Are there wiki pages or other documented examples of doing this? It seems rather straightforward, but who knows, it may be dead simple with some unknown feature. Thanks! -Joe
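For reference, a minimal sketch of the embedded approach described above, using the Solr 1.3-era SolrJ EmbeddedSolrServer. The CoreContainer initialization idiom, solr home path, field names, and class name here are assumptions to illustrate the shape of such a test; check them against your Solr version and schema.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

// Sketch of an embedded-Solr test, assuming src/test/resources/solr
// contains solr.xml plus conf/schema.xml and conf/solrconfig.xml.
public class EmbeddedSolrTest {
    public static void main(String[] args) throws Exception {
        System.setProperty("solr.solr.home", "src/test/resources/solr");
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer container = initializer.initialize();
        EmbeddedSolrServer server = new EmbeddedSolrServer(container, "");

        // Index an "interesting document" for the test.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("name", "interesting document");
        server.add(doc);
        server.commit();

        // Query and compare against the expected result count.
        QueryResponse rsp = server.query(new SolrQuery("name:interesting"));
        if (rsp.getResults().getNumFound() != 1) {
            throw new AssertionError("expected exactly one match");
        }
        container.shutdown();
    }
}
```

The same pattern drops into a JUnit setUp/tearDown pair; Solr's own test code (AbstractSolrTestCase in the source tree) does essentially this.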
Re: Encoding problem
Hi, I had the same problem with DataImportHandler: I have a UTF-8 MySQL database, but it seems that DIH imported the data as Latin-1... So I just used a Transformer to re-encode my strings as UTF-8. Rui Pereira-2 wrote: I'm having problems with encoding in responses from search queries. The encoding problem only occurs in the topologyname field; if an instancename has accents it is returned correctly. In all my configurations I have UTF-8. ?xml version=1.0 encoding=UTF-8? dataConfig document name=topologies entity query=SELECT DISTINCT '3141-' || Sub0.SUBID as id, 'Inventário' as topologyname, 3141 as topologyid, Sub0.SUBID as instancekey, Sub0.NAME as instancename FROM ... field column=INSTANCEKEY name=instancekey/ field column=ID name=id/ field column=TOPOLOGYID name=topologyid/ field column=INSTANCENAME name=instancename/ field column=TOPOLOGYNAME name=topologyname/... As an example, I can have in the response the following result: doc long name=instancekey285/long str name=instancenameInformática/str long name=topologyid3141/long str name=topologynameInventário/str /doc Thanks in advance, Rui Pereira -- View this message in context: http://www.nabble.com/Encoding-problem-tp22743698p22745133.html Sent from the Solr - User mailing list archive at Nabble.com.
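The re-encoding trick mentioned above works because the damage is reversible: bytes that were really UTF-8 were decoded once with Latin-1, so re-reading the garbled string's Latin-1 bytes as UTF-8 recovers the original text. A small self-contained sketch of that repair (the `EncodingFix` class name is hypothetical; in a real DIH setup this one line would live inside a custom Transformer, and it uses the modern `StandardCharsets` API rather than the charset-name strings of the 2009-era JDK):

```java
import java.nio.charset.StandardCharsets;

// Demonstrates repairing UTF-8 text that was mistakenly decoded as Latin-1.
public class EncodingFix {
    public static String repair(String garbled) {
        // Re-read the garbled string's Latin-1 bytes as UTF-8.
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Invent\u00e1rio"; // "Inventário"
        // Simulate the bug: UTF-8 bytes decoded with Latin-1.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

Note this only helps when the mis-decoding happened exactly once and every original byte survived the round trip; fixing the JDBC/JVM encoding (as discussed later in the thread) is the cleaner cure.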
Re: Best way to unit test solr integration
So my first thought is that unit test + Solr integration is an oxymoron, in the sense that a unit test implies the smallest functional unit, while Solr integration implies multiple units working together. It sounds like you have two different tasks. The code that generates queries you can test without Solr. If you need to parse some sort of Solr document to generate a query based on it, then mock up the query. A lot of folks will just use Solr to build a result set, save it on the filesystem (my_big_result1.xml), and read it in and feed it to your code. For your code testing indexing and retrieval, again, you can use the same approach to decouple what Solr does from your code. Unless you've patched Solr, you shouldn't need to unit test Solr; Solr has very nice unit testing built in. On the other hand, if you are doing integration testing, where you want a more end-to-end view of your application, then you probably already have a test Solr setup in your environment somewhere that you can rely on. Spinning up and shutting down Solr for tests can be done, and I can think of use cases for why you might want to do it, but it does incur the penalty of being more work. And you still need to validate that your embedded/unit-test Solr works the same as your integration/test environment Solr. Eric
- Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
RE: Best way to unit test solr integration
Thanks for the tips, I like the suggestion of testing the document and query generation without having Solr involved. That seems like a more bite-sized unit; I think I'll do that. However, here's the test case I'm considering where I'd like to have a live Solr instance: during an exercise of optimizing our schema, I'm going to be making wholesale changes that I'd like to ensure don't break some portion of our app. It seems like a good method for this would be to write a test with the following steps (arguably not a unit test, but a very valuable test indeed in our application):
* take some defined model object generated at test time, store it in the db
* run it through our document creation code
* submit it into Solr
* generate a query using our custom criteria-based generation code
* ensure that the query returns the results as expected
* flesh out new model objects from the db using only the id fields returned from Solr
* in the end, the model objects retrieved from the db should match the model objects at the beginning of the test.
These building blocks could be stacked in numerous ways to test almost all the different scenarios in which we use Solr. Also, when/if we start making Solr config changes, I can ensure that they change nothing from my app's functional point of view (with the exception of ridding us of dreaded OOMs). Thanks, -Joe
Re: Solr Search Error
Hi Karthik, First thing I'd do is get the latest Solr nightly build. If that doesn't fix things, I'd grab the latest Lucene nightly build and use it to replace the Lucene jars that are in your version of Solr. If that doesn't work, I'd email the ML with a bit more info about the type of search that causes this (e.g. do all searches cause this or only some? What do those that trigger this error look like or have in common?) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: Best way to unit test solr integration
Joe, Have a look at Solr's own unit tests; I believe they have the pieces you need - the ability to start a Solr instance, index docs, run a query, and test whether the results contain what you expect to see in them. You can get to Solr's unit tests by checking out Solr from svn, or by browsing the svn repository via the Web. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
RE: Solr Search Error
Hi Otis, Thanks for the recommendation; I will try with the latest nightly build. I did a couple of full data imports and got this error a few times while searching. Thanks. Karthik
Re: Best way to unit test solr integration
So in the building-block story you talked about, that sounds like an integration (functional? user acceptance?) test. And I would treat Solr the same way you treat the database you store model objects in. If your tests bring up a fresh version of the db, populate it with tables, and put in sample data, then you should do the same with Solr. My guess is that you have a test database running, and therefore you need a live, supported test Solr. And the same processes you use so that two functional tests don't step on each other's data in the database can be applied to Solr! You can think of tweaking Solr config changes as similar to tweaking indexes in your db: both require configuration management to track those changes, ensure they are deployed, and don't regress anything. Let us know how you get on! Eric
Re: Faceting question
Thanks Yonik. If it is using the enum method then it should also be caching a facet query for every indexed value of the facet fields. 1) Do I need to add filterCache and hashDocSet entries to solrconfig.xml for this caching to happen? I did not find any noticeable difference in query time whether I added them or not. 2) Will the performance be the same with more facet values and a bigger index? The current implementation has 10 different facet values and 300,000 documents indexed. I will be adding more multi-valued facet fields; the combined facet values could be up to 60 and the index could grow to 1 million documents. Thanks Rayan
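For question 1, the enum method does use the filterCache (one cached DocSet per indexed term of the facet field), so the relevant solrconfig.xml entries look like the fragment below. The sizes shown are illustrative assumptions, not recommendations; the rule of thumb is to size the filterCache to at least the number of distinct facet terms plus your common fq filters.

```xml
<!-- Illustrative sizes: filterCache should hold at least
     (distinct terms across facet fields) + (common fq filters). -->
<filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>

<!-- Sets smaller than maxSize are stored as hash sets rather than bitsets,
     which keeps per-term facet entries cheap for rare values. -->
<hashDocSet maxSize="3000" loadFactor="0.75"/>
```

With only 10 facet values and 300k documents the defaults are unlikely to be the bottleneck, which would explain seeing no difference; the cache sizing starts to matter as the distinct-term count grows toward the 60-value, 1M-document scenario.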
Re: optimization advice?
Steve, Maybe you can tell us about: sure
- your hardware: 2.5GB RAM, pretty modern virtual servers
- query rate: let's say a few queries per second max... 4. And in general the challenge is to get latency on any given query down to something very low - we don't have to worry about a huge amount of load at the moment.
- document cache and query cache settings:
  queryResultCache class=solr.LRUCache size=512 initialSize=512 autowarmCount=256/
  documentCache class=solr.LRUCache size=512 initialSize=512 autowarmCount=0/
- your current response times: this depends on the query. For queries that involve a total record count of 1 million, we often see 10ms response times, up to 4-500ms in the worst case. When we do a page-one, sorted query on our full record set of 2 million+ records, response times can get up into 2+ seconds.
- any pain points, any slow query patterns: something that can't be emphasized enough is that we can't predict what records people will want. Almost every query is aimed at a different set of records.
-Steve
Question about Solr memory usage.
I'm running an old version of Solr -- 1.2 -- and I'm about to upgrade to 1.3. But I have a question about Solr 1.2 memory usage: I occasionally see out-of-memory errors in my Solr log. Doesn't Solr release memory after a document has been indexed? I would not expect memory usage to climb to the max specified in the Java options and then throw out-of-memory errors... Any thoughts you have are appreciated. Thanks.
use extrernal index for spellcheck component
Hey there, I have a question about the spellcheck component... If I tell the spellcheck component to load the dictionary from a field of my main Solr index there's no problem, but does someone know how to tell the spellcheck component to load the dictionary from a field of an external index? What I do is: searchComponent name=spellcheck class=solr.SpellCheckComponent str name=queryAnalyzerFieldTypetext/str lst name=spellchecker str name=namedefault/str str name=fieldword_spell/str str name=spellcheckIndexDir./spellchecker1/str /lst /searchComponent word_spell is the field which contains the dictionary in my secondary Solr index, which I have placed in /spellchecker1. I don't know if it has something to do with the field called name. I am missing something but don't know what... Thanks in advance. -- View this message in context: http://www.nabble.com/use-extrernal-index-for-spellcheck-component-tp22745638p22745638.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: use extrernal index for spellcheck component
On Sat, Mar 28, 2009 at 12:16 AM, Marc Sturlese marc.sturl...@gmail.com wrote: Hey there, I have a question about the spellcheck component... If I tell the spellcheck component to load the dictionary from a field of my main Solr index there's no problem, but does someone know how to tell the spellcheck component to load the dictionary from a field of an external index? You need to specify sourceLocation, which is the location of the external index. What I do is: searchComponent name=spellcheck class=solr.SpellCheckComponent str name=queryAnalyzerFieldTypetext/str lst name=spellchecker str name=namedefault/str str name=fieldword_spell/str str name=spellcheckIndexDir./spellchecker1/str /lst /searchComponent word_spell is the field which contains the dictionary in my secondary Solr index, which I have placed in /spellchecker1. The spellcheckIndexDir is the location where the spellcheck index itself will be created. I guess the wiki documentation is lacking sourceLocation completely. I'll add more documentation. http://wiki.apache.org/solr/SpellCheckComponent -- Regards, Shalin Shekhar Mangar.
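Putting Shalin's answer into Marc's config, the spellchecker block would look roughly like the sketch below: sourceLocation points at the external index the dictionary is read from, while spellcheckIndexDir is where the spellcheck index gets built. The exact paths here are assumptions based on Marc's description.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">word_spell</str>
    <!-- the external Lucene/Solr index to read the dictionary field from
         (path assumed: the data/index dir of the secondary core) -->
    <str name="sourceLocation">./spellchecker1/data/index</str>
    <!-- where the spellcheck component builds its own index -->
    <str name="spellcheckIndexDir">./spellchecker1_dict</str>
  </lst>
</searchComponent>
```

Note the two directories must differ; in Marc's original config, spellcheckIndexDir pointed at the external index itself, which is the likely cause of the problem.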
Re: Solr date parsing issue
On Fri, Mar 27, 2009 at 8:17 PM, Giovanni De Stefano giovanni.destef...@gmail.com wrote: Hello, I am having a problem indexing a date field. In my schema the date field is defined the standard way: fieldType name=date class=solr.DateField sortMissingLast=true omitNorms=true/ I know the Solr format is 1995-12-31T23:59:59Z, but the dates coming from my sources are in the format 2009-04-10T02:02:55+0200 How can I make the conversion? If you are using Solrj then parse it into a Date object and add it. Solrj will take care of writing it out in the correct format. If you are using DataImportHandler then use the DateFormatTransformer. -- Regards, Shalin Shekhar Mangar.
Re: Clarifying use of lst name=appends within a requestHandler
On Fri, Mar 27, 2009 at 8:00 PM, fergus mcmenemie fer...@twig.me.uk wrote: Hello, Due to limitations with the way my content is organised and DIH, I have to add “-imgCaption:[* TO *]” to some of my queries. I discovered the name=”appends” functionality tucked away inside solrconfig.xml. This looks a very useful feature, and I created a new requestHandler to deal with my problem queries. I tried adding the following to my alternate requestHandler:- lst name=appendsstr name=q-imgCaption:[* TO *]/str/lst which did not work; however lst name=appendsstr name=fq-imgCaption:[* TO *]/str/lst appends parameters are appended to the request parameters. An existing q parameter might be overriding it. Also, I'm not sure if pure negative queries are supported in the q parameter. You might need to do *:* AND -imgCaption:[* TO *] instead. I do remember that negative queries in fq work. worked fine and is also more efficient. I guess I was caught by the “identify values which should be appended to the list of ***multi-val params from the query” portion of the comment within solrconfig.xml. I am now wondering how do I know which query params are multi-val or not? Is this documented anywhere? For example, fq, facet.field etc. are multi-valued since multiple such params can be specified in the same request. q is single-valued. You can look through the Input Parameters section on the wiki front page for more details. -- Regards, Shalin Shekhar Mangar.
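The working configuration from this exchange, as a complete requestHandler fragment (the handler name "/nocaption" is hypothetical): because fq is multi-valued, an appended fq adds a filter alongside whatever the client sends, whereas appending to the single-valued q would collide with the client's query.

```xml
<requestHandler name="/nocaption" class="solr.SearchHandler">
  <lst name="appends">
    <!-- fq is multi-valued: this filter is added to every request
         without clobbering the client's own fq parameters -->
    <str name="fq">-imgCaption:[* TO *]</str>
  </lst>
</requestHandler>
```

As a bonus, filters applied via fq are cached in the filterCache, which is why this variant is also more efficient than folding the clause into q.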
Re: Solr date parsing issue
Hello, the problem is that I use both Solrj and DIH but I would like to perform such a change only in 1 place. Is there any way to do it? Otherwise I will stick with the other approach... Cheers, Giovanni On 3/27/09, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Fri, Mar 27, 2009 at 8:17 PM, Giovanni De Stefano giovanni.destef...@gmail.com wrote: Hello, I am having a problem indexing a date field. In my schema the date field is defined the standard way: fieldType name=date class=solr.DateField sortMissingLast=true omitNorms=true/ I know the Solr format is 1995-12-31T23:59:59Z, but the dates coming from my sources are in the format 2009-04-10T02:02:55+0200 How can I make the conversion? If you are using Solrj then parse it into a Date object and add it. Solrj will take care of writing it out in the correct format. If you are using DataImportHandler then use the DateFormatTransformer. -- Regards, Shalin Shekhar Mangar.
Re: Encoding problem
On Fri, Mar 27, 2009 at 8:41 PM, Rui Pereira ruipereira...@gmail.com wrote: I'm having problems with encoding in responses from search queries. The encoding problem only occurs in the topologyname field; if an instancename has accents it is returned correctly. In all my configurations I have UTF-8. ?xml version=1.0 encoding=UTF-8? dataConfig document name=topologies entity query=SELECT DISTINCT '3141-' || Sub0.SUBID as id, 'Inventário' as topologyname, 3141 as topologyid, Sub0.SUBID as instancekey, Sub0.NAME as instancename FROM ... field column=INSTANCEKEY name=instancekey/ field column=ID name=id/ field column=TOPOLOGYID name=topologyid/ field column=INSTANCENAME name=instancename/ field column=TOPOLOGYNAME name=topologyname/... As an example, I can have in the response the following result: doc long name=instancekey285/long str name=instancenameInformática/str long name=topologyid3141/long str name=topologynameInventário/str /doc I see that you are specifying the topologyname's value in the query itself. It might be a bug in DataImportHandler because it reads the data-config as a string from an InputStream. If your default platform encoding is not UTF-8, this may be the cause. Can you try running Solr's (or your servlet container's) java process with -Dfile.encoding=UTF-8 and see if that fixes the problem? -- Regards, Shalin Shekhar Mangar.
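The suspected bug comes down to decoding the data-config bytes with the platform default charset instead of UTF-8. A minimal sketch of the reading-side fix — the helper class below is hypothetical and just stands in for wherever the stream is turned into a String:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class ReadUtf8Config {
    // Decodes an InputStream with an explicit charset; relying on
    // new InputStreamReader(in) would use the platform default and
    // mangle "Inventário" on a non-UTF-8 JVM.
    public static String readAll(InputStream in, Charset cs) throws Exception {
        Reader r = new InputStreamReader(in, cs);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) sb.append((char) c);
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] utf8 = "Inventário".getBytes("UTF-8");
        System.out.println(readAll(new ByteArrayInputStream(utf8), Charset.forName("UTF-8")));
    }
}
```

The -Dfile.encoding=UTF-8 workaround Shalin suggests changes the platform default so the unspecified-charset path happens to do the right thing; passing the charset explicitly is the durable fix.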
Re: Solr date parsing issue
On Sat, Mar 28, 2009 at 12:46 AM, Giovanni De Stefano giovanni.destef...@gmail.com wrote: Hello, the problem is that I use both Solrj and DIH but I would like to perform such a change only in 1 place. Is there any way to do it? Otherwise I will stick with the other approach... Which of them are you using for adding documents? Both? -- Regards, Shalin Shekhar Mangar.
Re: Question about Solr memory usage.
On Sat, Mar 28, 2009 at 12:13 AM, Jim Adams jasolru...@gmail.com wrote: I'm running an old version of Solr -- it's 1.2, and I'm about to upgrade to 1.3. But I have a question about Solr 1.2 memory usage. I am occasionally seeing out of memory errors in my Solr log. Doesn't Solr release memory after a document has been indexed ? I would not think it is right for the memory usage to climb to its max specified in java options then give out of memory errors... It does. But then there's the caches and auto-warming (after commits). A lot of that stuff can be tweaked though. There are a lot of old mail threads on memory usage and optimization which you may find useful. Use a mailing list search engine like lucidimagination.com, markmail or nabble. -- Regards, Shalin Shekhar Mangar.
Re: Solr date parsing issue
Hello, yes, I use both: I have a multicore architecture, multiple indexes but I have been able to manage a common schema. Giovanni On 3/27/09, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Sat, Mar 28, 2009 at 12:46 AM, Giovanni De Stefano giovanni.destef...@gmail.com wrote: Hello, the problem is that I use both Solrj and DIH but I would like to perform such a change only in 1 place. Is there any way to do it? Otherwise I will stick with the other approach... Which of them are you using for adding documents? Both? -- Regards, Shalin Shekhar Mangar.
How to optimize Index Process?
Hi, We have a distributed Solr system (2-3 boxes with each running 2 instances of Solr and each Solr instance can write to multiple cores). Our use case is high index volume - we can get up to 100 million records (1 record = 500 bytes) per day, but very low query traffic (only administrators may need to search for data - once an hour or so). So, we need very fast index time. Here are the things I'm trying to find out in order to optimize our index process, 1) What's the optimum index size? I've noticed as the index size grows the indexing time starts increasing. In our tests, at less than 10G index size we could index over 2K/sec, but as it grows over 20G the index rate drops to 1400/sec and keeps dropping as index size grows. I'm trying to see whether we can partition (create a new SolrCore) after 10G. - related question, is there a way to find the SolrCore size (any web service for that?) - based on that information I can create a new core and freeze the one which has reached 10G. 2) In our test, we noticed that after a few hours (after 8 hours of indexing) there is a period (3-4 hours) where the indexing is very, very slow (like 500 records/sec) and after that period indexing returns back to the normal rate (1500/sec). Does Solr run any optimize command on its own? How can we find that out? I'm not issuing any optimize command - should I be doing that after a certain time? 3) Every time I add new documents (10K at once) to the index I see the searcher closing and then re-opening/re-warming (in Catalina.out) after the commit is done. I'm not sure if this is an expensive operation. Since our search volume is very low, can I configure Solr to not do this? Would it make indexing any faster?
Mar 26, 2009 11:59:45 PM org.apache.solr.search.SolrIndexSearcher close INFO: Closing searc...@33d9337c main Mar 26, 2009 11:59:52 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true) Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher init INFO: Opening searc...@46ba6905 main Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@46ba6905 main from searc...@5c5ffecd main 4) Anything else (any other configuration in Solr - I'm currently using all default settings in the solrconfig.xml and default handlers) that could help optimize my indexing process? Thanks, -vivek
RE: large index vs multicore
Thanks for the reply. Yes, in most of the use cases the data would be from both the indices. It's like a parent-child relation. The use case requires the data from the child be displayed along with the parent product information. Thanks, Kalyan Manepalli -Original Message- From: Ryan McKinley [mailto:ryan...@gmail.com] Sent: Wednesday, March 25, 2009 8:54 PM To: solr-user@lucene.apache.org Subject: Re: large index vs multicore My question is - From a design and query speed point of view - should I add a new core to handle the additional data or should I add the data to the existing core. Do you ever need to get results from both sets of data in the same query? If so, putting them in the same index will be faster. If every query is always limited to results within one set or the other -- and the doc count is not huge, then the choice of single core vs multi core is more about what you are more comfortable managing than it is about query speeds. Advantages of multicore: - the distinct data is in different indexes, you can maintain them independently (perhaps one data set never changes and the other changes often) Advantages of single core (with multiple data sets): - everything is in one place - replicate / load balance a single index rather than multiple. ryan
Re: Encoding problem
On Sat, Mar 28, 2009 at 12:51 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: I see that you are specifying the topologyname's value in the query itself. It might be a bug in DataImportHandler because it reads the data-config as a string from an InputStream. If your default platform encoding is not UTF-8, this may be the cause. I've opened SOLR-1090 to fix this issue. https://issues.apache.org/jira/browse/SOLR-1090 -- Regards, Shalin Shekhar Mangar.
Apachecon 2009 Europe
Hi all, just came back with a head full of impressions from ApacheCon Europe. Thanks a lot for the great speeches and the inspiring personal talks. I strongly believe that Solr will have a great future. Olivier -- Olivier Dobberkau d.k.d Internet Service GmbH fon: +49 (0)69 - 43 05 61-70 fax: +49 (0)69 - 43 05 61-90 mail: olivier.dobber...@dkd.de home: http://www.dkd.de
using multisearcher
Hi everybody, I'm interested in using Solr to search multiple indexes at once. We currently use our own search application which uses lucene's multisearcher. Has anyone attempted to or successfully replaced SolrIndexSearcher with some kind of multisearcher? I have looked at the DistributedSearch on the wiki and I'm pretty sure this isn't what we want. Also does anyone have any comments about trying to replace the SolrIndexSearcher with a SolrMultiSearcher; reasons why we shouldn't do this, pitfalls; suggestions about how to go about it etc. Also, it should be noted that we would only be adding documents to one of the indexes. I can give more info about the context of this application if necessary. Thank you for any suggestions! -- Brent Palmer Widernet.org University of Iowa 319-335-2200
OOM at MultiSegmentReader.norms
Hi, I've an index of size 50G (around 100 million documents) and growing - around 2000 records (1 rec = 500 bytes) are being written every second continuously. If I make any search on this index I get OOM. I'm using default cache settings (512,512,256) in the solrconfig.xml. The search is using the admin interface (returning 10 rows) with no sorting, faceting or highlighting. Max heap size is 1024m. Mar 27, 2009 9:13:41 PM org.apache.solr.common.SolrException log SEVERE: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.index.MultiSegmentReader.norms(MultiSegmentReader.java:335) at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:69) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132) at org.apache.lucene.search.Searcher.search(Searcher.java:126) at org.apache.lucene.search.Searcher.search(Searcher.java:105) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:966) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) What could be the problem? Thanks, -vivek
Re: optimization advice?
OK, we are a step closer. Sorting makes things slower. What field(s) do you sort on, what are their types, and if there is a date in there, are the dates very granular, and if they are, do you really need them to be that precise? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Steve Conover scono...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, March 27, 2009 1:51:14 PM Subject: Re: optimization advice? Steve, Maybe you can tell us about: sure - your hardware 2.5GB RAM, pretty modern virtual servers - query rate Let's say a few queries per second max... 4 And in general the challenge is to get latency on any given query down to something very low - we don't have to worry about a huge amount of load at the moment. - document cache and query cache settings class=solr.LRUCache size=512 initialSize=512 autowarmCount=256/ class=solr.LRUCache size=512 initialSize=512 autowarmCount=0/ - your current response times This depends on the query. For queries that involve a total record count of 1 million, we often see 10ms response times, up to 4-500ms in the worst case. When we do a page one, sorted query on our full record set of 2 million+ records, response times can get up into 2+ seconds. - any pain points, any slow query patterns Something that can't be emphasized enough is that we can't predict what records people will want. Almost every query is aimed at a different set of records. -Steve
Re: How to optimize Index Process?
Hi, Answers inlined. -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message We have a distributed Solr system (2-3 boxes with each running 2 instances of Solr and each Solr instance can write to multiple cores). Is this really optimal? How many CPU cores do your boxes have vs. the number of Solr cores? Our use case is high index volume - we can get up to 100 million records (1 record = 500 bytes) per day, but very low query traffic (only administrators may need to search for data - once an hour or so). So, we need very fast index time. Here are the things I'm trying to find out in order to optimize our index process, It's starting to sound like you might be able to batch your data and use http://wiki.apache.org/solr/UpdateCSV -- it's the fastest indexing method, I believe. 1) What's the optimum index size? I've noticed as the index size grows the indexing time starts increasing. In our test less than 10G index size we could index over 2K/sec, but as it grows over 20G the index rate drops to 1400/sec and keeps dropping as index size grows. I'm trying to see whether we can partition (create new SolrCore) after 10G. That's likely due to Lucene's segment merging. You can make mergeFactor bigger to make segment merging less frequent, but don't make it too high or you'll run into open file descriptor limits (which you could raise, of course). - related question, is there a way to find the SolrCore size (any web service for that?) - based on that information I can create a new core and freeze the one which has reached 10G. You can see the number of docs in an index via the Admin Statistics page (the response is actually XML, look at the source) 2) In our test, we noticed that after a few hours (after 8 hours of indexing) there is a period (3-4 hours) where the indexing is very, very slow (like 500 records/sec) and after that period indexing returns back to the normal rate (1500/sec). Does Solr run any optimize command on its own?
How can we find that out? I'm not issuing any optimize command - should I be doing that after a certain time? No, it doesn't run optimize on its own. It could be running auto-commit, but you should comment that out anyway. Try doing a thread dump to see what's going on and watching the system with top, vmstat. No, you shouldn't optimize until you are completely done. 3) Every time I add new documents (10K at once) to the index I see the searcher closing and then re-opening/re-warming (in Catalina.out) after the commit is done. I'm not sure if this is an expensive operation. Since our search volume is very low, can I configure Solr to not do this? Would it make indexing any faster? Are you running the commit command after every 10K docs? No need to do that if you don't need your searcher to see the changes immediately. Mar 26, 2009 11:59:45 PM org.apache.solr.search.SolrIndexSearcher close INFO: Closing searc...@33d9337c main Mar 26, 2009 11:59:52 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true) Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher INFO: Opening searc...@46ba6905 main Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@46ba6905 main from searc...@5c5ffecd main 4) Anything else (any other configuration in Solr - I'm currently using all default settings in the solrconfig.xml and default handlers) that could help optimize my indexing process? Increase ramBufferSizeMB as much as you can afford. Comment out maxBufferedDocs, it's deprecated. Increase mergeFactor slightly. Consider the CSV approach. Index with multiple threads (match the number of CPU cores). If you are using Solrj, use the Streaming version of SolrServer. Give the JVM more memory (you'll need it if you increase ramBufferSizeMB) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
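Otis's indexing suggestions correspond to settings in solrconfig.xml; a hedged sketch with illustrative values (tune to your heap size and open-file-descriptor limits):

```xml
<indexDefaults>
  <!-- bigger RAM buffer = fewer flushes; needs a matching JVM heap -->
  <ramBufferSizeMB>256</ramBufferSizeMB>
  <!-- higher mergeFactor = less frequent merges, but more open files -->
  <mergeFactor>20</mergeFactor>
  <!-- maxBufferedDocs is deprecated; leave it commented out -->
</indexDefaults>
```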
Re: OOM at MultiSegmentReader.norms
That's a tiny heap. Part of it is used for indexing, too. And the fact that your heap is so small shows you are not really making use of that nice ramBufferSizeMB setting. :) Also, use omitNorms=true for fields that don't need norms (if their types don't already do that). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: vivek sar vivex...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, March 27, 2009 6:15:59 PM Subject: OOM at MultiSegmentReader.norms Hi, I've index of size 50G (around 100 million documents) and growing - around 2000 records (1 rec = 500 byes) are being written every second continuously. If I make any search on this index I get OOM. I'm using default cache settings (512,512,256) in the solrconfig.xml. The search is using the admin interface (returning 10 rows) with no sorting, faceting or highlighting. Max heap size is 1024m. Mar 27, 2009 9:13:41 PM org.apache.solr.common.SolrException log SEVERE: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.index.MultiSegmentReader.norms(MultiSegmentReader.java:335) at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:69) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132) at org.apache.lucene.search.Searcher.search(Searcher.java:126) at org.apache.lucene.search.Searcher.search(Searcher.java:105) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:966) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) What could be the problem? Thanks, -vivek
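Otis's omitNorms tip is a schema.xml change; a sketch with a hypothetical field name (norms cost roughly one byte per document per normed field held in memory, which adds up at 100 million docs):

```xml
<!-- omitNorms=true skips length normalization and index-time boosts
     for this field, saving ~1 byte/doc of heap at search time -->
<field name="body" type="text" indexed="true" stored="false" omitNorms="true"/>
```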
solr date parsing issue
Hi, I am implementing a project using SOLR in which we need to do a search based on a date range. I am passing the date in SOLR date format. During formation of the SOLR query I am encoding the date string using UTF-8 encoding. After forming the whole query string I am posting the search request to SOLR by using Apache's HTTPClient class. But during posting it says invalid query. How to resolve this? Please, I need a resolution on an immediate basis. Thanks in advance. Regards Suryasnat Das Infosys
More Robust Search Timeouts (to Kill Zombie Queries)?
I've noticed that some of my queries take so long (5 min+) that by the time they return, there is no longer any plausible use for the search results. I've started calling these zombie queries because, well, they should be dead, but they just won't die. Instead, they stick around, wasting my Solr box's CPU, RAM, and I/O resources, and potentially causing more legitimate queries to stack up. (Regarding stacking up, see SOLR-138.) I may be able to prevent some of this by optimizing my index settings and by disallowing certain things, such as prefix wildcard queries (e.g. *ar). However, right now I'm most interested in figuring out how to get more robust server-side search timeouts in Solr. This would seem to provide a good balance between these goals: 1) I would like to allow users to attempt to run potentially expensive queries, such as queries with lots of wildcards or ranges 2) I would like to make sure that potentially expensive queries don't turn into zombies -- especially long-lasting zombies For example, I think some of my users might be willing to wait a minute or two for certain classes of search to complete. But after that point, I'd really like to say enough is enough. [Background] While my load is pretty low (it's not a public-facing site), some of my queries are monsters that can take, say, over 5 minutes. (I don't know how much longer than 5 minutes they might take. Some of them might take hours, for all I know, if allowed to run to completion!) The biggest culprit queries currently seem to be wildcard queries. This is made worse by how I've allowed prefix wildcard searches on an index with a large # of terms. (This is made worse yet by doing word bigram indexing.) I've implemented the timeAllowed search timeout support feature introduced in SOLR-502, and this does catch some searches that would have become zombies. (Some proximity searches, for example.) But the timeAllowed mechanism does not catch everything.
And, as I understand it, it's powerless to do anything about, say, wildcard expansions that are taking forever. The question is how to proceed. [Option 1: Wait for someone to bring timeAllowed support to more parts of Solr search] This might be nice. I sort of assume it will happen eventually. I kind of want a more immediate solution, though. Any thoughts on how hard it would be to add the timeout to, say, wildcard expansion? I haven't figured out if I know enough about Solr yet to work on this myself. [Option 2: Add gross timeout support to StandardRequestHandler?] What if I modified StandardRequestHandler so that, when it was invoked, the following would happen: * spawn new thread t to do the stuff that StandardRequestHandler would normally do * start thread t * sleep, waiting for either thread t to finish or for a timer to go off * after waking up, look whether the timer went off. if so, then terminate thread t This would kill any runaway zombie queries. But maybe it would also have horrible side-effects. Is it wishful thinking to believe that this might not screw up reference counting, or create deadlocks, or anything else? [Option 3: Servlet container-level solutions?] I thought Jetty and friends might have an option along the lines of if a request is taking longer than x seconds, then abort the thread handling it. This seems troublesome in practice, though: 1) I can't find a servlet container with documentation clearly stating that this is possible. 2) I played with Jetty, and maxIdleTime sounded like it *might* cause this behavior, but experiments suggest otherwise. 3) This behavior sounds dangerous unless you can convince the servlet container to only abort index-reading threads, while leaving index-writing threads alone. Thanks for any advice, Chris
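Option 2 can be prototyped with java.util.concurrent rather than raw threads. A minimal sketch: cancellation here works via interruption, which only helps if the search code checks its interrupt flag — which is exactly the side-effect caveat raised above, since Lucene code that ignores interruption keeps running as a zombie anyway:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedSearch {
    // Runs a task with a hard deadline. On timeout the Future is
    // cancelled (best-effort interrupt) and TimeoutException propagates.
    public static <T> T runWithTimeout(Callable<T> task, long millis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<T> f = pool.submit(task);
        try {
            return f.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true); // interrupts; no effect on code that never checks interruption
            throw e;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runWithTimeout(() -> "fast result", 1000));
    }
}
```

Thread.stop-style forcible termination (what "terminate thread t" implies) is deprecated precisely because of the deadlock and corrupted-state risks mentioned, so interruption is about as far as a container-agnostic approach can safely go.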
Re: optimization advice?
We sort by default on name, which varies quite a bit (we're never going to make sorting by field go away). The thing is solr has been pretty amazing across 1 million records. Now that we've doubled the size of the dataset things are definitely slower in a nonlinear way...I'm wondering what factors are involved here. -Steve On Fri, Mar 27, 2009 at 6:58 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: OK, we are a step closer. Sorting makes things slower. What field(s) do you sort on, what are their types, and if there is a date in there, are the dates very granular, and if they are, do you really need them to be that precise? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Steve Conover scono...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, March 27, 2009 1:51:14 PM Subject: Re: optimization advice? Steve, Maybe you can tell us about: sure - your hardware 2.5GB RAM, pretty modern virtual servers - query rate Let's say a few queries per second max... 4 And in general the challenge is to get latency on any given query down to something very low - we don't have to worry about a huge amount of load at the moment. - document cache and query cache settings class=solr.LRUCache size=512 initialSize=512 autowarmCount=256/ class=solr.LRUCache size=512 initialSize=512 autowarmCount=0/ - your current response times This depends on the query. For queries that involve a total record count of 1 million, we often see 10ms response times, up to 4-500ms in the worst case. When we do a page one, sorted query on our full record set of 2 million+ records, response times can get up into 2+ seconds. - any pain points, any slow query patterns Something that can't be emphasized enough is that we can't predict what records people will want. Almost every query is aimed at a different set of records. -Steve
Re: solr date parsing issue
Mr. Das, Can you provide a little more details here? Helpful information would be: - The query string you're using - The fieldtype you're using for indexing the value in question. - The exact error message you're getting from Solr. Suryasnat Das wrote: Hi, I am implementing a project using SOLR in which we need to do a search based on date range. I am passing the date in SOLR date format. During formation of the SOLR query i am encoding the date string using UTF-8 encoding. After forming the whole query string i am posting the search request to SOLR by using Apache's HTTPClient class. But during posting it says invalid query. How to resolve this? please i need a resolution on a immediate basis. Thanks in advance. Regards Suryasnat Das Infosys -- View this message in context: http://www.nabble.com/solr-date-parsing-issue-tp22753196p22753613.html Sent from the Solr - User mailing list archive at Nabble.com.
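One common cause of "invalid query" with hand-assembled HttpClient requests is that the colons and brackets inside the Solr date range never get URL-escaped. A sketch of building the q parameter safely — the field name and class are hypothetical:

```java
import java.net.URLEncoder;

public class DateRangeQuery {
    // Builds a URL-encoded Solr date-range query parameter; the colons
    // inside the timestamps are what usually break a raw query string.
    public static String encode(String field, String from, String to) throws Exception {
        String q = field + ":[" + from + " TO " + to + "]";
        return "q=" + URLEncoder.encode(q, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(encode("timestamp", "2009-01-01T00:00:00Z", "2009-03-28T00:00:00Z"));
    }
}
```

Note this encodes the whole parameter value once; encoding the date string alone and then concatenating (as described in the original mail) can leave unescaped characters or double-encode the value.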
Re: optimization advice?
Steve, A field named name sounds like a free text field. What is its type, string or text? Fields you sort by should not be tokenized and should be indexed. I have a hunch your name field is tokenized. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Steve Conover scono...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, March 27, 2009 11:59:52 PM Subject: Re: optimization advice? We sort by default on name, which varies quite a bit (we're never going to make sorting by field go away). The thing is solr has been pretty amazing across 1 million records. Now that we've doubled the size of the dataset things are definitely slower in a nonlinear way...I'm wondering what factors are involved here. -Steve On Fri, Mar 27, 2009 at 6:58 PM, Otis Gospodnetic wrote: OK, we are a step closer. Sorting makes things slower. What field(s) do you sort on, what are their types, and if there is a date in there, are the dates very granular, and if they are, do you really need them to be that precise? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Steve Conover To: solr-user@lucene.apache.org Sent: Friday, March 27, 2009 1:51:14 PM Subject: Re: optimization advice? Steve, Maybe you can tell us about: sure - your hardware 2.5GB RAM, pretty modern virtual servers - query rate Let's say a few queries per second max... 4 And in general the challenge is to get latency on any given query down to something very low - we don't have to worry about a huge amount of load at the moment. - document cache and query cache settings class=solr.LRUCache size=512 initialSize=512 autowarmCount=256/ class=solr.LRUCache size=512 initialSize=512 autowarmCount=0/ - your current response times This depends on the query. For queries that involve a total record count of 1 million, we often see 10ms response times, up to 4-500ms in the worst case. 
When we do a page one, sorted query on our full record set of 2 million+ records, response times can get up into 2+ seconds. - any pain points, any slow query patterns Something that can't be emphasized enough is that we can't predict what records people will want. Almost every query is aimed at a different set of records. -Steve
Re: solr date parsing issue
On Sat, Mar 28, 2009 at 8:17 AM, Suryasnat Das suryaatw...@gmail.comwrote: Hi, I am implementing a project using SOLR in which we need to do a search based on date range. I am passing the date in SOLR date format. During formation of the SOLR query i am encoding the date string using UTF-8 encoding. After forming the whole query string i am posting the search request to SOLR by using Apache's HTTPClient class. But during posting it says invalid query. How to resolve this? please i need a resolution on a immediate basis. Thanks in advance. Why don't you use Solrj client? http://wiki.apache.org/solr/Solrj -- Regards, Shalin Shekhar Mangar.