Re: Replication in Solr 1.4 - redirecting update handlers?
We had a similar issue using acts_as_solr. We already had lighttpd running on some servers, so we just proxied all requests for /solr/CORE/update to the master and /solr/CORE/select to a load-balanced IP for our slaves.

Doug

On Jun 19, 2009, at 11:42 AM, Mark A. Matienzo wrote:

I'm trying to figure out the best solution to the following issue. We've got three boxes in our replication setup - one master and two load-balanced slaves, all of which serve Solr using Tomcat. Given this setup, we're also using the Drupal apachesolr module, which currently supports only one "Solr host" in its configuration. What is the best way to make this transparent to the Drupal module? Is it possible to have some sort of phony update handler to redirect the update requests to the master box from within Solr, or is this something that would be more properly implemented in the Tomcat configuration?

Mark A. Matienzo
Applications Developer, Digital Experience Group
The New York Public Library
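A lighttpd setup along the lines Doug describes might look like the sketch below. This is an illustration only - the hosts, ports, and regexes are hypothetical; 10.0.0.10 stands in for the master and 10.0.0.20 for the load-balanced slave address:

```
# lighttpd.conf (sketch): route updates to the master, selects to the slaves
server.modules += ( "mod_proxy" )

# anything hitting an update handler on any core goes to the single master
$HTTP["url"] =~ "^/solr/[^/]+/update" {
    proxy.server = ( "" => ( ( "host" => "10.0.0.10", "port" => 8983 ) ) )
}

# search traffic goes to the load-balanced slave IP
$HTTP["url"] =~ "^/solr/[^/]+/select" {
    proxy.server = ( "" => ( ( "host" => "10.0.0.20", "port" => 8983 ) ) )
}
```

With this in place, a client such as the Drupal apachesolr module can be pointed at the proxy as its single "Solr host" and never needs to know about the master/slave split.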
Re: Issue with AND/OR Operator in Dismax Request
http://issues.apache.org/jira/browse/SOLR-405 ? It's quite old and may not be exactly what you want, but I think it might be the JIRA ticket that Otis mentioned. Using a filter query was what we really needed. I'm also not really sure why you need a dismax query at all - you're not querying for the same thing in multiple fields.

Doug

On May 20, 2009, at 1:18 PM, dabboo wrote:

Hi, yeah, you are right. Can you please tell me the URL of the JIRA issue?

Thanks, Amit

Otis Gospodnetic wrote:

Amit, that's the same question as the other day, right? Yes, DisMax doesn't play well with Boolean operators. Check JIRA; it has a search box, so you may be able to find related patches. I think the patch I was thinking about is actually for something else - allowing field names to be specified in the query string and DisMax handling that correctly.

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: dabboo
To: solr-user@lucene.apache.org
Sent: Wednesday, May 20, 2009 1:35:00 AM
Subject: Issue with AND/OR Operator in Dismax Request

Hi, I am not getting correct results with a query which has multiple AND/OR operators.

Query format: q=((A AND B) OR (C OR D) OR E)

?q=((intAgeFrom_product_i:[0+TO+3]+AND+intAgeTo_product_i:[3+TO+*])+OR+(intAgeFrom_product_i:[0+TO+3]+AND+intAgeTo_product_i:[0+TO+3])+OR+(ageFrom_product_s:Adult))&qt=dismaxrequest

The query returns correct results without the dismax request handler, but incorrect results with it. I have to use the dismax handler because I need boosting of search results. According to some posts, there are issues with AND/OR operators under dismax. Please let me know if anyone has faced the same problem and whether there is any way to make the query work with dismax. I also believe there is a patch available for this in one of the JIRA issues; I would appreciate it if somebody could let me know the URL so that I can take a look at the patch. Thanks for the help.

Thanks, Amit Garg

--
View this message in context: http://www.nabble.com/Issue-with-AND-OR-Operator-in-Dismax-Request-tp23629269p23629269.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/Issue-with-AND-OR-Operator-in-Dismax-Request-tp23629269p23639786.html
Sent from the Solr - User mailing list archive at Nabble.com.
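For reference, Doug's filter-query suggestion applied to Amit's example could look like the request below (a sketch only - the line breaks are just for readability, and "user query" stands in for the actual search terms). fq clauses are parsed by the standard Lucene query parser even when qt is a dismax handler, so the Boolean range logic works there while the dismax q still drives boosting:

```
?q=user+query
 &qt=dismaxrequest
 &fq=(intAgeFrom_product_i:[0+TO+3]+AND+intAgeTo_product_i:[3+TO+*])
     +OR+(intAgeFrom_product_i:[0+TO+3]+AND+intAgeTo_product_i:[0+TO+3])
     +OR+(ageFrom_product_s:Adult)
```

Note that fq only restricts the result set; it does not contribute to scoring, which is usually what you want for range constraints like these.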
Re: MoreLikeThis filtering
Hah. Sorry, I'm really out of it today. The MoreLikeThisComponent doesn't seem to work for filtering using fq, but the MoreLikeThisHandler does. Problem solved; we'll just use the handler instead of the component.

Doug

On Mar 4, 2009, at 11:02 AM, Doug Steigerwald wrote:

Sorry. The examples on the wiki aren't working with 'fq' to filter the similarities. It just filters the actual queries.

http://localhost:8983/solr/mlt?q=id:SP2514N&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fq=popularity:6&mlt.displayTerms=details&mlt=true

The popularity of the doc found is 6, and trying to use 'fq=popularity:6' brings back similarities with a popularity other than 6.

Doug

On Mar 4, 2009, at 10:39 AM, Doug Steigerwald wrote:

Hm. I checked out a clean Solr 1.3.0, indexed the example docs, and set up a simple MLT handler; the example queries on the wiki work fine (fq can filter out docs). Our build has a slight change to QueryComponent so another query isn't done when we use localsolr+field collapsing, but that change doesn't look like it would make a difference. It just conditionally sets rb.setNeedDocSet() to true or false. Will run some tests on a clean fresh build of Solr to see if it's our build.

Doug

On Mar 4, 2009, at 9:28 AM, Otis Gospodnetic wrote:

Doug, does the good old 'fq' not work with MLT? It should...

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Doug Steigerwald
To: solr-user@lucene.apache.org
Sent: Wednesday, March 4, 2009 9:20:40 AM
Subject: MoreLikeThis filtering

Is it possible to filter similarities found by the MLT component/handler? Something like mlt.fq=site_id:86? We have 32 cores in our Solr install, and some of those cores have up to 8 sites indexed in them. Typically those cores will have one very large site with a few hundred thousand indexed documents, and lots of small sites with significantly fewer documents indexed. We're looking to implement an MLT component for our sites but want the similar stories to be only for a specific site (not all sites in the core). Is there a way to do something like this, or will we have to make mods (I'm not seeing anything jump out at me in the Solr 1.3.0 or Lucene 2.4.0 code)?

/solr/dsteiger/mlt?q=story_id:188665+AND+site_id:86&mlt.fq=site_id:86

(We have all of our other defaults set up in the handler config.) Thanks.

---
Doug Steigerwald
Software Developer
McClatchy Interactive
dsteigerw...@mcclatchyinteractive.com
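Putting the thread's conclusion together - the MoreLikeThisHandler (rather than the component) honors fq on the similar documents - a handler registration might look like the sketch below. The handler name, field list, and defaults here are assumptions for illustration, not from the thread:

```xml
<!-- solrconfig.xml (sketch): a dedicated MLT handler -->
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">headline,body</str>
    <int name="mlt.mindf">1</int>
    <int name="mlt.mintf">1</int>
  </lst>
</requestHandler>
```

A request such as /solr/dsteiger/mlt?q=story_id:188665&fq=site_id:86 would then restrict both the matched document and the returned similar documents to site 86.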
Re: MoreLikeThis filtering
Sorry. The examples on the wiki aren't working with 'fq' to filter the similarities. It just filters the actual queries.

http://localhost:8983/solr/mlt?q=id:SP2514N&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fq=popularity:6&mlt.displayTerms=details&mlt=true

The popularity of the doc found is 6, and trying to use 'fq=popularity:6' brings back similarities with a popularity other than 6.

Doug

On Mar 4, 2009, at 10:39 AM, Doug Steigerwald wrote:

Hm. I checked out a clean Solr 1.3.0, indexed the example docs, and set up a simple MLT handler; the example queries on the wiki work fine (fq can filter out docs). Our build has a slight change to QueryComponent so another query isn't done when we use localsolr+field collapsing, but that change doesn't look like it would make a difference. It just conditionally sets rb.setNeedDocSet() to true or false. Will run some tests on a clean fresh build of Solr to see if it's our build.

Doug

On Mar 4, 2009, at 9:28 AM, Otis Gospodnetic wrote:

Doug, does the good old 'fq' not work with MLT? It should...

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Doug Steigerwald
To: solr-user@lucene.apache.org
Sent: Wednesday, March 4, 2009 9:20:40 AM
Subject: MoreLikeThis filtering

Is it possible to filter similarities found by the MLT component/handler? Something like mlt.fq=site_id:86? We have 32 cores in our Solr install, and some of those cores have up to 8 sites indexed in them. Typically those cores will have one very large site with a few hundred thousand indexed documents, and lots of small sites with significantly fewer documents indexed. We're looking to implement an MLT component for our sites but want the similar stories to be only for a specific site (not all sites in the core). Is there a way to do something like this, or will we have to make mods (I'm not seeing anything jump out at me in the Solr 1.3.0 or Lucene 2.4.0 code)?

/solr/dsteiger/mlt?q=story_id:188665+AND+site_id:86&mlt.fq=site_id:86

(We have all of our other defaults set up in the handler config.) Thanks.

---
Doug Steigerwald
Software Developer
McClatchy Interactive
dsteigerw...@mcclatchyinteractive.com
Re: MoreLikeThis filtering
Hm. I checked out a clean Solr 1.3.0, indexed the example docs, and set up a simple MLT handler; the example queries on the wiki work fine (fq can filter out docs). Our build has a slight change to QueryComponent so another query isn't done when we use localsolr+field collapsing, but that change doesn't look like it would make a difference. It just conditionally sets rb.setNeedDocSet() to true or false. Will run some tests on a clean fresh build of Solr to see if it's our build.

Doug

On Mar 4, 2009, at 9:28 AM, Otis Gospodnetic wrote:

Doug, does the good old 'fq' not work with MLT? It should...

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Doug Steigerwald
To: solr-user@lucene.apache.org
Sent: Wednesday, March 4, 2009 9:20:40 AM
Subject: MoreLikeThis filtering

Is it possible to filter similarities found by the MLT component/handler? Something like mlt.fq=site_id:86? We have 32 cores in our Solr install, and some of those cores have up to 8 sites indexed in them. Typically those cores will have one very large site with a few hundred thousand indexed documents, and lots of small sites with significantly fewer documents indexed. We're looking to implement an MLT component for our sites but want the similar stories to be only for a specific site (not all sites in the core). Is there a way to do something like this, or will we have to make mods (I'm not seeing anything jump out at me in the Solr 1.3.0 or Lucene 2.4.0 code)?

/solr/dsteiger/mlt?q=story_id:188665+AND+site_id:86&mlt.fq=site_id:86

(We have all of our other defaults set up in the handler config.) Thanks.

---
Doug Steigerwald
Software Developer
McClatchy Interactive
dsteigerw...@mcclatchyinteractive.com
Re: MoreLikeThis filtering
'fq' seems to only work for finding the documents with your original query, not for filtering the similar documents.

Doug

On Mar 4, 2009, at 9:28 AM, Otis Gospodnetic wrote:

Doug, does the good old 'fq' not work with MLT? It should...

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Doug Steigerwald
To: solr-user@lucene.apache.org
Sent: Wednesday, March 4, 2009 9:20:40 AM
Subject: MoreLikeThis filtering

Is it possible to filter similarities found by the MLT component/handler? Something like mlt.fq=site_id:86? We have 32 cores in our Solr install, and some of those cores have up to 8 sites indexed in them. Typically those cores will have one very large site with a few hundred thousand indexed documents, and lots of small sites with significantly fewer documents indexed. We're looking to implement an MLT component for our sites but want the similar stories to be only for a specific site (not all sites in the core). Is there a way to do something like this, or will we have to make mods (I'm not seeing anything jump out at me in the Solr 1.3.0 or Lucene 2.4.0 code)?

/solr/dsteiger/mlt?q=story_id:188665+AND+site_id:86&mlt.fq=site_id:86

(We have all of our other defaults set up in the handler config.) Thanks.

---
Doug Steigerwald
Software Developer
McClatchy Interactive
dsteigerw...@mcclatchyinteractive.com
MoreLikeThis filtering
Is it possible to filter similarities found by the MLT component/handler? Something like mlt.fq=site_id:86? We have 32 cores in our Solr install, and some of those cores have up to 8 sites indexed in them. Typically those cores will have one very large site with a few hundred thousand indexed documents, and lots of small sites with significantly fewer documents indexed. We're looking to implement an MLT component for our sites but want the similar stories to be only for a specific site (not all sites in the core). Is there a way to do something like this, or will we have to make mods (I'm not seeing anything jump out at me in the Solr 1.3.0 or Lucene 2.4.0 code)?

/solr/dsteiger/mlt?q=story_id:188665+AND+site_id:86&mlt.fq=site_id:86

(We have all of our other defaults set up in the handler config.) Thanks.

---
Doug Steigerwald
Software Developer
McClatchy Interactive
dsteigerw...@mcclatchyinteractive.com
Re: Applying Field Collapsing Patch
Have you tried just checking out (or exporting) the source from SVN and applying the patch? Works fine for me that way.

$ svn co http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.3.0 solr-1.3.0
$ cd solr-1.3.0 ; patch -p0 < ~/Downloads/collapsing-patch-to-1.3.0-ivan_2.patch

Doug

On Dec 11, 2008, at 3:50 PM, John Martyniak wrote:

It was a completely clean install. I downloaded it from one of the mirrors right before applying the patch to it. Very troubling. Any other suggestions or ideas? I am running it on Mac OS; maybe I will try looking for some answers around that.

-John

On Dec 11, 2008, at 3:05 PM, Stephen Weiss wrote:

Yes, only ivan patch 2 (and before, only ivan patch 1); my sense was these patches were meant to be used in isolation (there were no notes saying to apply any other patches first). Are you using patches for any other purpose (non-SOLR-236)? Maybe you need to apply this one first, then those patches. For me, using any patch makes me nervous (we have a pretty strict policy about using beta code anywhere); I'm only doing it this once because it's absolutely necessary to provide the functionality desired.

--
Steve

On Dec 11, 2008, at 2:53 PM, John Martyniak wrote:

Thanks for the advice. I just downloaded a completely clean version, haven't even tried to build it yet. Applied the patch the same way, and I received exactly the same results. Do you only apply the ivan patch 2? What version of patch are you running?

-John

On Dec 11, 2008, at 2:10 PM, Stephen Weiss wrote:

Are you sure you have a clean copy of the source? Every time I've applied his patch I grab a fresh copy of the tarball and run the exact same command, and it always works for me. Now, whether the collapsing actually works is a different matter...

--
Steve

On Dec 11, 2008, at 1:29 PM, John Martyniak wrote:

Hi, I am trying to apply Ivan's field collapsing patch to Solr 1.3 (not a nightly), and it continuously fails. I am using the following command:

patch -p0 -i collapsing-patch-to-1.3.0-ivan_2.patch --dry-run

I am in the apache-solr directory and have read/write permissions for all directories and files. I get the following results:

patching file src/test/org/apache/solr/search/TestDocSet.java
Hunk #1 FAILED at 88.
1 out of 1 hunk FAILED -- saving rejects to file src/test/org/apache/solr/search/TestDocSet.java.rej
patching file src/java/org/apache/solr/search/CollapseFilter.java
patching file src/java/org/apache/solr/search/DocSet.java
Hunk #1 FAILED at 195.
1 out of 1 hunk FAILED -- saving rejects to file src/java/org/apache/solr/search/DocSet.java.rej
patching file src/java/org/apache/solr/search/NegatedDocSet.java
patching file src/java/org/apache/solr/search/SolrIndexSearcher.java
Hunk #1 FAILED at 1357.
1 out of 1 hunk FAILED -- saving rejects to file src/java/org/apache/solr/search/SolrIndexSearcher.java.rej
patching file src/java/org/apache/solr/common/params/CollapseParams.java
patching file src/java/org/apache/solr/handler/component/CollapseComponent.java

Also, the '.rej' files are not created. Does anybody have any ideas? Thanks in advance for the help.

-John
Re: snappuller issue with multicore
Try using the -d option with snappuller so you can specify the path to the directory holding index data on the local machine.

Doug

On Dec 10, 2008, at 10:20 AM, Kashyap, Raghu wrote:

Bill, yes I do have a scripts.conf for each core. However, all the options needed for snappuller are specified on the command line itself (-D, -S, etc.).

-Raghu

-----Original Message-----
From: Bill Au [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 10, 2008 9:17 AM
To: solr-user@lucene.apache.org
Subject: Re: snappuller issue with multicore

I noticed that you are using the same rsyncd port for both cores. Do you have a scripts.conf for each core?

Bill

On Tue, Dec 9, 2008 at 11:40 PM, Kashyap, Raghu <[EMAIL PROTECTED]> wrote:

Hi, we are seeing a strange behavior with snappuller. We have 2 cores, Hotel & Location. Here are the steps we perform:

1. index hotel on master server
2. index location on master server
3. execute snapshooter for hotel core on master server
4. execute snapshooter for location core on master server
5. execute snappuller from slave machines (once for hotel core & once for location core)

However, the hotel core snapshot is pulled into the location data dir. Here are the commands that we execute in our ruby scripts:

system("solr/multicore/hotel/bin/snappuller -P 18983 -S /solr/data -M masterServer -D /solr/data/hotel")
system("solr/multicore/location/bin/snappuller -P 18983 -M masterServer -S /solr/data -D /solr/data/location")

Thanks, Raghu
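Applying Doug's suggestion to Raghu's commands might look like this (a sketch only - it assumes, per Doug's description, that -d names the local data directory, and keeps the other options from the original message):

```
# sketch: pull each core's snapshot into its own local data directory
solr/multicore/hotel/bin/snappuller -P 18983 -M masterServer \
    -S /solr/data -D /solr/data/hotel -d /solr/data/hotel

solr/multicore/location/bin/snappuller -P 18983 -M masterServer \
    -S /solr/data -D /solr/data/location -d /solr/data/location
```

Without -d, both invocations fall back to whatever local data directory the scripts resolve by default, which would explain both snapshots landing in the same place.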
Re: Problems with SOLR-236 (field collapsing)
The first output is from the query component. You might just need to make the collapse component first and remove the query component completely. We perform geographic searching with localsolr first (if we need to), and then try to collapse those results (if collapse=true). If we don't have any results yet, that's the only time we use the standard query component. I'm making sure we set builder.setNeedDocSet=false, and then I modified the query component to only execute when builder.isNeedDocSet=true. In the field collapsing patch that I'm using, I've got code to remove a previous 'response' from the builder.rsp so we don't have duplicates. Now, if I could get field collapsing to work properly with a docSet/docList from localsolr and also have faceting work, I'd be golden.

Doug

On Dec 9, 2008, at 9:37 PM, Stephen Weiss wrote:

Hi Tracy, well, I managed to get it working (I think), but the weird thing is, in the XML output it gives both recordsets (the filtered and unfiltered - filtered second). In the JSON (the one I actually use anyway, at least) I only get the filtered results (as expected). In my core's solrconfig.xml, I added: class="org.apache.solr.handler.component.CollapseComponent" /> (I'm not sure if it's supposed to go anywhere in particular, but for me it's right before StandardRequestHandler), and then within StandardRequestHandler: explicit query facet mlt highlight debug collapse. Which is basically all the default values plus collapse. Not sure if this was needed for prior versions; I don't see it in any patch files (I just got a vague idea from looking at a comment from someone else who said it wasn't working for them). It would kinda be nice if someone working on the code might throw us a bone and say explicitly what the right options to put in the config file are (if there are even supposed to be any - for all I know, this is just a bandaid over a larger problem). I know it's not done yet, though... just a pointer for this patch might be handy. It's really a useful feature if it works (I was kinda shocked this wasn't part of the standard distribution, since it's something I had to do so often with MySQL; kinda lucky I guess that it only came up now). Another issue I'm having now is the faceting doesn't seem to change - even if I set the collapse.facet option to "after"... I should really try "before" and see what happens. Of course, I just realized the integrity of my collapse field is not so great, so I have to go back and redo the data :-) Best of luck.

--
Steve

On Dec 9, 2008, at 7:49 PM, Tracy Flynn (SOLR) wrote:

Steve, I need this too. As my previous posting said, I adapted the 1.2 field collapsing back at the beginning of the year, so I'm somewhat familiar. I'll try and get a look this weekend. It's the earliest I'm likely to get spare cycles. I'll post any results.

Tracy

On Dec 9, 2008, at 4:18 PM, Stephen Weiss wrote:

Hi, I'm trying to use field collapsing with our Solr, but I just can't seem to get it to do anything. I've downloaded a dist copy of Solr 1.3 and applied Ivan de Prado's patch - reading through the source code, the patch definitely was applied successfully (all the changes are in the right places; I've checked every single one). I've run ant clean, ant compile, and ant dist to produce the war file in the dist/ folder, and then put the war file in place and restarted Jetty. According to the logs, Jetty is definitely loading the right war file. If I expand the war file and grep through the files, it would appear the collapsing code is there. However, when I add any sort of collapse parameters (I've tried any combination of collapse=true, collapse.field=link_id, collapse.threshold=1, collapse.type=normal, collapse.info.doc=true), the result set is no different from a normal query, and there is no collapse data returned in the XML. I'm not a Java developer - this is my first time using ant, period, and I'm just following basic directions I found on Google. Here is the output of the compilation process: I really need this patch to work for a project... Can someone please tell me what I'm missing to get this to work? I can't really find any documentation beyond adding the collapse options to the query string, so it's hard to tell - is there an option in solrconfig.xml or in the core configuration that needs to be set? Am I going about this entirely the wrong way? Thanks for any advice, I appreciate it.

[ sorry if you get this twice, I accidentally sent first from the wrong e-mail address and I don't think it went through ]

--
Steve
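The mail archive stripped the XML tags from the configuration Steve describes above. Restored from the surviving attribute text, it presumably looked roughly like this (a sketch - the searchComponent name and the handler layout are inferred, not quoted):

```xml
<!-- solrconfig.xml (sketch): register the collapse component... -->
<searchComponent name="collapse"
                 class="org.apache.solr.handler.component.CollapseComponent"/>

<!-- ...and list it after the default components in the standard handler -->
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>debug</str>
    <str>collapse</str>
  </arr>
</requestHandler>
```

Per Doug's note at the top of the thread, listing collapse before (or instead of) the query component may avoid the duplicate recordsets seen in the XML output.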
Re: IndexOutOfBoundsException
We actually have this same exact issue on 5 of our cores. We're just going to wipe the index and reindex soon, but it isn't actually causing any problems for us. We can update the index just fine; there's just no merging going on. Ours happened when I reloaded all of our cores for a schema change. I don't do that any more ;).

Doug

On Aug 14, 2008, at 11:08 PM, Yonik Seeley wrote:

Since this looks like more of a Lucene issue, I've replied in [EMAIL PROTECTED]

-Yonik

On Thu, Aug 14, 2008 at 10:18 PM, Ian Connor <[EMAIL PROTECTED]> wrote:

I seem to be able to reproduce this very easily, and the data is Medline (so I am sure I can share it if needed with a quick email to check).

- I am using Fedora:
%uname -a
Linux ghetto5.projectlounge.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30 13:18:33 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
%java -version
java version "1.7.0"
IcedTea Runtime Environment (build 1.7.0-b21)
IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
- single core (will use shards, but each machine just has one HDD so I didn't see how cores would help, but I am new at this)
- next run I will keep the output to check for earlier errors
- very, and I can share code + data if that will help

On Thu, Aug 14, 2008 at 4:23 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

Yikes... not good. This shouldn't be due to anything you did wrong, Ian... it looks like a Lucene bug. Some questions:
- what platform are you running on, and what JVM?
- are you using multicore? (I fixed some index locking bugs recently)
- are there any exceptions in the log before this?
- how reproducible is this?

-Yonik

On Thu, Aug 14, 2008 at 2:47 PM, Ian Connor <[EMAIL PROTECTED]> wrote:

Hi, I have rebuilt my index a few times (it should get up to about 4 million, but around 1 million it starts to fall apart).

Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:323)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:300)
Caused by: java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
    at java.util.ArrayList.rangeCheck(ArrayList.java:572)
    at java.util.ArrayList.get(ArrayList.java:350)
    at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:188)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:349)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3998)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3650)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:214)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:269)

When this happens, the disk usage goes right up and the indexing really starts to slow down. I am using a Solr build from about a week ago - so my Lucene is at 2.4 according to the war files. Has anyone seen this error before? Is it possible to tell which Array is too large? Would it be an Array I am sending in or another internal one?

Regards, Ian Connor
Re: spellcheck collation
Right before I sent the message. Did a 'svn up src/; ant clean; ant dist' and it failed. Seems to work fine now.

On Aug 14, 2008, at 2:38 PM, Ryan McKinley wrote:

Have you updated recently? isEnabled() was removed last night...

On Aug 14, 2008, at 2:30 PM, Doug Steigerwald wrote:

I'd try, but the build is failing from (guessing) Ryan's last commit:

compile:
    [mkdir] Created dir: /Users/dsteiger/Desktop/java/solr/build/core
    [javac] Compiling 337 source files to /Users/dsteiger/Desktop/java/solr/build/core
    [javac] /Users/dsteiger/Desktop/java/solr/client/java/solrj/src/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.java:129: cannot find symbol
    [javac] symbol  : method isEnabled()
    [javac] location: class org.apache.solr.core.CoreContainer
    [javac]         multicore.isEnabled() ) {

Doug

On Aug 14, 2008, at 2:24 PM, Grant Ingersoll wrote:

I believe I just fixed this on SOLR-606 (thanks to Stefan's patch). Give it a try and let us know.

-Grant

On Aug 13, 2008, at 2:25 PM, Doug Steigerwald wrote:

I've noticed a few things with the new spellcheck component that seem a little strange. Here's my document: 5 wii blackberry blackjack creative labs zen ipod video nano

Some sample queries:
http://localhost:8983/solr/core1/spellCheckCompRH?q=blackberri+wi&spellcheck=true&spellcheck.collate=true
http://localhost:8983/solr/core1/spellCheckCompRH?q=blackberr+wi&spellcheck=true&spellcheck.collate=true
http://localhost:8983/solr/core1/spellCheckCompRH?q=blackber+wi&spellcheck=true&spellcheck.collate=true

When spellchecking 'blackberri wi', the collation returned is 'blackberry wii'. When spellchecking 'blackberr wi', the collation returned is 'blackberrywii'. 'blackber wi' returns 'blackberrwiiwi'.

Doug
Re: spellcheck collation
I'd try, but the build is failing from (guessing) Ryan's last commit:

compile:
    [mkdir] Created dir: /Users/dsteiger/Desktop/java/solr/build/core
    [javac] Compiling 337 source files to /Users/dsteiger/Desktop/java/solr/build/core
    [javac] /Users/dsteiger/Desktop/java/solr/client/java/solrj/src/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.java:129: cannot find symbol
    [javac] symbol  : method isEnabled()
    [javac] location: class org.apache.solr.core.CoreContainer
    [javac]         multicore.isEnabled() ) {

Doug

On Aug 14, 2008, at 2:24 PM, Grant Ingersoll wrote:

I believe I just fixed this on SOLR-606 (thanks to Stefan's patch). Give it a try and let us know.

-Grant

On Aug 13, 2008, at 2:25 PM, Doug Steigerwald wrote:

I've noticed a few things with the new spellcheck component that seem a little strange. Here's my document: 5 wii blackberry blackjack creative labs zen ipod video nano

Some sample queries:
http://localhost:8983/solr/core1/spellCheckCompRH?q=blackberri+wi&spellcheck=true&spellcheck.collate=true
http://localhost:8983/solr/core1/spellCheckCompRH?q=blackberr+wi&spellcheck=true&spellcheck.collate=true
http://localhost:8983/solr/core1/spellCheckCompRH?q=blackber+wi&spellcheck=true&spellcheck.collate=true

When spellchecking 'blackberri wi', the collation returned is 'blackberry wii'. When spellchecking 'blackberr wi', the collation returned is 'blackberrywii'. 'blackber wi' returns 'blackberrwiiwi'.

Doug
Re: more multicore fun
Ah, that's right. Thanks. Forgot I had to do that with our current setup in production.

On Aug 13, 2008, at 3:05 PM, Ryan McKinley wrote:

The dataDir is configured in solrconfig.xml. With multicore it is currently a bit wonky. Currently, you need to configure it explicitly for each core, but it shares the same system variables (${solr.data.dir}), so if you use properties, you end up pointing to the same place. https://issues.apache.org/jira/browse/SOLR-545 is hoping to solve this... Before 1.3 is released, you will either be able to: 1. set the dataDir from your solr.xml config, or 2. set a system property in solr.xml and have solrconfig decide where the dataDir is... For now - if you remove the dataDir config from solrconfig.xml, it will use the default directory for each instanceDir and will point to independent locations...

ryan

On Aug 13, 2008, at 2:52 PM, Doug Steigerwald wrote:

OK. Last question for a while (hopefully), but something else with multicore seems to be wrong.

$ java -jar start.jar
...
INFO: [core0] Opening new SolrCore at solr/core0/, dataDir=./solr/data/
...
INFO: [core1] Opening new SolrCore at solr/core1/, dataDir=./solr/data/
...

The instanceDir seems to be fine, but the dataDir isn't being set correctly. The dataDir is actually example/solr/data instead of example/solr/core{0|1}/data. http://localhost:8983/solr/admin/multicore shows the exact same path to the index for both cores. Am I missing something that the example multicore config doesn't use? Thanks.

Doug
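Ryan's "for now" workaround, spelled out as a sketch (the path shown is just the example config's default): comment out or delete the shared dataDir element in each core's solrconfig.xml so every core falls back to its own instanceDir/data directory.

```xml
<!-- solrconfig.xml (sketch, per core): with this element removed or
     commented out, the core defaults to <instanceDir>/data, so core0
     and core1 get independent index directories. -->
<!-- <dataDir>${solr.data.dir:./solr/data}</dataDir> -->
```

The original problem arises because a shared property like ${solr.data.dir} resolves to the same value in every core, so all cores end up opening the same index directory.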
Re: multicore /solr/update
I checked out the trunk about 2 hours ago. Was the last commit on the 10th supposed to fix this (r684606)?

On Aug 13, 2008, at 3:00 PM, Ryan McKinley wrote:

Check a recent version; this issue should have been fixed in https://issues.apache.org/jira/browse/SOLR-545

On Aug 13, 2008, at 2:22 PM, Doug Steigerwald wrote:

Yeah, that's the problem. Not having the core in the URL you're posting to shouldn't update any core, but it does.

Doug

On Aug 13, 2008, at 2:10 PM, Alok K. Dhir wrote:

You need to add the core to your call -- post to http://localhost:8983/solr/coreX/update

On Aug 13, 2008, at 1:58 PM, Doug Steigerwald wrote:

I've got two cores (core{0|1}), both using the provided example schema (example/solr/conf/schema.xml). Posting to http://localhost:8983/solr/update added the example docs to the last core loaded (core1). Shouldn't this give you a 400?

Doug

---
Alok K. Dhir
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8080
[EMAIL PROTECTED]
more multicore fun
OK. Last question for a while (hopefully), but something else with multicore seems to be wrong.

$ java -jar start.jar
...
INFO: [core0] Opening new SolrCore at solr/core0/, dataDir=./solr/data/
...
INFO: [core1] Opening new SolrCore at solr/core1/, dataDir=./solr/data/
...

The instanceDir seems to be fine, but the dataDir isn't being set correctly. The dataDir is actually example/solr/data instead of example/solr/core{0|1}/data. http://localhost:8983/solr/admin/multicore shows the exact same path to the index for both cores. Am I missing something that the example multicore config doesn't use? Thanks.

Doug
spellcheck collation
I've noticed a few things with the new spellcheck component that seem a little strange. Here's my document: 5 wii blackberry blackjack creative labs zen ipod video nano

Some sample queries:
http://localhost:8983/solr/core1/spellCheckCompRH?q=blackberri+wi&spellcheck=true&spellcheck.collate=true
http://localhost:8983/solr/core1/spellCheckCompRH?q=blackberr+wi&spellcheck=true&spellcheck.collate=true
http://localhost:8983/solr/core1/spellCheckCompRH?q=blackber+wi&spellcheck=true&spellcheck.collate=true

When spellchecking 'blackberri wi', the collation returned is 'blackberry wii'. When spellchecking 'blackberr wi', the collation returned is 'blackberrywii'. 'blackber wi' returns 'blackberrwiiwi'.

Doug
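The indexed document quoted above lost its XML tags in the archive. It was presumably an add document along these lines (a sketch only - the field names, and how the terms were grouped into values, are guesses; only the id and the word list survive in the message):

```xml
<add>
  <doc>
    <field name="id">5</field>
    <!-- hypothetical spell-source field; actual name unknown -->
    <field name="word">wii</field>
    <field name="word">blackberry</field>
    <field name="word">blackjack</field>
    <field name="word">creative labs zen</field>
    <field name="word">ipod video nano</field>
  </doc>
</add>
```

The reported bug is visible in the collations: only the first corrected token keeps its separating space ('blackberry wii'), while later corrections are concatenated ('blackberrywii', 'blackberrwiiwi').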
Re: multicore /solr/update
Yeah, that's the problem. Not having the core in the URL you're posting to shouldn't update any core, but it does.

Doug

On Aug 13, 2008, at 2:10 PM, Alok K. Dhir wrote:

You need to add the core to your call -- post to http://localhost:8983/solr/coreX/update

On Aug 13, 2008, at 1:58 PM, Doug Steigerwald wrote:

I've got two cores (core{0|1}), both using the provided example schema (example/solr/conf/schema.xml). Posting to http://localhost:8983/solr/update added the example docs to the last core loaded (core1). Shouldn't this give you a 400?

Doug

---
Alok K. Dhir
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8080
[EMAIL PROTECTED]
multicore /solr/update
I've got two cores (core{0|1}) both using the provided example schema (example/solr/conf/schema.xml). Posting to http://localhost:8983/solr/update added the example docs to the last core loaded (core1). Shouldn't this give you a 400? Doug
WordGramFilterFactory
Just checked out Solr trunk from SVN and ran 'ant dist && ant example'. Running the example throws out errors because there is no WordGramFilterFactory class. We don't need it here, but is that something waiting to be committed? Doug --Snippet from schema-- positionIncrementGap="100" > generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> maxLength="3" sep=" " />
Re: Solr stops responding
ivial cases... P.S. 72 hours runtime: OOM problem with BEA JRockit R27: jrockit-R27.4.0-jdk1.6.0_02 (AMD Opteron, 64bit, SLES 10 SP1, Tomcat 5.5.26). 100k queries a day...

Jul 17, 2008 11:08:07 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 3149016, Num elements: 393625
 at org.apache.solr.util.OpenBitSet.<init>(OpenBitSet.java:86)
 at org.apache.solr.search.DocSetHitCollector.collect(DocSetHitCollector.java:63)
 at org.apache.solr.search.SolrIndexSearcher$9.collect(SolrIndexSearcher.java:1072)
 at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
 at org.apache.lucene.search.Searcher.search(Searcher.java:118)
 at org.apache.lucene.search.Searcher.search(Searcher.java:97)
 at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1069)
 at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:804)
 at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1245)
 at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:96)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:148)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:902)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:280)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:237)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
 at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:834)
 at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:640)
 at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1286)
 at java.lang.Thread.run(Thread.java:619)

P.P.S. I'll send thread dump in separate Email

Quoting Doug Steigerwald <[EMAIL PROTECTED]>: It happened again last night. I cronned a script that ran jstack on the process every 5 minutes just to see what was going on. Here's a snippet:

"btpool0-2668" prio=10 tid=0x2aac3a905800 nid=0x76ed waiting for monitor entry [0x5e584000..0x5e585a10]
 java.lang.Thread.State: BLOCKED (on object monitor)
 at org.apache.solr.search.LRUCache.get(LRUCache.java:129)
 - waiting to lock <0x2aaabcdd9450> (a org.apache.solr.search.LRUCache$1)
 at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:730)
 at org.apache.solr.search.SolrIndexSearcher.getDocList(SolrIndexSearcher.java:693)
 at org.apache.solr.search.CollapseFilter.<init>(CollapseFilter.java:137)
 at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:97)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:148)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:942)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:280)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:237)

During this log, there were 547 threads active (going by occurrences of Thread.State in the log). Here's some more:

"btpool0-2051" prio=10 tid=0x2aac39144c00 nid=0x4012 waiting for monitor entry [0x45bfc000..0x45bfdd90]
 java.lang.Thread.State: BLOCKED (on object monitor)
 at java.util.Vector.size(Unknown Source)
 - waiting to lock <0x2aaac0af0ea0> (a java.util.Vector)
 at java.util.AbstractList.listIterator(Unknown Source)
 at java.util.AbstractList.listIterator(Unknown Source)
 at java.util.AbstractLi
Re: Solr stops responding
5, 2008 at 8:59 PM, Fuad Efendi <[EMAIL PROTECTED]> wrote: I constantly have the same problem; sometimes I have OutOfMemoryError in logs, sometimes not. Not-predictable. I minimized all caches, it still happens even with 8192M. CPU usage is 375%-400% (two double-core Opterons), SUN Java 5. Moved to BEA JRockit 5 yesterday, looks 30 times faster (25% CPU load with 4096M RAM); no any problem yet, let's see... Strange: Tomcat simply hangs instead of exit(...) There are some posts related to OutOfMemoryError in solr-user list. == http://www.linkedin.com/in/liferay Quoting Doug Steigerwald <[EMAIL PROTECTED]>: Since we pushed Solr out to production a few weeks ago, we've seen a few issues with Solr not responding to requests (searches or admin pages). There doesn't seem to be any reason for it from what we can tell. We haven't seen it in QA or development. We're running Solr with basically the example Solr setup with Jetty (6.1.3). We package our Solr install by using 'ant example' and replacing configs/etc. Whenever Solr stops responding, there are no messages in the logs, nothing. Requests just time out. We have also only seen this on our slaves. The master doesn't seem to be hitting this issue. All the boxes are the same, version of java is the same, etc. We don't have a stack trace and no JMX set up. Once we see this issue, our support folks just stop and start Solr on that machine. Has anyone else run into anything like this with Solr? Thanks. Doug -- --Noble Paul
Re: Solr stops responding
We haven't seen an OutOfMemoryError. The load on the server doesn't go up either (hovers around 1-2). We're on Java 1.6.0_03-b05. 4x3.8GHz Xeons, 8GB RAM. Doug On Jul 15, 2008, at 11:29 AM, Fuad Efendi wrote: I constantly have the same problem; sometimes I have OutOfMemoryError in logs, sometimes not. Not-predictable. I minimized all caches, it still happens even with 8192M. CPU usage is 375%-400% (two double-core Opterons), SUN Java 5. Moved to BEA JRockit 5 yesterday, looks 30 times faster (25% CPU load with 4096M RAM); no any problem yet, let's see... Strange: Tomcat simply hangs instead of exit(...) There are some posts related to OutOfMemoryError in solr-user list. == http://www.linkedin.com/in/liferay Quoting Doug Steigerwald <[EMAIL PROTECTED]>: Since we pushed Solr out to production a few weeks ago, we've seen a few issues with Solr not responding to requests (searches or admin pages). There doesn't seem to be any reason for it from what we can tell. We haven't seen it in QA or development. We're running Solr with basically the example Solr setup with Jetty (6.1.3). We package our Solr install by using 'ant example' and replacing configs/etc. Whenever Solr stops responding, there are no messages in the logs, nothing. Requests just time out. We have also only seen this on our slaves. The master doesn't seem to be hitting this issue. All the boxes are the same, version of java is the same, etc. We don't have a stack trace and no JMX set up. Once we see this issue, our support folks just stop and start Solr on that machine. Has anyone else run into anything like this with Solr? Thanks. Doug
Solr stops responding
Since we pushed Solr out to production a few weeks ago, we've seen a few issues with Solr not responding to requests (searches or admin pages). There doesn't seem to be any reason for it from what we can tell. We haven't seen it in QA or development. We're running Solr with basically the example Solr setup with Jetty (6.1.3). We package our Solr install by using 'ant example' and replacing configs/etc. Whenever Solr stops responding, there are no messages in the logs, nothing. Requests just time out. We have also only seen this on our slaves. The master doesn't seem to be hitting this issue. All the boxes are the same, version of java is the same, etc. We don't have a stack trace and no JMX set up. Once we see this issue, our support folks just stop and start Solr on that machine. Has anyone else run into anything like this with Solr? Thanks. Doug
High load when updating many cores
We're experiencing some high load on our Solr master server. It currently has 30 cores and processes over 3 million updates per day. During most of the day the load on the master is low (0.5 to 2), but sometimes we get spikes in excess of 12 for hours at a time. The only reason I can figure for why this is happening is that we're updating almost all of our cores during those times. Usually during the day our sites update pretty randomly, but it seems like many of them send updates at the same time. Over a 3 hour period where the load was ~12 we had only 156k updates. That's usually a pretty light load when updating a single core through just a few producers. It seems as though we're just getting updates from nearly all of our 30 cores at once, and something in the background is slowing things down. Here are some stats about our setup: 4x3.2GHz Xeon. 8GB RAM. RHEL 5.1. 4GB max heap size for Solr. Our build is a trunk build from January (using Lucene 2.3.0). Java 1.6.0_03-b05 (64bit). Using Jetty started as: 'java -server -Xms1024m -Xmx4096m -jar start.jar'. We never query the master, but we do have caching enabled (same configs on master and slave). autowarmCount is set to 0 for each core (they all use the same configs). We autocommit every 5 seconds. Any ideas what might cause the load to spike? Could it be our caching even though we have autowarmCount set to 0? Could it be that Solr is trying to merge a lot of indexes at once? Maybe some garbage collection stuff? Thanks. Doug
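If it helps narrow down the garbage-collection theory, a hedged sketch of two ways to check it. jstat ships with the JDK; the pgrep pattern is an assumption based on how Jetty is started here, and the flags shown are standard HotSpot GC-logging options.

```shell
# Find the Solr/Jetty JVM (assumes it was started with 'java ... -jar start.jar').
PID=$(pgrep -f start.jar | head -n 1)
echo "solr pid: ${PID:-not found}"
# 1) Sample heap and GC activity every 5 seconds while the load is high:
#      jstat -gcutil "$PID" 5000
# 2) Or restart with GC logging turned on and correlate timestamps with the spikes:
#      java -server -Xms1024m -Xmx4096m -verbose:gc -XX:+PrintGCDetails \
#           -XX:+PrintGCTimeStamps -jar start.jar
```

If the full-GC columns from jstat line up with the load spikes, the 4GB heap plus simultaneous merges across 30 cores is the first thing to look at.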
Re: MergeException
We're using Lucene 2.3.0. I'll try upgrading to 2.3.2 at some point. All of our cores are updating fine, so not a huge rush. Thanks. Doug On Jul 2, 2008, at 9:42 AM, Yonik Seeley wrote: Doug, it looks like it might be this Lucene bug: https://issues.apache.org/jira/browse/LUCENE-1262 What version of Lucene is in the Solr you are running? You might want to try either one of the latest Solr nightly builds, or at least upgrading your Lucene version in Solr if it's not the latest patch release. -Yonik On Wed, Jul 2, 2008 at 9:03 AM, Doug Steigerwald <[EMAIL PROTECTED]> wrote: What exactly does this error mean and how can we fix it? As far as I can tell, all of our 30+ cores seem to be updating and autocommitting fine. By fine I mean our autocommit hook is firing for all cores, which leads me to believe that the commit is happening but segments can't be merged. Are we going to have to rebuild whatever core this happens to be (if I can figure it out)?

Exception in thread "Thread-704" org.apache.lucene.index.MergePolicy$MergeException: java.lang.IndexOutOfBoundsException: Index: 43, Size: 43
 at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:271)
Caused by: java.lang.IndexOutOfBoundsException: Index: 43, Size: 43
 at java.util.ArrayList.RangeCheck(Unknown Source)
 at java.util.ArrayList.get(Unknown Source)
 at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
 at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:154)
 at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:659)
 at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:319)
 at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:133)
 at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3109)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2834)

Thanks. Doug
MergeException
What exactly does this error mean and how can we fix it? As far as I can tell, all of our 30+ cores seem to be updating and autocommitting fine. By fine I mean our autocommit hook is firing for all cores, which leads me to believe that the commit is happening but segments can't be merged. Are we going to have to rebuild whatever core this happens to be (if I can figure it out)?

Exception in thread "Thread-704" org.apache.lucene.index.MergePolicy$MergeException: java.lang.IndexOutOfBoundsException: Index: 43, Size: 43
 at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:271)
Caused by: java.lang.IndexOutOfBoundsException: Index: 43, Size: 43
 at java.util.ArrayList.RangeCheck(Unknown Source)
 at java.util.ArrayList.get(Unknown Source)
 at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
 at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:154)
 at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:659)
 at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:319)
 at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:133)
 at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3109)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2834)

Thanks. Doug
Re: java.io.FileNotFoundException?
Was looking on our server and at one point there were over 13k open file descriptors for the same spell index: /home/dsteiger/local/solr/cores/qaa/data/spell/_1ji.cfs. At some point it dropped back down to 3000 (when I checked again) with no intervention from us. On my local machine, after every query I end up with two extra open files for the spell index.

Solr start:
$ ls -l /proc/18832/fd|grep spell
lr-x------ 1 dsteiger dsteiger 64 2008-04-03 11:37 28 -> /home/dsteiger/Desktop/solr/example/work/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp/WEB-INF/lib/lucene-spellchecker-2.3.0.jar

After first query:
$ ls -l /proc/18832/fd|grep spell
lr-x------ 1 dsteiger dsteiger 64 2008-04-03 11:37 28 -> /home/dsteiger/Desktop/solr/example/work/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp/WEB-INF/lib/lucene-spellchecker-2.3.0.jar
lr-x------ 1 dsteiger dsteiger 64 2008-04-03 11:37 49 -> /home/dsteiger/Desktop/solr/example/solr/core0/data/spell/_25y.cfs
lr-x------ 1 dsteiger dsteiger 64 2008-04-03 11:37 50 -> /home/dsteiger/Desktop/solr/example/solr/core0/data/spell/_25y.cfs

After 10 queries:
$ ls -l /proc/18832/fd|grep _25y|wc -l
20

Up until this point I've done each query one at a time. After 15 more queries with a perl script (15 konsoles open, all running my random query script at the same time):
$ ls -l /proc/18832/fd|grep _25y|wc -l
38

Another 15 leaves the count at 44. I'm guessing this has to do with the spellchecker being in a component and how I ripped the code out of the SpellCheckRequestHandler. If I hit the SpellCheckRequestHandler normally (http://localhost:8983/solr/core0/select?qt=spellchecker&q=pouted), two files are opened after the first query, and then no additional files are opened. If anyone wants to take a look at the spellcheck component I have, let me know and I'll pass it along. I may just have to stop using it and go back to a separate request for our spellchecking. Thanks.
Doug

Doug Steigerwald wrote: The user that runs our apps is configured to allow 65536 open files in limits.conf. Shouldn't even come close to that number. Solr is the only app we have running on these machines as our app user. We hit the same type of issue when we had our mergeFactor set to 40 for all of our indexes. We lowered it to 5 and have been fine since. No errors in the snappuller for either core. The spellcheck index is rebuilt once a night around midnight and copied to the slave afterwards. I had even rebuilt the spell index manually for the two cores, pulled them, installed them, and tested to make sure it was working with a few queries before the load testing started (this was before we released the patch to lower the spell index mergeFactor). We were even getting errors trying to run our postCommit script on the slave (it doesn't end up doing anything since it's the slave). SEVERE: java.io.IOException: Cannot run program "./solr/bin/snapctl": java.io.IOException: error=24, Too many open files at java.lang.ProcessBuilder.start(Unknown Source) at java.lang.Runtime.exec(Unknown Source) And a correction from my previous email. The errors started 10 -seconds- after load testing started. This was about 40 minutes after Solr started, and less than 30 queries had been run on the server before load testing started. Load testing has been fine since I restarted Solr and rebuilt the spellcheck indexes with the lowered mergeFactor. Doug Otis Gospodnetic wrote: Hi Doug, Sounds fishy, especially increasing/decreasing mergeFactor to "funny values" (try changing your OS setting instead). My guess is this is happening only with the 2 indices that are being modified and I'll guess that the FNFE is due to a bad/incomplete rsync from the master. Do snappuller logs mention any errors? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
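A small sketch of the descriptor counting discussed above, made self-contained by pointing at the current shell's own pid — swap in the Solr JVM's pid in practice. Assumes a Linux /proc filesystem.

```shell
# Count open file descriptors for a process via /proc (Linux only).
PID=$$    # stand-in for the sketch; use the Solr JVM's pid in practice
COUNT=$(ls "/proc/$PID/fd" 2>/dev/null | wc -l | tr -d ' ')
echo "open fds for $PID: $COUNT"
# To watch specifically for a spell-index leak, sample on an interval:
#   while true; do ls -l /proc/$PID/fd | grep -c spell; sleep 60; done
```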
Re: java.io.FileNotFoundException?
The user that runs our apps is configured to allow 65536 open files in limits.conf. Shouldn't even come close to that number. Solr is the only app we have running on these machines as our app user. We hit the same type of issue when we had our mergeFactor set to 40 for all of our indexes. We lowered it to 5 and have been fine since. No errors in the snappuller for either core. The spellcheck index is rebuilt once a night around midnight and copied to the slave afterwards. I had even rebuilt the spell index manually for the two cores, pulled them, installed them, and tested to make sure it was working with a few queries before the load testing started (this was before we released the patch to lower the spell index mergeFactor). We were even getting errors trying to run our postCommit script on the slave (it doesn't end up doing anything since it's the slave). SEVERE: java.io.IOException: Cannot run program "./solr/bin/snapctl": java.io.IOException: error=24, Too many open files at java.lang.ProcessBuilder.start(Unknown Source) at java.lang.Runtime.exec(Unknown Source) And a correction from my previous email. The errors started 10 -seconds- after load testing started. This was about 40 minutes after Solr started, and less than 30 queries had been run on the server before load testing started. Load testing has been fine since I restarted Solr and rebuilt the spellcheck indexes with the lowered mergeFactor. Doug Otis Gospodnetic wrote: Hi Doug, Sounds fishy, especially increasing/decreasing mergeFactor to "funny values" (try changing your OS setting instead). My guess is this is happening only with the 2 indices that are being modified and I'll guess that the FNFE is due to a bad/incomplete rsync from the master. Do snappuller logs mention any errors? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
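For completeness, a quick way to verify what open-file limit a shell (or a running process) actually got, since limits.conf settings don't always apply to daemons. The /proc path and the start.jar lookup pattern are assumptions for a Linux box with a newer kernel that exposes /proc/<pid>/limits.

```shell
# Limit for the current shell:
LIMIT=$(ulimit -n)
echo "open file limit: $LIMIT"
# For a running process, on Linux kernels that expose /proc/<pid>/limits:
#   grep 'open files' /proc/$(pgrep -f start.jar | head -n 1)/limits
```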
java.io.FileNotFoundException?
We just started hitting a FileNotFoundException for no real apparent reason for both our regular index and our spellchecker index, and only a few minutes after we restarted Solr. I did some searching and didn't find much that helped. We started to do some load testing, and after about 10 minutes we started getting these errors. We hit the spellchecker every request through a SpellcheckComponent that we created (ie, code ripped out of SpellCheckRequestHandler for now). It runs essentially the same code as the spellcheck request handler when we specify a parameter (spellcheck=true). We have 34 cores. All but two cores are fully optimized (they haven't been updated in 2 months). Only two cores are actively updated. We started Solr around 11:45am, and not much happened until 12:27 when we started load testing (just a few queries, maybe 100 updates). find /home/dsteiger/local/solr/cores/*/data/index|wc -l => 414 find /home/dsteiger/local/solr/cores/*/data/spell|wc -l => 6 (only the two 'active' cores use the spell checker). So, not many files are open. Anyone have any idea what might cause the two below errors to happen? When I restarted Solr around 11:45am it was to test a new patch that set the mergeFactor in the lucene spellchecker to 2 instead of 300, because we kept running into 'too many files open' errors when rebuilding more than one spell index at a time. The spell indexes were rebuilt manually using the mergeFactor of 300, solr restarted, and any subsequent rebuild of the spell index would use a mergeFactor of 2. After we hit this error, I rebuilt the spell indexes with the new code, replicated them to the slave, restarted Solr, and all has been well. We ran the load testing for more than an hour and the issue hasn't returned. Could the old spell indexes that were created using the high mergeFactor cause an issue like this somehow? Could the opening and closing of searchers so fast cause this? I don't have the slightest idea. 
All of our search queries hit the slave, and the master just handles updates. The master had no issues through all of this.

Caused by: java.io.IOException: cannot read directory org.apache.lucene.store.FSDirectory@/home/dsteiger/local/solr/cores/qaa/data/spell: list() returned null
 at org.apache.lucene.index.SegmentInfos.getCurrentSegmentGeneration(SegmentInfos.java:115)
 at org.apache.lucene.index.IndexReader.indexExists(IndexReader.java:506)
 at org.apache.lucene.search.spell.SpellChecker.setSpellIndex(SpellChecker.java:102)
 at org.apache.lucene.search.spell.SpellChecker.<init>(SpellChecker.java:89)

And this happened I believe when running the snapinstaller (done through cron)...

Caused by: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@/home/dsteiger/local/solr/cores/qab/data/index: files: null
 at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:587)
 at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
 at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:93)
 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:706)

We're running r614955. Thanks. Doug
logging in 24hour time
Is there any way to get the logs to stderr/stdout to be in 24hour time? Thanks. Doug
Re: field collapsing
We're on r614955. On Wednesday 19 March 2008 11:33:36 am muddassir hasan wrote: > Hi Doug, > > Please let me know on which solr revision you applied patch. > > Thanks. > M. Hasan > > Doug Steigerwald <[EMAIL PROTECTED]> wrote: The latest > one won't apply to the trunk because it's too old. It hasn't been updated > to match changes made to Solr since mid-February. One of the things I know > has to change is that in CollapseComponent->prepare/process, the parameters > need to change to just accept a ResponseBuilder. > > Other than that, I'm not sure what will have to be changed. I'm not > planning on updating our Solr build until 1.3 is released. > > Doug > > muddassir hasan wrote: > > Hi, > > > > I have unsuccessfully tried to apply solr field collapsing patches > > available at > > > > https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.pl >ugin.system.issuetabpanels:all-tabpanel > > > > One of the patch could be applied to trunk but it could not be compiled. > > > > Please let me know which of the available field collapsing patch could be > > applied to solr trunk or release 1.2.0. > > > > Thanks. > > > > M.Hasan -- Doug
Re: field collapsing
The latest one won't apply to the trunk because it's too old. It hasn't been updated to match changes made to Solr since mid-February. One of the things I know has to change is that in CollapseComponent->prepare/process, the parameters need to change to just accept a ResponseBuilder. Other than that, I'm not sure what will have to be changed. I'm not planning on updating our Solr build until 1.3 is released. Doug muddassir hasan wrote: Hi, I have unsuccessfully tried to apply solr field collapsing patches available at https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel One of the patch could be applied to trunk but it could not be compiled. Please let me know which of the available field collapsing patch could be applied to solr trunk or release 1.2.0. Thanks. M.Hasan
Admin ping
Came in this morning to find some alerts that the admin interface has basically died. Everything was fine until about 4am. No updates or queries going on at that time (this is a QA machine). Anyone know why it might die like this? Solr 1.3 trunk build from Jan 23rd, 4GB heap size, 4x3.2GHz Xeon, 8GB RAM total, RHEL 5.1, 64bit.

Mar 7, 2008 5:42:46 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.jasper.JasperException: PWC6117: File "/admin/ping.jsp" not found
 at org.apache.jasper.compiler.DefaultErrorHandler.jspError(DefaultErrorHandler.java:60)
 at org.apache.jasper.compiler.ErrorDispatcher.dispatch(ErrorDispatcher.java:346)
 at org.apache.jasper.compiler.ErrorDispatcher.jspError(ErrorDispatcher.java:140)
 at org.apache.jasper.compiler.JspUtil.getInputStream(JspUtil.java:881)
 at org.apache.jasper.xmlparser.XMLEncodingDetector.getEncoding(XMLEncodingDetector.java:114)
 at org.apache.jasper.compiler.ParserController.determineSyntaxAndEncoding(ParserController.java:347)
 at org.apache.jasper.compiler.ParserController.doParse(ParserController.java:181)
 at org.apache.jasper.compiler.ParserController.parse(ParserController.java:111)
 at org.apache.jasper.compiler.Compiler.generateJava(Compiler.java:169)
 at org.apache.jasper.compiler.Compiler.compile(Compiler.java:387)
 at org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:579)
 at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:344)
 at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:464)
 at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)

This happened a few weeks ago, but someone just restarted Solr to get the admin interface back. They said that updates and queries were still working fine. Thanks. Doug
Re: JSONRequestWriter
Looks like it's only happening when we use the LocalSolrQueryComponent from localsolr. rsp.add("response", sdoclist); sdoclist is a SolrDocumentList. Could that be causing an issue instead of it being just a DocList? Doug Yonik Seeley wrote: The output you showed is indeed incorrect, but I can't reproduce that with stock solr. Here is a example of what I get: { 'responseHeader'=>{ 'status'=>0, 'QTime'=>16, 'params'=>{ 'wt'=>'ruby', 'indent'=>'true', 'q'=>'*:*', 'facet'=>'true', 'highlight'=>'true'}}, 'response'=>{'numFound'=>0,'start'=>0,'docs'=>[] }, 'facet_counts'=>{ 'facet_queries'=>{}, 'facet_fields'=>{}, 'facet_dates'=>{}}} -Yonik On Wed, Mar 5, 2008 at 12:00 PM, Doug Steigerwald <[EMAIL PROTECTED]> wrote: Sure. The default (json.nl=flat): 'response',{'numFound'=>41,'start'=>0, Adding json.nl=map makes output correct: 'response'=>{'numFound'=>41,'start'=>0, This also changes facet output (which was evaluating fine): FLAT: 'facet_counts',{ 'facet_queries'=>{}, 'facet_fields'=>{ 'movies_movie_genre_facet'=>[ 'Drama',22, 'Action/Adventure',11, 'Comedy',11, 'Suspense/Thriller',11, 'SciFi/Fantasy',5, 'Animation',4, 'Documentary',4, 'Family',3, 'Horror',3, 'Musical',2, 'Romance',2, 'Concert',1, 'War',1]}, 'facet_dates'=>{}} MAP: 'facet_counts'=>{ 'facet_queries'=>{}, 'facet_fields'=>{ 'movies_movie_genre_facet'=>{ 'Drama'=>22, 'Action/Adventure'=>11, 'Comedy'=>11, 'Suspense/Thriller'=>11, 'SciFi/Fantasy'=>5, 'Animation'=>4, 'Documentary'=>4, 'Family'=>3, 'Horror'=>3, 'Musical'=>2, 'Romance'=>2, 'Concert'=>1, 'War'=>1}}, 'facet_dates'=>{}} Doug Yonik Seeley wrote: > On Wed, Mar 5, 2008 at 11:25 AM, Doug Steigerwald > <[EMAIL PROTECTED]> wrote: >> If you don't add the json.nl=map to your params, then you can't eval() what you get back in Ruby >> ("can't convert String into Integer"). > > Can you show what the problematic ruby output is? > > json.nl=map isn't the default because some things need to be ordered, > and eval of a map in python & ruby looses that order. 
> > -Yonik
Re: JSONRequestWriter
Sure. The default (json.nl=flat): 'response',{'numFound'=>41,'start'=>0, Adding json.nl=map makes output correct: 'response'=>{'numFound'=>41,'start'=>0, This also changes facet output (which was evaluating fine): FLAT: 'facet_counts',{ 'facet_queries'=>{}, 'facet_fields'=>{ 'movies_movie_genre_facet'=>[ 'Drama',22, 'Action/Adventure',11, 'Comedy',11, 'Suspense/Thriller',11, 'SciFi/Fantasy',5, 'Animation',4, 'Documentary',4, 'Family',3, 'Horror',3, 'Musical',2, 'Romance',2, 'Concert',1, 'War',1]}, 'facet_dates'=>{}} MAP: 'facet_counts'=>{ 'facet_queries'=>{}, 'facet_fields'=>{ 'movies_movie_genre_facet'=>{ 'Drama'=>22, 'Action/Adventure'=>11, 'Comedy'=>11, 'Suspense/Thriller'=>11, 'SciFi/Fantasy'=>5, 'Animation'=>4, 'Documentary'=>4, 'Family'=>3, 'Horror'=>3, 'Musical'=>2, 'Romance'=>2, 'Concert'=>1, 'War'=>1}}, 'facet_dates'=>{}} Doug Yonik Seeley wrote: On Wed, Mar 5, 2008 at 11:25 AM, Doug Steigerwald <[EMAIL PROTECTED]> wrote: If you don't add the json.nl=map to your params, then you can't eval() what you get back in Ruby ("can't convert String into Integer"). Can you show what the problematic ruby output is? json.nl=map isn't the default because some things need to be ordered, and eval of a map in python & ruby looses that order. -Yonik
Re: JSONRequestWriter
Note that we now have to add a default param to the requestHandler:

<lst name="defaults">
  <str name="echoParams">explicit</str>
  <str name="json.nl">map</str>
</lst>
<arr name="components">
  <str>collapse</str>
  <str>localsolr</str>
  <str>facet</str>
</arr>

If you don't add the json.nl=map to your params, then you can't eval() what you get back in Ruby ("can't convert String into Integer"). Not sure if this can be put into the RubyResponseWriter as a default. Also not sure if this is an issue with the python writer either (since I don't use python). Doug

Yonik Seeley wrote: Thanks Doug, I just checked in your fix. This was a recent bug... writing of SolrDocument was recently added and is not touched by normal code paths, except for distributed search. -Yonik On Wed, Mar 5, 2008 at 9:29 AM, Doug Steigerwald <[EMAIL PROTECTED]> wrote: We're using localsolr and the RubyResponseWriter. When we do a request with the localsolr component in our requestHandler we're seeing issues with the display of a multivalued field when it only has one value. 'class'=>['showtime']'showtime', <-- 'genre'=>['Drama', 'Suspsense/Triller'], With no localsolr component it works fine. Looks like the issue is with the JSONRequestWriter.writeSolrDocument(). Here's the small patch for it that seems to fix it.

Index: src/java/org/apache/solr/request/JSONResponseWriter.java
===================================================================
--- src/java/org/apache/solr/request/JSONResponseWriter.java (revision 614955)
+++ src/java/org/apache/solr/request/JSONResponseWriter.java (working copy)
@@ -416,7 +416,7 @@
         writeVal(fname, val);
         writeArrayCloser();
       }
-      writeVal(fname, val);
+      else writeVal(fname, val);
     }
     if (pseudoFields !=null && pseudoFields.size()>0) {

We're running solr trunk r614955 (Jan 23rd), and r75 of localsolr. Result snippet with the patch: 'class'=>['showtime'], 'genre'=>['Drama', 'Suspsense/Triller'], Has anyone come across an issue like this? Is this fixed in a newer build of Solr? It looks like we'd still need this patch even in a build of the solr trunk from yesterday, but maybe not. -- Doug Steigerwald Software Developer McClatchy Interactive [EMAIL PROTECTED] 919.861.1287
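The same named-list setting can also be sent per request instead of baked into the handler defaults, since json.nl is a regular query parameter. A sketch, assuming the stock example server URL:

```shell
# json.nl=map makes NamedList output (facet counts, etc.) come back as a
# hash instead of a flat [key, value, ...] array in the ruby/json writers.
QUERY='http://localhost:8983/solr/select?q=*:*&wt=ruby&json.nl=map'
echo "$QUERY"
# With a running server:  curl "$QUERY"
```

The trade-off noted downthread still applies: map output loses ordering when eval'd in Ruby or Python, which is why flat stays the default.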
Re: JSONRequestWriter
Sweet. Thanks.

Doug

Yonik Seeley wrote:
Thanks Doug, I just checked in your fix. This was a recent bug... writing of SolrDocument was recently added and is not touched by normal code paths, except for distributed search.
-Yonik
JSONRequestWriter
We're using localsolr and the RubyResponseWriter. When we do a request with the localsolr component in our requestHandler we're seeing issues with the display of a multivalued field when it only has one value.

'class'=>['showtime']'showtime', <--
'genre'=>['Drama', 'Suspsense/Triller'],

With no localsolr component it works fine. Looks like the issue is with JSONResponseWriter.writeSolrDocument(). Here's the small patch for it that seems to fix it.

Index: src/java/org/apache/solr/request/JSONResponseWriter.java
===================================================================
--- src/java/org/apache/solr/request/JSONResponseWriter.java (revision 614955)
+++ src/java/org/apache/solr/request/JSONResponseWriter.java (working copy)
@@ -416,7 +416,7 @@
         writeVal(fname, val);
         writeArrayCloser();
       }
-      writeVal(fname, val);
+      else writeVal(fname, val);
     }
     if (pseudoFields !=null && pseudoFields.size()>0) {

We're running solr trunk r614955 (Jan 23rd), and r75 of localsolr. Result snippet with the patch:

'class'=>['showtime'],
'genre'=>['Drama', 'Suspsense/Triller'],

Has anyone come across an issue like this? Is this fixed in a newer build of Solr? It looks like we'd still need this patch even in a build of the solr trunk from yesterday, but maybe not.
--
Doug Steigerwald
Software Developer
McClatchy Interactive
[EMAIL PROTECTED]
919.861.1287
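To see why the one-word patch matters, here is a self-contained sketch of the control flow in question: without the else, the multivalued branch writes the array and then falls through and writes the value again, producing exactly the doubled 'class'=>['showtime']'showtime' output shown above. The class below is an illustrative stand-in, not the real JSONResponseWriter:

```java
public class MissingElseDemo {
    // Writes one field in the Ruby-ish notation from the email. With
    // fixed=false, the array branch falls through and the value is
    // written a second time (the bug); with fixed=true, the added
    // "else" stops after the array (the patch).
    static String writeField(String fname, Object val, boolean fixed) {
        StringBuilder sb = new StringBuilder("'" + fname + "'=>");
        if (val instanceof Object[]) {            // multivalued: write array
            Object[] vals = (Object[]) val;
            sb.append('[');
            for (int i = 0; i < vals.length; i++) {
                if (i > 0) sb.append(", ");
                sb.append('\'').append(vals[i]).append('\'');
            }
            sb.append(']');
            if (fixed) return sb.toString();      // the patched "else" path
            val = vals[0];                        // buggy fall-through...
        }
        return sb.append('\'').append(val).append('\'').toString(); // ...writes again
    }

    public static void main(String[] args) {
        Object[] v = {"showtime"};
        System.out.println(writeField("class", v, false)); // 'class'=>['showtime']'showtime'
        System.out.println(writeField("class", v, true));  // 'class'=>['showtime']
    }
}
```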
YAML update request handler
A few months back I wrote a YAML update request handler to see if we could post documents faster than with XML. We did see some small speed improvements (didn't write down the numbers), but the hacked-together code was probably making it slower as well. Not sure if there are faster YAML libraries out there either. We're not actually using it, since it was just a small proof-of-concept project, but is this anything people might be interested in? -- Doug Steigerwald
Re: Integrated Spellchecking
Posted our patches if anyone wants to take a look: https://issues.apache.org/jira/browse/SOLR-433 Small change to core.RunExecutableListener and all the changes to the shell scripts. All these scripts seem to run fine on RHEL-3 and RHEL-5.1 servers. doug Doug Steigerwald wrote: Sure. I'll try to post it today or tomorrow. Doug Steigerwald Software Developer McClatchy Interactive [EMAIL PROTECTED] 919.861.1287 Otis Gospodnetic wrote: Hey Doug, You have multicore/spellcheck replication going already? We have been working on the replication for multicore. Sounds like we are replicating each others work. When will you be able to attach your stuff to JIRA issue? https://issues.apache.org/jira/browse/SOLR-433 Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Doug Steigerwald <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Friday, February 15, 2008 12:45:08 PM Subject: Re: Integrated Spellchecking That unfortunately got pushed aside to work on some of our higher priority solr work since we already had it working one way. Hoping to revisit this after we push to production and start working on new features and share what I've done for this and multicore/spellcheck replication (which we have working quite well in QA right now). Doug Steigerwald Software Developer McClatchy Interactive [EMAIL PROTECTED] 919.861.1287 oleg_gnatovskiy wrote: dsteiger wrote: I've got a couple search components for automatic spell correction that I've been working on. I've converted most of the SpellCheckerRequestHandler to a search component (hopefully will throw a patch out soon for this). Then another search component that will do auto correction for a query if the search returns zero results. We're hoping to see some performance improvements out of handling this in Solr instead of our Rails service. doug Ryan McKinley wrote: Yes -- this is what search components are for! 
Depending on where you put it in the chain, it could only return spell checked results if there are too few results (or the top score is below some threshold) ryan Grant Ingersoll wrote: Is it feasible to submit a query to any of the various handlers and have it bring back results and spelling suggestions all in one response? Is this something the query components piece would handle, assuming one exists for the spell checker? Thanks, Grant So have you succeeded in implementing this patch? I'd definitely like to use this functionality as a search suggestion.
Re: Integrated Spellchecking
Allocating some time to this next week. Need to try and remember what issues I was having when I stopped working on it. doug Matthew Runo wrote: I'd have to agree with this. I'd probably be able to put a bit of work into it as well, as it's something we'd use for sure if it were available. Thanks! Matthew Runo Software Developer Zappos.com 702.943.7833 On Feb 18, 2008, at 6:09 AM, Grant Ingersoll wrote: Hey Doug, If you have permission to donate, perhaps you can just post the patch anyway and state that it isn't quite ready to go. This is something I could use too, and so may have some cycles to work on it. I hate to replicate the work if you already have something that is more or less working. A half baked patch is better than no patch. -Grant On Feb 15, 2008, at 12:45 PM, Doug Steigerwald wrote: That unfortunately got pushed aside to work on some of our higher priority solr work since we already had it working one way. Hoping to revisit this after we push to production and start working on new features and share what I've done for this and multicore/spellcheck replication (which we have working quite well in QA right now). Doug Steigerwald Software Developer McClatchy Interactive [EMAIL PROTECTED] 919.861.1287 oleg_gnatovskiy wrote: dsteiger wrote: I've got a couple search components for automatic spell correction that I've been working on. I've converted most of the SpellCheckerRequestHandler to a search component (hopefully will throw a patch out soon for this). Then another search component that will do auto correction for a query if the search returns zero results. We're hoping to see some performance improvements out of handling this in Solr instead of our Rails service. doug Ryan McKinley wrote: Yes -- this is what search components are for! 
Depending on where you put it in the chain, it could only return spell checked results if there are too few results (or the top score is below some threshold) ryan Grant Ingersoll wrote: Is it feasible to submit a query to any of the various handlers and have it bring back results and spelling suggestions all in one response? Is this something the query components piece would handle, assuming one exists for the spell checker? Thanks, Grant So have you succeeded in implementing this patch? I'd definitely like to use this functionality as a search suggestion.
Re: Integrated Spellchecking
Sure. I'll try to post it today or tomorrow. Doug Steigerwald Software Developer McClatchy Interactive [EMAIL PROTECTED] 919.861.1287 Otis Gospodnetic wrote: Hey Doug, You have multicore/spellcheck replication going already? We have been working on the replication for multicore. Sounds like we are replicating each others work. When will you be able to attach your stuff to JIRA issue? https://issues.apache.org/jira/browse/SOLR-433 Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Doug Steigerwald <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Friday, February 15, 2008 12:45:08 PM Subject: Re: Integrated Spellchecking That unfortunately got pushed aside to work on some of our higher priority solr work since we already had it working one way. Hoping to revisit this after we push to production and start working on new features and share what I've done for this and multicore/spellcheck replication (which we have working quite well in QA right now). Doug Steigerwald Software Developer McClatchy Interactive [EMAIL PROTECTED] 919.861.1287 oleg_gnatovskiy wrote: dsteiger wrote: I've got a couple search components for automatic spell correction that I've been working on. I've converted most of the SpellCheckerRequestHandler to a search component (hopefully will throw a patch out soon for this). Then another search component that will do auto correction for a query if the search returns zero results. We're hoping to see some performance improvements out of handling this in Solr instead of our Rails service. doug Ryan McKinley wrote: Yes -- this is what search components are for! Depending on where you put it in the chain, it could only return spell checked results if there are too few results (or the top score is below some threshold) ryan Grant Ingersoll wrote: Is it feasible to submit a query to any of the various handlers and have it bring back results and spelling suggestions all in one response? 
Is this something the query components piece would handle, assuming one exists for the spell checker? Thanks, Grant So have you succeeded in implementing this patch? I'd definitely like to use this functionality as a search suggestion.
Re: Integrated Spellchecking
That unfortunately got pushed aside to work on some of our higher priority solr work since we already had it working one way. Hoping to revisit this after we push to production and start working on new features and share what I've done for this and multicore/spellcheck replication (which we have working quite well in QA right now). Doug Steigerwald Software Developer McClatchy Interactive [EMAIL PROTECTED] 919.861.1287 oleg_gnatovskiy wrote: dsteiger wrote: I've got a couple search components for automatic spell correction that I've been working on. I've converted most of the SpellCheckerRequestHandler to a search component (hopefully will throw a patch out soon for this). Then another search component that will do auto correction for a query if the search returns zero results. We're hoping to see some performance improvements out of handling this in Solr instead of our Rails service. doug Ryan McKinley wrote: Yes -- this is what search components are for! Depending on where you put it in the chain, it could only return spell checked results if there are too few results (or the top score is below some threshold) ryan Grant Ingersoll wrote: Is it feasible to submit a query to any of the various handlers and have it bring back results and spelling suggestions all in one response? Is this something the query components piece would handle, assuming one exists for the spell checker? Thanks, Grant So have you succeeded in implementing this patch? I'd definitely like to use this functionality as a search suggestion.
Re: DisMax and Search Components
We don't always want to use the dismax handler in our setup. Doug Yonik Seeley wrote: On Jan 21, 2008 9:06 PM, Doug Steigerwald <[EMAIL PROTECTED]> wrote: We've found a way to work around it. In our search components, we're doing something like: defType = defType == null ? DisMaxQParserPlugin.NAME : defType; Would it be easier to just add it as a default parameter in the request handler? -Yonik
Re: DisMax and Search Components
We've found a way to work around it. In our search components, we're doing something like:

defType = defType == null ? DisMaxQParserPlugin.NAME : defType;

If you add &defType=dismax to the query string, it'll use the DisMaxQParserPlugin. Unfortunately, I haven't been able to figure out an easy way to access the config for the different defined dismax handlers in the config, so on our service side (Rails app), we're going to have a configuration with all the params we need to pass (qf, pf, fl, etc.) and send them based on parameters we have coming into the service that we use to figure out which dismax handler to use (uh, yeah, I think that sounds right). This may not be the best way to do it, but it will work fine for us until we can dedicate more time to it (we roll out Solr and our search service to QA next week).

Doug

Charles Hornberger wrote:
On Jan 21, 2008 10:23 AM, Doug Steigerwald <[EMAIL PROTECTED]> wrote:
Is there any support for DisMax (or any search request handlers) in search components, or is that something that still needs to be done? It seems like it isn't supported at the moment.

I was curious about this, too ... If it *is* something that needs to be done, am happy to help w/ the coding. But I would need some advice/guidance up front -- I'm new enough to Solr that the design behind the SearchComponents refactoring is not immediately obvious to me, either from the Jira comments or the code itself.
-Charlie
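The workaround boils down to defaulting the defType parameter inside the component when the request didn't supply one, so an explicit &defType= on the query string still wins. A minimal sketch, using a plain Map in place of SolrParams (class and method names are mine):

```java
import java.util.HashMap;
import java.util.Map;

public class DefTypeDefault {
    // In real Solr code this would be DisMaxQParserPlugin.NAME.
    static final String DISMAX = "dismax";

    // Return the request's defType, falling back to dismax when absent.
    public static String resolveDefType(Map<String, String> params) {
        String defType = params.get("defType");
        return defType == null ? DISMAX : defType;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        System.out.println(resolveDefType(params));  // dismax
        params.put("defType", "lucene");
        System.out.println(resolveDefType(params));  // lucene
    }
}
```

As Yonik notes in the follow-up, adding defType as a default parameter on the request handler achieves the same thing without code.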
DisMax and Search Components
Is there any support for DisMax (or any search request handlers) in search components, or is that something that still needs to be done? It seems like it isn't supported at the moment. We want to be able to use a field collapsing component (https://issues.apache.org/jira/browse/SOLR-236), but still be able to use our DisMax handlers. Right now it's one or the other, and we -need- both. Thanks. doug
Re: Integrated Spellchecking
I've got a couple search components for automatic spell correction that I've been working on. I've converted most of the SpellCheckerRequestHandler to a search component (hopefully will throw a patch out soon for this). Then another search component that will do auto correction for a query if the search returns zero results. We're hoping to see some performance improvements out of handling this in Solr instead of our Rails service. doug Ryan McKinley wrote: Yes -- this is what search components are for! Depending on where you put it in the chain, it could only return spell checked results if there are too few results (or the top score is below some threshold) ryan Grant Ingersoll wrote: Is it feasible to submit a query to any of the various handlers and have it bring back results and spelling suggestions all in one response? Is this something the query components piece would handle, assuming one exists for the spell checker? Thanks, Grant
Re: Spell checker index rebuild
Seemed to be able to fix the below problem with the following patch in Lucene 2.2. Going to try the Lucene 2.3 branch.

Index: contrib/spellchecker/src/java/org/apache/lucene/search/spell/SpellChecker.java
===================================================================
--- contrib/spellchecker/src/java/org/apache/lucene/search/spell/SpellChecker.java (revision 612882)
+++ contrib/spellchecker/src/java/org/apache/lucene/search/spell/SpellChecker.java (working copy)
@@ -285,7 +285,7 @@
    */
   public void clearIndex() throws IOException {
     IndexReader.unlock(spellIndex);
-    IndexWriter writer = new IndexWriter(spellIndex, null, true);
+    IndexWriter writer = new IndexWriter(spellIndex, null, false);
     writer.close();
   }

Now the IndexWriter won't create a new index every time you rebuild the spellchecker index. Didn't seem to have any issues with the small index I have.

Only issue I have now is that with a large index (not that large, 49k documents) I keep getting errors like the one below when initially building an index (and every rebuild after that). This is with and without the patch above.
SEVERE: java.io.FileNotFoundException: /home/dsteiger/local/solr/cores/dsteiger/data/spell/_66.fnm (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:531)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:440)
at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:204)
at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:169)
at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:155)
at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:1970)
at org.apache.lucene.index.IndexWriter.flushRamSegments(IndexWriter.java:1741)
at org.apache.lucene.index.IndexWriter.flushRamSegments(IndexWriter.java:1733)
at org.apache.lucene.index.IndexWriter.maybeFlushRamSegments(IndexWriter.java:1727)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1004)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:983)

Any ideas?

doug

Doug Steigerwald wrote:
It's in the index. Can see it with a query: q=word:blackjack And in luke: − 29

The actual index data seems to disappear.

First rebuild:
$ ls spell/
_2.cfs segments.gen segments_i

Second rebuild:
$ ls spell
segments_2z segments.gen

doug

Otis Gospodnetic wrote:
Do you trust the spellchecker 100% (not looking at its source now). I'd peek at the index with Luke (Luke I trust :)) and see if that term is really there first.
Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Doug Steigerwald <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Wednesday, January 16, 2008 2:56:35 PM Subject: Spell checker index rebuild Having another weird spell checker index issue. Starting off from a clean index and spell check index, I'll index everything in example/exampledocs. On the first rebuild of the spellchecker index using the query below says the word 'blackjack' exists in the spellchecker index. Great, no problems. Rebuild it again and the word 'blackjack' does not exist any more. http://localhost:8983/solr/core0/select?q=blackjack&qt=spellchecker&cmd=rebuild Any ideas? This is with a Solr trunk build from yesterday. doug
Re: Spell checker index rebuild
It's in the index. Can see it with a query: q=word:blackjack And in luke: − 29 The actual index data seems to disappear. First rebuild: $ ls spell/ _2.cfs segments.gen segments_i Second rebuild: $ ls spell segments_2z segments.gen doug Otis Gospodnetic wrote: Do you trust the spellchecker 100% (not looking at its source now). I'd peek at the index with Luke (Luke I trust :)) and see if that term is really there first. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message ---- From: Doug Steigerwald <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Wednesday, January 16, 2008 2:56:35 PM Subject: Spell checker index rebuild Having another weird spell checker index issue. Starting off from a clean index and spell check index, I'll index everything in example/exampledocs. On the first rebuild of the spellchecker index using the query below says the word 'blackjack' exists in the spellchecker index. Great, no problems. Rebuild it again and the word 'blackjack' does not exist any more. http://localhost:8983/solr/core0/select?q=blackjack&qt=spellchecker&cmd=rebuild Any ideas? This is with a Solr trunk build from yesterday. doug
Spell checker index rebuild
Having another weird spell checker index issue. Starting off from a clean index and spell check index, I'll index everything in example/exampledocs. After the first rebuild of the spellchecker index, the query below says the word 'blackjack' exists in the spellchecker index. Great, no problems. Rebuild it again and the word 'blackjack' does not exist any more. http://localhost:8983/solr/core0/select?q=blackjack&qt=spellchecker&cmd=rebuild Any ideas? This is with a Solr trunk build from yesterday. doug
Spellchecker index rebuild error
Lately I've been having issues with the spellchecker failing to properly rebuild my spell index. I used to be able to delete the spell directory and reload the core and build the index fine if it ever crapped out, but now I can't even build it.

java.io.FileNotFoundException: /home/dsteiger/solr/data/spell/_8c.cfs (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:70)
at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:181)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:167)
...

Here's the query: /solr/dsteiger/select/?q=test&qt=spellchecker&cmd=rebuild

Here's my config snippet: 1 0.5 spell spell

Anyone have any ideas?

Doug
Re: Field collapsing
I finally took more than 30 minutes to try and apply the patch and got it to (mostly) work. Will try to submit it tomorrow for review if there's interest. Doug Ryan McKinley wrote: I think the last patch is pre QueryComponent infrastructure it needs to be transformed into a QueryComponent to work. I don't think anyone has tackled that yet... ryan Doug Steigerwald wrote: Modifying the patch to apply. StandardRequestHandler and DisMaxRequestHandler were changed a lot in mid-November and I've been having a hard time figuring out where the changes should be reapplied. Doug Grant Ingersoll wrote: Hi Doug, Is the problem in applying the patch or getting it to work once it is applied? -Grant On Jan 3, 2008, at 8:52 AM, Doug Steigerwald wrote: Being able to collapse multiple documents into one result with Solr is a big deal for us here. Has anyone been able to get field collapsing (http://issues.apache.org/jira/browse/SOLR-236) to patch to a recent checkout of Solr? I've been unsuccessful so far in trying to modify the latest patch to work. Thanks. Doug
Re: Field collapsing
Modifying the patch to apply. StandardRequestHandler and DisMaxRequestHandler were changed a lot in mid-November and I've been having a hard time figuring out where the changes should be reapplied. Doug Grant Ingersoll wrote: Hi Doug, Is the problem in applying the patch or getting it to work once it is applied? -Grant On Jan 3, 2008, at 8:52 AM, Doug Steigerwald wrote: Being able to collapse multiple documents into one result with Solr is a big deal for us here. Has anyone been able to get field collapsing (http://issues.apache.org/jira/browse/SOLR-236) to patch to a recent checkout of Solr? I've been unsuccessful so far in trying to modify the latest patch to work. Thanks. Doug
Field collapsing
Being able to collapse multiple documents into one result with Solr is a big deal for us here. Has anyone been able to get field collapsing (http://issues.apache.org/jira/browse/SOLR-236) to patch to a recent checkout of Solr? I've been unsuccessful so far in trying to modify the latest patch to work. Thanks. Doug
Create new core on the fly
Is it going to be possible (soon) to register new Solr cores on the fly? I know the LOAD action is yet to be implemented, but will that let you create new cores that are not listed in the multicore.xml? We're occasionally going to have to create new cores and would like to not have to stop/start Solr to do this. We want to be able to create the core structure on the filesystem and register that core, or make changes to the multicore.xml file and tell Solr to reload the cores and pick up the new ones. Thanks. Doug
Re: spellchecker and multi-core index replication
Has anyone done any work on this? https://issues.apache.org/jira/browse/SOLR-433 Thanks. Doug

Ryan McKinley wrote:
OG: Yes, I think that makes sense - distribute everything for a given core, not just its index. And the spellchecker could then also have its data dir (and only index/ underneath really) and be replicated in the same fashion. Right?

Yes, that was my thought. If an arbitrary directory could be distributed, then you could have

/path/to/dist/index/...
/path/to/dist/spelling-index/...
/path/to/dist/foo

and that would all get put into a snapshot. This would also let you put multiple cores within a single distribution:

/path/to/dist/core0/index/...
/path/to/dist/core0/spelling-index/...
/path/to/dist/core0/foo
/path/to/dist/core1/index/...
/path/to/dist/core1/spelling-index/...
/path/to/dist/core1/foo

ryan
Re: Continue posting after an error
Thanks. We're probably not going to be sending huge batches of documents very often, so I'll just try a persistent connection and hopefully performance won't be an issue. With our document size, I was posting around 300+ docs/s, so anything reasonably close to that will be good. Historically we've been processing 335k document updates per hour, so we're way under the max docs/s we've seen with Solr.

Doug

Chris Hostetter wrote:
: Sometimes there's a field that shouldn't be multiValued, but the data comes in
: with multiple fields of the same name in a single document.
:
: Is there any way to continue processing other documents in a file even if one
: document errors out? It seems like whenever we hit one of these cases, it
: stops processing the file completely.

I believe you are correct, the UpdateRequestHandler aborts as soon as a bad doc is found. It might be possible to make it skip bad docs and continue processing, but what mechanism could it use to report which doc had failed? Not all schemas have uniqueKey fields, and even if they do - the uniqueKey field may have been the problem. This is one of the reasons why I personally recommend only sending one doc at a time -- if you use persistent HTTP connections, there really shouldn't be much performance difference (and if there is, we can probably optimize that)

-Hoss
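Hoss's one-doc-per-request suggestion can be sketched as follows. The update URL and field names are placeholders; java.net.HttpURLConnection keeps connections alive by default (provided the response is drained), so posting in a loop doesn't pay a new TCP handshake per document, and a bad document only fails its own request:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class OneDocPoster {
    // Escape just the characters Solr's XML update handler cares about.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build a single-document <add> payload for the /update handler.
    static String buildAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // Post one doc; a non-200 status means only this doc failed.
    static int post(String updateUrl, String xml) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(updateUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        // Drain the response so the keep-alive connection can be reused.
        try (InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream()) {
            if (in != null) while (in.read() != -1) { /* discard */ }
        }
        return status;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "42");
        doc.put("title", "Tom & Jerry");
        System.out.println(buildAddXml(doc));
        // <add><doc><field name="id">42</field><field name="title">Tom &amp; Jerry</field></doc></add>
    }
}
```

The caller would loop over documents, calling post() once per doc and logging any non-200 status with the doc it just sent, which sidesteps the "which doc failed?" reporting problem Hoss raises.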
Continue posting after an error
We often have data that isn't generated by us going into our search. Sometimes there's a field that shouldn't be multiValued, but the data comes in with multiple fields of the same name in a single document. Is there any way to continue processing other documents in a file even if one document errors out? It seems like whenever we hit one of these cases, it stops processing the file completely. Thanks. Doug
Geographic searching in solr
Not sure if this got through earlier, pine messed up... Has anyone implemented any sort of geographic searching for Solr? I've found Local Lucene (http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene.htm) by Patrick O'Leary and there is another project in his CVS called Local Solr (http://www.nsshutdown.com/viewcvs/viewcvs.cgi/localsolr/).

I've gotten Local Solr and Local Lucene compiled, but dropping the plugin in the Solr lib folder and defining the custom FieldTypes in my schema results in errors (see below). Has anyone gotten Local Lucene/Solr to work for geographic searches or implemented anything like this? I can't actually find any other plugins for Solr to look at and try to resolve my issues with Local Solr. Any help would be appreciated. I've tried this with Solr 1.2, and compiling Solr from the trunk. Java 1.6.

Thanks.
Doug

---error---
Sep 12, 2007 8:22:50 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoClassDefFoundError: org/apache/solr/schema/FieldType
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:620)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
...