Re: A sorting question.
Hi, Erick! And thank you for answering! You always answer my questions. :-)

Well, I'll try to explain better, because the context is more complex. The original problem comes from the MoreLikeThis behaviour. As you probably know, that Solr feature only suggests similar documents for the first (and only the first) document returned by the original query. That is, if you have a query that returns 5 documents (a query with five IDs joined by OR boolean clauses, like before), MoreLikeThis only returns similar documents for the first one. That's very frustrating, and I tried to solve it partially, and not very efficiently.

I've got an intermediate business-logic layer that manages queries between the front end and the Solr architecture. This component defines an API of queries, plus pre- and post-processors to execute with them. The thing is that I want to return similar documents for, say, five queried documents. Due to the MoreLikeThis limitations I do this:

1. First MoreLikeThis query for the first document. I get all the similar documents.
2. Second MoreLikeThis query for the second document. I get all the similar documents.
...
5. Fifth MoreLikeThis query for the fifth document. I get all the similar documents.

Note that the order of the IDs in the query is important: the first ID comes first because it is more important than the second ID, and so on. So now I have to merge the results but, hey! Imagine that you receive a sort by Date. You have to compose the final response with the merged similar documents and sort it by Date. That's a problem, right? So I do the following:

1. Get the first similar document ID from the first ID response.
2. Get the first similar document ID from the second ID response.
...
5. Get the first similar document ID from the fifth ID response.
6. Get the second similar document ID from the first ID response.
...
N. Get the Nth similar document ID from the fifth ID response.

The number of documents is not important. Imagine that you have rows=20, so N=20 and you have an array of 20 similar documents ordered correctly from most important to least important.

Returning to the sorting problem: if you launch another, final query to Solr with q=(all the similar document IDs, ordered), you can append the original sort by Date, so the results can be sorted by Date, or by another field, or left without any sort... and that's the problem! If you don't specify any sort, I would hope that the documents are returned in the order of the similar document IDs, that is, from most important to least important, and you saw what Solr does: it returns the documents sorted by score.

Phew! And that's all. Ehm... any suggestion? :-D Hehehe.

Thank you so much!

Luis Cappa.
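The merge described in the message above (take the first similar document from each response in turn, then the second from each, and so on) can be sketched in a few lines of Python. This is only an illustration of that round-robin step, not Luis's actual code: the function name and the input shape (one list of similar document IDs per queried ID, in order of importance) are assumptions, and skipping IDs that were already picked is an extra choice the message does not mention.

```python
from itertools import zip_longest


def interleave_similar(results_per_id, rows):
    """Round-robin merge of per-ID MoreLikeThis results.

    results_per_id: one list of similar document IDs per queried ID,
    ordered from the most important queried ID to the least important.
    rows: maximum number of IDs to return.
    """
    if rows <= 0:
        return []
    merged = []
    seen = set()
    # zip_longest yields the 1st hit of every list, then the 2nd, etc.,
    # padding shorter lists with None.
    for same_rank in zip_longest(*results_per_id):
        for doc_id in same_rank:
            if doc_id is None or doc_id in seen:
                continue  # exhausted list, or duplicate (assumption: dedupe)
            seen.add(doc_id)
            merged.append(doc_id)
            if len(merged) == rows:
                return merged
    return merged
```

With rows=20 this yields the array of up to 20 IDs, ordered from most to least important, that the final query is built from.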
Re: A sorting question.
Hi Luis,

Do you mean q=id:(A^10+OR+B^9+OR+C^8+OR...)? I'm not sure whether it works, but q=id:A^10+OR+id:B^9+OR+id:C^8+OR... definitely does.

On Fri, Mar 2, 2012 at 1:13 PM, Luis Cappa Banda luisca...@gmail.com wrote:

Hello! Just a brief question. I'm querying by my docs' IDs to retrieve the whole document data for them, and I would like to retrieve them in the same order as I queried.

Example: q=id:(A+OR+B+OR+C+OR...)

And I would like to get a response with a default order like:

response:
docA:{ }
docB:{ }
docC:{ }
Etc.

The default response returns the documents in a different order, I suppose due to Solr's internal scoring algorithm. The IDs are not numeric, so there is no option to order them with numeric logic. Any suggestions? Thanks a lot!

Luis Cappa.

--
Sincerely yours
Mikhail Khludnev
Lucid Certified Apache Lucene/Solr Developer
Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
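A small Python sketch of the boosted form suggested above, for anyone building the query string in client code. The helper names are hypothetical, the boosts simply count down from the number of IDs (as in the A^10, B^9, C^8 example), and no escaping of special characters in the IDs is done. Whether the boosts alone fully determine the result order depends on scoring, so a second helper reorders the returned documents on the client as a fallback; it assumes each document is a dict with an "id" key.

```python
def boosted_id_query(ids, field="id"):
    """Build 'id:A^3 OR id:B^2 OR id:C^1' so earlier IDs get higher boosts."""
    n = len(ids)
    clauses = [f"{field}:{doc_id}^{n - i}" for i, doc_id in enumerate(ids)]
    return " OR ".join(clauses)


def reorder_like_request(ids, docs, field="id"):
    """Return docs in the same order as the requested ids (client-side)."""
    position = {doc_id: i for i, doc_id in enumerate(ids)}
    found = [d for d in docs if d[field] in position]
    return sorted(found, key=lambda d: position[d[field]])
```

The client-side reorder only makes sense when no explicit sort (by Date, for example) was requested, since an explicit sort should win in that case.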
Re: How can Solr do parallel query warming with firstSearcher and newSearcher?
Neil,

Would you mind if I ask what in particular you want to warm with these queries?

Regards

On Sat, Mar 3, 2012 at 12:37 AM, Neil Hooey nho...@gmail.com wrote:

I'm trying to get Solr to run warming queries in parallel with listener events, but it always does them in sequence, pegging one CPU while calculating facet counts.

Someone at Lucid Imagination suggested using multiple <listener event="firstSearcher"> tags, each with a single facet query in them, but those are still run in sequence.

Is it possible to run warming queries in parallel, and if so, how?

I'm aware that you could run an external script that forks, but I'd like to use Solr's native support for this if it exists.

Examples that don't work:

<!-- runs in sequence: multiple facet queries in a single listener -->
<query>
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">*:*</str><str name="facet.field">field1</str></lst>
      <lst><str name="q">*:*</str><str name="facet.field">field2</str></lst>
      <lst><str name="q">*:*</str><str name="facet.field">field3</str></lst>
      <lst><str name="q">*:*</str><str name="facet.field">field4</str></lst>
    </arr>
  </listener>
</query>

<!-- runs in sequence: queries distributed across separate listener tags -->
<query>
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">*:*</str><str name="facet.field">field1</str></lst>
    </arr>
  </listener>
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">*:*</str><str name="facet.field">field2</str></lst>
    </arr>
  </listener>
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">*:*</str><str name="facet.field">field3</str></lst>
    </arr>
  </listener>
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">*:*</str><str name="facet.field">field4</str></lst>
    </arr>
  </listener>
</query>

--
Sincerely yours
Mikhail Khludnev
Lucid Certified Apache Lucene/Solr Developer
Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
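For reference, the external-script route mentioned in the question could look roughly like the Python sketch below. It is not Solr's native warming and does not answer whether listeners can run in parallel; it just sends one facet query per field concurrently from outside. The /select handler path, the facet=true and rows=0 parameters, the worker count, and all function names are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen


def warming_urls(base_url, facet_fields):
    """One q=*:* facet request per field (rows=0: only facet counts matter)."""
    urls = []
    for field in facet_fields:
        params = {"q": "*:*", "rows": 0, "facet": "true", "facet.field": field}
        urls.append(base_url + "?" + urlencode(params))
    return urls


def fetch_url(url):
    """Send the request, discard the body, return the HTTP status code."""
    with urlopen(url) as resp:
        resp.read()
        return resp.status


def warm_in_parallel(urls, fetcher=fetch_url, max_workers=4):
    """Run all warming requests concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetcher, urls))
```

Something like warm_in_parallel(warming_urls("http://localhost:8983/solr/select", ["field1", "field2", "field3", "field4"])) would then be run after startup or after a commit; the fetcher parameter exists only so the function can be exercised without a running Solr.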
nutch log
This is my Nutch log after configuring it for Solr indexing:

2012-03-03 12:20:25,520 INFO solr.SolrMappingReader - source: content dest: content
2012-03-03 12:20:25,520 INFO solr.SolrMappingReader - source: site dest: site
2012-03-03 12:20:25,520 INFO solr.SolrMappingReader - source: title dest: title
2012-03-03 12:20:25,520 INFO solr.SolrMappingReader - source: host dest: host
2012-03-03 12:20:25,520 INFO solr.SolrMappingReader - source: segment dest: segment
2012-03-03 12:20:25,520 INFO solr.SolrMappingReader - source: boost dest: boost
2012-03-03 12:20:25,520 INFO solr.SolrMappingReader - source: digest dest: digest
2012-03-03 12:20:25,520 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2012-03-03 12:20:25,520 INFO solr.SolrMappingReader - source: url dest: id
2012-03-03 12:20:25,520 INFO solr.SolrMappingReader - source: url dest: url
2012-03-03 12:20:25,707 INFO solr.SolrWriter - Adding 11 documents
2012-03-03 12:20:26,519 WARN mapred.LocalJobRunner - job_local_0019
org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

request: http://localhost:8983/solr/update?wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:93)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2012-03-03 12:20:27,377 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2012-03-03 12:20:27,393 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2012-03-03 12:20:27
2012-03-03 12:20:27,393 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/

Any suggestions?

Thanks,
alessio
Re: nutch log
(12/03/03 20:32), alessio crisantemi wrote:

this is my nutch log after configured it for solr index:
:
org.apache.solr.common.SolrException: Internal Server Error Internal Server Error request: http://localhost:8983/solr/update?wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
:
suggestions? thanks alessio

Hi alessio,

I have no idea about Nutch, but I think you should look for the cause of the internal server error in the Solr log, not in the Nutch log.

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/
Re: nutch log
That's true. This is the Solr problem:

mar 03, 2012 12:08:04 PM org.apache.solr.common.SolrException log
Grave: org.apache.solr.common.SolrException: invalid boolean value:
    at org.apache.solr.common.util.StrUtils.parseBool(StrUtils.java:237)
    at org.apache.solr.common.util.DOMUtil.addToNamedList(DOMUtil.java:140)
    at org.apache.solr.common.util.DOMUtil.nodesToNamedList(DOMUtil.java:98)
    at org.apache.solr.common.util.DOMUtil.childNodesToNamedList(DOMUtil.java:88)
    at org.apache.solr.common.util.DOMUtil.addToNamedList(DOMUtil.java:142)
    at org.apache.solr.common.util.DOMUtil.nodesToNamedList(DOMUtil.java:98)
    at org.apache.solr.common.util.DOMUtil.childNodesToNamedList(DOMUtil.java:88)
    at org.apache.solr.core.PluginInfo.init(PluginInfo.java:54)
    at org.apache.solr.core.SolrConfig.readPluginInfos(SolrConfig.java:220)
    at org.apache.solr.core.SolrConfig.loadPluginInfo(SolrConfig.java:212)
    at org.apache.solr.core.SolrConfig.init(SolrConfig.java:184)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:134)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
    at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:103)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4624)
    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5281)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:866)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:842)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649)
    at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1581)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

What does it mean?

Thanks,
a.

On 3 March 2012 14:40, Koji Sekiguchi k...@r.email.ne.jp wrote:

(12/03/03 20:32), alessio crisantemi wrote:

this is my nutch log after configured it for solr index:
:
org.apache.solr.common.SolrException: Internal Server Error Internal Server Error request: http://localhost:8983/solr/update?wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
:
suggestions? thanks alessio

Hi alessio,

I have no ideas for nutch, but I think you can look for the cause of the internal server error in Solr log, not in nutch log.

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/
Re: nutch log
Looks like you have a bad value where a boolean is expected in your solrconfig.xml.

On Sat, 3 Mar 2012 16:09:11 +0100, alessio crisantemi alessio.crisant...@gmail.com wrote:

That's true. This is the Solr problem:

mar 03, 2012 12:08:04 PM org.apache.solr.common.SolrException log
Grave: org.apache.solr.common.SolrException: invalid boolean value:
    at org.apache.solr.common.util.StrUtils.parseBool(StrUtils.java:237)
Re: nutch log
(12/03/04 0:09), alessio crisantemi wrote:

is true. this is the slr problem:

mar 03, 2012 12:08:04 PM org.apache.solr.common.SolrException log
Grave: org.apache.solr.common.SolrException: invalid boolean value:

Solr is saying that there is an invalid boolean value in your solrconfig.xml. Check the values of the <bool>...</bool> elements of your Solr plugins in solrconfig.xml. Each should be one of true/false/on/off/...

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/
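One way to carry out that check without reading the whole file by eye is a small script that lists every <bool> element whose text is not a recognised value. The Python sketch below is only an illustration: the accepted set is an assumption based on the "true/false/on/off/..." hint above (Solr's own parser may accept more spellings), and the function name is hypothetical. Since the log prints nothing after "invalid boolean value:", an empty <bool></bool> is a plausible culprit, so empty values are reported too.

```python
import xml.etree.ElementTree as ET

# Assumption: a conservative set of spellings; Solr may accept others.
ACCEPTED = {"true", "false", "on", "off", "yes", "no"}


def suspicious_bools(xml_text):
    """Return (name, value) for every <bool> whose text is not in ACCEPTED."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter("bool"):
        value = (elem.text or "").strip()  # empty element -> ""
        if value.lower() not in ACCEPTED:
            problems.append((elem.get("name"), value))
    return problems
```

Calling suspicious_bools(open("solrconfig.xml").read()) then prints the names of the elements worth a closer look; the file path is whatever your installation uses.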
Re: nutch log
Now I have solved the boolean problem, but my indexing still doesn't work. This time I have no errors in the Tomcat log and no errors in the Nutch log. I see only this output in the Cygwin window:

Exception in thread main org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/C:/temp/apache-nutch-1.4-bin/runtime/local/crawl/segments/20120303171628/parse_data
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
    at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)

Why, in your opinion?

Thanks again,
alessio

On 3 March 2012 16:43, Koji Sekiguchi k...@r.email.ne.jp wrote:

(12/03/04 0:09), alessio crisantemi wrote:

is true. this is the slr problem:

mar 03, 2012 12:08:04 PM org.apache.solr.common.SolrException log
Grave: org.apache.solr.common.SolrException: invalid boolean value:

Solr said that there was an erroneous boolean value in your solrconfig.xml. Check the values of <bool>...</bool> of your solr plugins in solrconfig.xml. Those should be one of true/false/on/off/...

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/
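The exception above complains that one segment directory has no parse_data subdirectory. As a first diagnostic, a short Python sketch like the one below lists every segment under crawl/segments that lacks it. The function name is hypothetical, and the reading that such a segment was never parsed successfully is an assumption, not something the trace itself states.

```python
from pathlib import Path


def segments_missing(segments_dir, required="parse_data"):
    """Names of segment directories that have no `required` subdirectory."""
    missing = []
    for segment in Path(segments_dir).iterdir():
        if segment.is_dir() and not (segment / required).is_dir():
            missing.append(segment.name)
    return sorted(missing)
```

Pointing it at the crawl/segments directory from the error message shows whether 20120303171628 is the only incomplete segment or one of several.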
Re: How can Solr do parallel query warming with firstSearcher and newSearcher?
I need to have those queries trigger the generation of facet counts, which can take up to 5 minutes for all of them combined.

If the facet counts aren't warmed, then the first query to ask for facet counts on a particular field will take several minutes to return results.

On Sat, Mar 3, 2012 at 5:40 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

Neil, Would you mind if I ask what particularly do you want to warm by these queries? Regards
Re: nutch log
It is not a Solr error. Please consult the Nutch/Hadoop mailing lists.

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/

(12/03/04 2:38), alessio crisantemi wrote:

now, I solve the boolean problem. but my indexing don't works now also.. But this time, I don't have error in tomcat log and not error in nutch log. I see only this code on cygwin window:

Exception in thread main org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/C:/temp/apache-nutch-1.4-bin/runtime/local/crawl/segments/20120303171628/parse_data
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)

why, in your opinion? thanks again alessio