Exception when using File based and Index based SpellChecker
I am trying to use a file-based and an index-based spellchecker together and I am getting this exception: "All checkers need to use the same StringDistance." They work fine individually, as expected, but not together. Any pointers? -Manasi
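That exception is thrown when the combined spellcheckers are configured with different StringDistance implementations. A minimal solrconfig.xml sketch of the fix, assuming both checkers explicitly declare the same distanceMeasure (the checker names, field name, and dictionary file are illustrative assumptions, not from the original message):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- index-based checker -->
  <lst name="spellchecker">
    <str name="name">index_checker</str>
    <str name="field">spell</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
  </lst>
  <!-- file-based checker: same distanceMeasure, so the two can be combined -->
  <lst name="spellchecker">
    <str name="name">file_checker</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
  </lst>
</searchComponent>
```

If one `lst` omits distanceMeasure, it falls back to a default that may differ from the other checker's, which would presumably trigger exactly this exception.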
Re: Doc's FunctionQuery result field in my custom SearchComponent class ?
Erick, in freq:termfreq(product,'spider'), freq is an alias for the 'termfreq' function query, so I could have that field with the name 'freq' in the document response. This is the code I am using to get the document object, and there is no termfreq field in its fields collection:

DocList docs = rb.getResults().docList;
DocIterator iterator = docs.iterator();
int sumFreq = 0;
String id = null;
for (int i = 0; i < docs.size(); i++) {
    try {
        int docId = iterator.nextDoc();
        // Document doc = searcher.doc(docId, fieldSet);
        Document doc = searcher.doc(docId);

Thanks, Tony

On Wed, Jul 17, 2013 at 5:30 PM, Erick Erickson erickerick...@gmail.com wrote: Where are you getting the syntax freq:termfreq(product,'spider')? Try just termfreq(product,'spider') and you'll get an element in the doc labeled 'termfreq', at least I do. Best, Erick

On Tue, Jul 16, 2013 at 1:03 PM, Tony Mullins tonymullins...@gmail.com wrote: OK, so that's why I cannot see the FunctionQuery fields in my SearchComponent class. So then the question would be: how can I apply my custom processing/logic to these FunctionQuery results? What's the extension point in Solr for such scenarios? Basically I want to call termfreq() for each document, sum all docs' termfreq() results, and show the total in one aggregated TermFreq field in my query response. Thanks, Tony

On Tue, Jul 16, 2013 at 6:01 PM, Jack Krupansky j...@basetechnology.com wrote: Basically, the evaluation of function queries in the fl parameter occurs when the response writer is composing the document results. That's AFTER all of the search components are done. SolrReturnFields.getTransformer() gets the DocTransformer, which is really a DocTransformers, and then a call to DocTransformers.transform() in each response writer will evaluate the embedded function queries and insert their values in the results as they are being written.
-- Jack Krupansky

-----Original Message----- From: Tony Mullins Sent: Tuesday, July 16, 2013 1:37 AM To: solr-user@lucene.apache.org Subject: Re: Doc's FunctionQuery result field in my custom SearchComponent class ?

No sorry, I am still not getting the termfreq() field in my 'doc' object. I do get the _version_ field in my 'doc' object, which I think is realValue=StoredField. At which point does termfreq() or any other FunctionQuery field become part of the doc object in Solr? And at that point can I perform some custom logic and append it to the response? Thanks, Tony

On Tue, Jul 16, 2013 at 1:34 AM, Patanachai Tangchaisin patanachai.tangchai...@wizecommerce.com wrote: Hi, I think the process of retrieving a stored field (through fl) happens after the SearchComponents. One solution: if you wrap the q param with the function, your score will be the result of the function. For example:

http://localhost:8080/solr/collection2/demoendpoint?q=termfreq%28product,%27spider%27%29&wt=xml&indent=true&fl=*,score

Now your score is going to be the result of termfreq(product,'spider'). -- Patanachai Tangchaisin

On 07/15/2013 12:01 PM, Tony Mullins wrote: any help plz !!!

On Mon, Jul 15, 2013 at 4:13 PM, Tony Mullins tonymullins...@gmail.com wrote: Please, any help on how to get the value of the 'freq' field in my custom SearchComponent?
http://localhost:8080/solr/collection2/demoendpoint?q=spider&wt=xml&indent=true&fl=*,freq:termfreq%28product,%27spider%27%29

<doc><str name="id">11</str><str name="type">Video Games</str><str name="format">xbox 360</str><str name="product">The Amazing Spider-Man</str><int name="popularity">11</int><long name="_version_">1439994081345273856</long><int name="freq">1</int></doc>

Here is my code:

DocList docs = rb.getResults().docList;
DocIterator iterator = docs.iterator();
int sumFreq = 0;
String id = null;
for (int i = 0; i < docs.size(); i++) {
    try {
        int docId = iterator.nextDoc();
        // Document doc = searcher.doc(docId, fieldSet);
        Document doc = searcher.doc(docId);

In doc object I can
Configuring Tomcat 6 with Solr431 with multiple cores
Thanks to Sandeep in this post: http://lucene.472066.n3.nabble.com/HTTP-Status-503-Server-is-shutting-down-td4065958.html#a4078567 I was able to set up Tomcat 6 with Solr 431. However, I need a multicore implementation and am now stuck on how to do so. Here is what I did based on Sandeep's recommended steps so far, and what I still need:

1. Extract the solr431 package. In my case I did this in E:\solr-4.3.1\example\solr. Peter's path: C:\Dropbox\Databases\solr-4.3.1\example\solr
2. Copy the solr dir from the extracted package (E:\solr-4.3.1\example\solr) into the TOMCAT_HOME dir. In my case TOMCAT_HOME points to E:\Apache\Tomcat 6.0.
3. I can now refer to SOLR_HOME as E:\Apache\Tomcat 6.0\solr (please remember this). Peter's path: C:\Program Files\Apache Software Foundation\Tomcat 6.0\solr
4. Copy the solr.war file from the extracted package to the SOLR_HOME dir, i.e. E:\Apache\Tomcat 6.0\solr. This is required to create the context, as I do not want to pass this as JAVA_OPTS.
5. Create a solr1.xml file in TOMCAT_HOME\conf\Catalina\localhost (I gave the file the name solr1.xml):

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="C:\Program Files\Apache Software Foundation\Tomcat 6.0\solr\solr-4.3.1.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="C:\Program Files\Apache Software Foundation\Tomcat 6.0\solr" override="true"/>
</Context>

6. Also copy the solr.war file into TOMCAT_HOME\webapps for deployment purposes.
7. If you start Tomcat now you will get the errors mentioned by Shawn, so you need to copy all 5 jar files from the extracted solr package (E:\solr-4.3.1\example\lib\ext) to the TOMCAT_HOME\lib dir: jul-to-slf4j-1.6.6, jcl-over-slf4j-1.6.6, slf4j-log4j12-1.6.6, slf4j-api-1.6.6, log4j-1.2.16.
8. Also copy the log4j.properties file from the E:\solr-4.3.1\example\resources dir to the TOMCAT_HOME\lib dir.
9. Now if you start Tomcat you won't have any problems.

So far Sandeep's steps.
I can now reach http://localhost:8080/solr-4.3.1/#/. Now, what I will be requiring: after completing the basic setup of Tomcat6 and Solr431, I want to migrate my Solr350 cores (now running on Cygwin) to that environment:

C:\Dropbox\Databases\apache-solr-3.5.0\example\example-DIH\solr\tt
C:\Dropbox\Databases\apache-solr-3.5.0\example\example-DIH\solr\shop
C:\Dropbox\Databases\apache-solr-3.5.0\example\example-DIH\solr\homes

Where do I need to copy the above cores for this all to work? To C:\Program Files\Apache Software Foundation\Tomcat 6.0\solr? And how can I then reach the data-import handler? I currently do this like so: http://localhost:8983/solr/tt/dataimport?command=full-import Thanks!
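For the multicore part of the question, Solr 4.x still reads a legacy-style solr.xml in SOLR_HOME that lists each core. A hedged sketch of what such a file could look like here (core names are taken from the 3.5 paths above; the attributes and directory layout are assumptions, not something the thread confirms):

```xml
<!-- SOLR_HOME\solr.xml, e.g. C:\Program Files\Apache Software Foundation\Tomcat 6.0\solr\solr.xml -->
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="tt">
    <!-- each instanceDir is a subfolder of SOLR_HOME holding that core's conf/ and data/ -->
    <core name="tt" instanceDir="tt" />
    <core name="shop" instanceDir="shop" />
    <core name="homes" instanceDir="homes" />
  </cores>
</solr>
```

With a layout like this, the data-import handler would presumably be reached through the webapp context rather than port 8983, e.g. http://localhost:8080/solr-4.3.1/tt/dataimport?command=full-import (the context path is assumed from the solr1.xml setup described above).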
Inconsistent solrcloud search
Hi, I see strange behavior while searching my SolrCloud cluster: for a query like http://localhost/solr/my_collection/select?q=my+query Solr sometimes responds with one document and sometimes with no documents. The found document is located on shard8, so if I query with shards=shard8 I always get this document, but if I query with shards=shard8,shard1 then about 50% of my requests return no documents at all. I tried it with Solr 4.3.0 and also with 4.3.1. My cluster has 8 shards with 8 replicas, about 100M docs, and default (compositeId) document routing.
boost docs if token matches happen in the first 5 words
I have a set of documents with a whitespace-tokenized field. I want to give more boost when the query match happens in the first 3 token positions of the field. Is there any way to do that? (I don't want to use payloads, as they mean one more seek to disk and therefore lower performance.)
RE: boost docs if token matches happen in the first 5 words
You must implement a SpanFirst query yourself. These are not implemented in any Solr query parser. You can easily extend the (e)dismax parsers and add support for it.

-----Original message----- From: Anatoli Matuskova anatoli.matusk...@gmail.com Sent: Thursday 18th July 2013 11:54 To: solr-user@lucene.apache.org Subject: boost docs if token matches happen in the first 5 words

I have a set of documents with a whitespace-tokenized field. I want to give more boost when the query match happens in the first 3 token positions of the field. Is there any way to do that? (I don't want to use payloads, as they mean one more seek to disk and therefore lower performance.)
Custom RequestHandlerBase XML Response Issue
Hi all, I am using a custom RequestHandlerBase where I query multiple different Solr instances and aggregate their output as an XML Document using DOM. Now, in the RequestHandler's handleRequestBody(SolrQueryRequest req, SolrQueryResponse resp) method, I want to output this XML Document to the user as a response, but if I write it as a Document or Node, e.g.

response.add("grouped", domResult); // or: response.add("grouped", domNode);

then what gets written to the user is, for a Document, com.sun.org.apache.xerces.internal.dom.DocumentImpl:[#document: null], or for a Node, com.sun.org.apache.xerces.internal.dom.ElementImpl:[arr: null], even though the Document is present: when I convert the Document to a String it comes out perfectly. But I don't want it as a String; I want it in XML format. Please, this is very urgent; has anybody worked on this? Regards, Vineet
RE: boost docs if token matches happen in the first 5 words
Thanks for the quick answer, Markus. Could you give me a guideline or point me to where to look in the Solr source code to see how to get it done?
Re: Custom RequestHandlerBase XML Response Issue
This isn't a Solr issue. Maybe ask on the xerces list?

On Thu, Jul 18, 2013 at 3:31 PM, Vineet Mishra clearmido...@gmail.com wrote: Hi all, I am using a custom RequestHandlerBase where I query multiple different Solr instances and aggregate their output as an XML Document using DOM [...]

-- Regards, Shalin Shekhar Mangar.
Re: Custom RequestHandlerBase XML Response Issue
Thanks for your response, Shalin. So does that mean we can't return an XML object in SolrQueryResponse through a custom RequestHandler?

On Thu, Jul 18, 2013 at 4:04 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: This isn't a Solr issue. Maybe ask on the xerces list? [...]
RE: boost docs if token matches happen in the first 5 words
You'll need to import the org.apache.lucene.search.spans package in Solr's ExtendedDismaxQParserPlugin and add SpanFirstQuery clauses to the main query. Something like:

query.add(new SpanFirstQuery(new SpanTermQuery(new Term(field, clause)), distance), BooleanClause.Occur.SHOULD);

-----Original message----- From: Anatoli Matuskova anatoli.matusk...@gmail.com Sent: Thursday 18th July 2013 12:33 To: solr-user@lucene.apache.org Subject: RE: boost docs if token matches happen in the first 5 words

Thanks for the quick answer, Markus. Could you give me a guideline or point me to where to look in the Solr source code to see how to get it done?
Re: Custom RequestHandlerBase XML Response Issue
Solr's response writers support only a few known types. Look at the writeVal method in TextResponseWriter: https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/TextResponseWriter.java

On Thu, Jul 18, 2013 at 4:08 PM, Vineet Mishra clearmido...@gmail.com wrote: Thanks for your response, Shalin. So does that mean we can't return an XML object in SolrQueryResponse through a custom RequestHandler? [...]

-- Regards, Shalin Shekhar Mangar.
Re: autoCommit and performance
Hi, it totally depends upon what you can afford. If you can afford it, go for bigger RAM, an SSD drive, and a 64-bit OS. Benchmark your application with a certain set of docs: how much RAM it takes, indexing time, search time, etc. Increase the document count and perform the benchmarking tasks again. This will provide more information; everything is directly proportional to the number of docs. In my case, I have a basic hosting plan and I am happy with the performance. My point is you don't always need fancy hardware. Start with the basics and change the plan based on need. Regards, Aditya www.findbestopensource.com

On Wed, Jul 17, 2013 at 4:55 PM, Ayman Plaha aymanpl...@gmail.com wrote: Thanks Aditya, can I also please get some advice on hosting? What hosting specs should I get? How much RAM? Considering my client application is very simple (it just registers users to the database, queries SOLR, and displays SOLR results) and a simple batch program adds 1000 or 2000 documents to SOLR every second. I'm hoping to deploy the code next week; if you guys can give me any other advice I'd really appreciate that.

On Wed, Jul 17, 2013 at 7:07 PM, Aditya findbestopensou...@gmail.com wrote: Hi, it will not affect the performance. We are doing this regularly. If you optimize and search at the same time, there may be some impact. Regards, Aditya www.findbestopensource.com

On Wed, Jul 17, 2013 at 12:52 PM, Ayman Plaha aymanpl...@gmail.com wrote: Hey guys, I've finally finished my Spring Java application that uses SOLR for searches and just had a performance-related question about SOLR. I'm indexing exactly 1000 OR 2000 records every second, every record having 13 fields including 'id'.
The majority of the fields are solr.StrField (no filters) with 5-50 characters each, plus one field, text_t (solr.TextField), which can be 100 to 2000 characters long and has the following tokenizer and filters: PatternTokenizerFactory, LowerCaseFilterFactory, SynonymFilterFactory, SnowballPorterFilterFactory. I'm not using shards; I was hoping to consider that when searches get slow. Or should I consider it now?

Questions:

- I'm using SOLR autoCommit (every 15 minutes) with openSearcher set to true. I'm not using autoSoftCommit because instant availability of the documents for search is not necessary, and I don't want to chew up too much memory because I'm considering cloud hosting:

<autoCommit>
  <maxTime>900000</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>

Will this affect the query performance of the client website if the index grew to 10 million records? I mean, while the commit is happening, does that affect the performance of queries, and how will this change if the index grew to 10 million records?

- What hosting specs should I get? How much RAM? Considering my client application is very simple (it just registers users to the database, queries SOLR, and displays SOLR results) and a simple batch program adds 1000 or 2000 documents to SOLR every second.

I'm hoping to deploy the code next week; if you guys can give me any other advice I'd really appreciate that. Thanks, Ayman
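For contrast with the setup described above, a common alternative when commit-time searcher reopening hurts query latency is to make the hard commit cheap (openSearcher=false) and let a soft commit control visibility. A hedged solrconfig.xml sketch; the soft-commit interval is an illustrative assumption, not something from this thread:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>900000</maxTime>           <!-- hard commit every 15 min: flushes segments to disk -->
    <openSearcher>false</openSearcher>  <!-- skip the expensive searcher reopen on hard commit -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>60000</maxTime>            <!-- assumed value: documents become searchable within ~1 min -->
  </autoSoftCommit>
</updateHandler>
```

In Ayman's case, where instant visibility is explicitly not needed, keeping openSearcher=true on a 15-minute hard commit is also reasonable; the trade-off is a periodic searcher warm-up rather than a continuous one.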
Re: autoCommit and performance
Thanks Shawn and Aditya, I really appreciate your help. Based on your advice and after reading the SolrPerformance article Shawn linked me to, I ended up getting an Intel Dual Core (2-core) i3 3220 3.3GHz with 36GB RAM and 2 x 125GB SSD drives for $227 per month. It's still expensive for me, but I got it anyway because a very basic dedicated host in Australia is $150 per month, and VPSs in Australia don't offer more than 2GB. I hope I made the right decision. What do you guys think? Thanks, Ayman

On Thu, Jul 18, 2013 at 9:07 PM, Aditya findbestopensou...@gmail.com wrote: Hi, it totally depends upon what you can afford. If you can afford it, go for bigger RAM, an SSD drive, and a 64-bit OS. Benchmark your application with a certain set of docs: how much RAM it takes, indexing time, search time, etc. [...] Regards, Aditya www.findbestopensource.com
Re: Doc's FunctionQuery result field in my custom SearchComponent class ?
As detailed in a previous email, termfreq is not a field; it is a transformer or function. Technically, it is actually a ValueSource. If you look at the TextResponseWriter.writeVal method, you can see how it kicks off the execution of transformers when writing documents.

-- Jack Krupansky

-----Original Message----- From: Tony Mullins Sent: Thursday, July 18, 2013 2:49 AM To: solr-user@lucene.apache.org Subject: Re: Doc's FunctionQuery result field in my custom SearchComponent class ?

Erick, in freq:termfreq(product,'spider'), freq is an alias for the 'termfreq' function query, so I could have that field with the name 'freq' in the document response. This is the code I am using to get the document object, and there is no termfreq field in its fields collection. [...]
Re: Custom RequestHandlerBase XML Response Issue
But it seems there is even something called XMLResponseWriter: https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/XMLResponseWriter.java Won't that be appropriate in my case? I have not implemented it yet, but how can there be no way to produce a SolrQueryResponse in XML format?

On Thu, Jul 18, 2013 at 4:36 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Solr's response writers support only a few known types. Look at the writeVal method in TextResponseWriter: https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/TextResponseWriter.java [...]

-- Regards, Shalin Shekhar Mangar.
Re: Custom RequestHandlerBase XML Response Issue
It would probably be better to integrate the responses (document lists). Solr response writers do a lot of special processing of the response data, so you can't just throw random objects into the response. You may need to explain your use case a little more clearly.

-- Jack Krupansky

-----Original Message----- From: Vineet Mishra Sent: Thursday, July 18, 2013 8:41 AM To: solr-user@lucene.apache.org Subject: Re: Custom RequestHandlerBase XML Response Issue

But it seems there is even something called XMLResponseWriter: https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/XMLResponseWriter.java Won't that be appropriate in my case? [...]
Re: Custom RequestHandlerBase XML Response Issue
So does that mean there is no way that we can write an XML or JSON object to the SolrQueryResponse and expect it to be formatted?
Re: Custom RequestHandlerBase XML Response Issue
Okay, let me explain. If you construct your combined response (why are you doing that again?) in the form of a Solr NamedList or SolrDocumentList, then the XMLResponseWriter (which btw uses TextResponseWriter) has no problem writing it out as XML. The problem here is that you are giving it an object (a DOM Document?) which it doesn't know how to serialize, so it just calls .toString on it and writes it out. As long as you stick a known type into the SolrQueryResponse, you should be fine. On Thu, Jul 18, 2013 at 6:24 PM, Vineet Mishra clearmido...@gmail.com wrote: So does that mean there is no way that we can write an XML or JSON object to the SolrQueryResponse and expect it to be formatted? -- Regards, Shalin Shekhar Mangar.
Sort by document similarity counts
Hi, Is it possible to sort search results based on the count of similar documents a document has? Say we have a document A which has 4 other similar documents in the index and a document B which has 10. Then the order Solr returns them in should be B, A. Sorting on moreLikeThis counts for each document would be an example of this (in my case I use ngram similarity detection from Tika). I have tried doing this via a custom SearchComponent, where I can find all similar documents for each document in the current search result, then add a new field into the document, hoping to use the sort parameter (q=*&sort=similarityCount). But this will not work because sorting is done before my custom search component is handled, if added via last-components. I can't add it via first-components, because then I will have no access to the query results. And I do not want to override QueryComponent because I need to have all the functionality it covers: grouping, facets, etc. Thanks
Re: Custom RequestHandlerBase XML Response Issue
My case is like this: I have got a few Solr instances; I am querying them and getting their XML responses, and out of that XML I have to extract a group of specific XML nodes; later I am combining the other Solr responses into a single XML and making a DOM document out of it. So, as you mentioned in your last mail, how can I prepare a combined response for this XML doc? And even if I do, I don't think it would work, because I am doing the same in the RequestHandler. On Thu, Jul 18, 2013 at 6:30 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Okay, let me explain. If you construct your combined response (why are you doing that again?) in the form of a Solr NamedList or SolrDocumentList, then the XMLResponseWriter (which btw uses TextResponseWriter) has no problem writing it out as XML. The problem here is that you are giving it an object (a DOM Document?) which it doesn't know how to serialize, so it just calls .toString on it and writes it out. As long as you stick a known type into the SolrQueryResponse, you should be fine. On Thu, Jul 18, 2013 at 6:24 PM, Vineet Mishra clearmido...@gmail.com wrote: So does that mean there is no way that we can write an XML or JSON object to the SolrQueryResponse and expect it to be formatted? -- Regards, Shalin Shekhar Mangar.
Re: Sort by document similarity counts
I have tried doing this via a custom SearchComponent, where I can find all similar documents for each document in the current search result, then add a new field into the document, hoping to use the sort parameter (q=*&sort=similarityCount). I don't understand this part very well, but: But this will not work because sorting is done before my custom search component is handled, if added via last-components. I can't add it via first-components, because then I will have no access to the query results. And I do not want to override QueryComponent because I need to have all the functionality it covers: grouping, facets, etc. You may want to put your custom SearchComponent in last-components and inject a SortSpec in your prepare() so that QueryComponent can sort the result complying with your SortSpec? koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
Re: How can I learn the total count of how many documents indexed and how many documents updated?
Hi Shawn; This is what I see when I look at mbeans: <lst name="UPDATEHANDLER"><lst name="updateHandler"><str name="class">org.apache.solr.update.DirectUpdateHandler2</str><str name="version">1.0</str><str name="description">Update handler that efficiently directly updates the on-disk main lucene index</str><str name="src">$URL$</str><lst name="stats"><long name="commits">41</long><str name="autocommit maxTime">15000ms</str><int name="autocommits">37</int><int name="soft autocommits">0</int><long name="optimizes">2</long><long name="rollbacks">0</long><long name="expungeDeletes">0</long><long name="docsPending">0</long><long name="adds">0</long><long name="deletesById">0</long><long name="deletesByQuery">0</long><long name="errors">0</long><long name="cumulative_adds">211453</long><long name="cumulative_deletesById">0</long><long name="cumulative_deletesByQuery">0</long><long name="cumulative_errors">0</long></lst></lst></lst> I think there is no information about what I am looking for? 2013/7/18 Shawn Heisey s...@elyograg.org On 7/17/2013 8:06 AM, Furkan KAMACI wrote: I have crawled some web pages and indexed them at my SolrCloud (Solr 4.2.1). However, before I indexed them there were already some indexes. I can calculate the difference between the current and previous document count. However, it doesn't mean that I have indexed that count of documents, because URLs of websites are unique ids in my system. So it means that some documents were updated and did not increase the document count. My question is: how can I learn the total count of how many documents were indexed and how many documents were updated? Look at the update handler statistics. Your application should record the numbers there, then you can check the handler statistics again and note the differences. Here's a URL that can give you those statistics. http://server:port/solr/mycollectionname/admin/mbeans?stats=true They are also available in the UI on the UPDATEHANDLER section of Plugins / Stats, but you can't really use that in a program. 
By setting the request handler path on a query object to /admin/mbeans and setting the stats parameter, you can get this information with SolrJ. Thanks, Shawn
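Shawn's record-and-diff approach can be sketched as plain arithmetic: capture cumulative_adds from the update handler stats and the index's numDocs before and after an indexing run; the growth in numDocs is the count of genuinely new documents, and the remaining adds were updates of existing unique keys. The numbers below are made up for illustration, and this assumes no deletes ran in between:

```java
public class UpdateCounts {
    /**
     * Derive how many documents were newly indexed vs updated, from
     * update-handler and index stats captured before and after a run.
     * Assumes no deletes happened in between (deletes would skew numDocs).
     */
    static long[] delta(long addsBefore, long addsAfter,
                        long numDocsBefore, long numDocsAfter) {
        long totalAdds = addsAfter - addsBefore;      // every add or update bumps cumulative_adds
        long newDocs = numDocsAfter - numDocsBefore;  // only brand-new unique keys grow the index
        long updates = totalAdds - newDocs;           // re-adds of an existing unique key
        return new long[] { newDocs, updates };
    }

    public static void main(String[] args) {
        // hypothetical figures: cumulative_adds grew 211453 -> 215453,
        // while the index only gained 3000 documents
        long[] d = delta(211453L, 215453L, 100000L, 103000L);
        System.out.println("indexed=" + d[0] + " updated=" + d[1]);
    }
}
```

The stats themselves would come from two calls to the /admin/mbeans?stats=true URL above (or its SolrJ equivalent); the bookkeeping has to live in your application, as Shawn says.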
RE: How can I learn the total count of how many documents indexed and how many documents updated?
Not your updateHandler, that only shows numbers about what it's doing, and it can be restarted. Check your cores: host:port/solr/admin/cores -Original message- From:Furkan KAMACI furkankam...@gmail.com Sent: Thursday 18th July 2013 15:46 To: solr-user@lucene.apache.org Subject: Re: How can I learn the total count of how many documents indexed and how many documents updated?
Re: Custom RequestHandlerBase XML Response Issue
This sounds like a bad idea. You could have done this much more simply inside your own application using libraries that you know well. That being said, instead of creating a DOM document, create a Solr NamedList object, which can be serialized by XMLResponseWriter. On Thu, Jul 18, 2013 at 6:48 PM, Vineet Mishra clearmido...@gmail.com wrote: My case is like this: I have got a few Solr instances; I am querying them and getting their XML responses, and out of that XML I have to extract a group of specific XML nodes; later I am combining the other Solr responses into a single XML and making a DOM document out of it. -- Regards, Shalin Shekhar Mangar.
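For illustration of what a "known type" buys you: a NamedList added to the response with something like rsp.add("grouped", myList) would be serialized by XMLResponseWriter roughly along these lines (the entry names here are invented, not from the thread):

```xml
<lst name="grouped">
  <str name="source">shard1</str>
  <int name="matches">42</int>
  <arr name="ids">
    <str>doc1</str>
    <str>doc2</str>
  </arr>
</lst>
```

Nested NamedList, String, numeric, and list values all map onto Solr's standard lst/str/int/arr response elements, which is why sticking to those types sidesteps the .toString fallback described above.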
Getting a large number of documents by id
I have a situation which is common in our current use case, where I need to get a large number (many hundreds) of documents by id. What I'm doing currently is creating a large query of the form id:12345 OR id:23456 OR ... and sending it off. Unfortunately, this query is taking a long time, especially the first time it's executed. I'm seeing times of like 4+ seconds for this query to return, to get 847 documents. So, my question is: what should I be looking at to improve the performance here? Brian
Re: Clearing old nodes from zookeper without restarting solrcloud cluster
Hey André, that isn't a possibility for us right now since we are terminating nodes using AWS autoscaling policies. We'll have to either change our policies so that we can have some kind of graceful shutdown where we get the possibility to unload cores, or update ZooKeeper's cluster state every once in a while to clear old offline nodes. Thanks for the help! On Wed, Jul 17, 2013 at 2:23 AM, Andre Bois-Crettez andre.b...@kelkoo.com wrote: Indeed we are using UNLOAD of cores before shutting down extra replica nodes; it works well but, as already said, it needs such nodes to be up. Once UNLOADed it is possible to stop them; this works well for our use case. But if nodes are already down, maybe it is possible to manually create and upload a cleaned /clusterstate.json to Zookeeper? André On 07/16/2013 11:18 PM, Marcin Rzewucki wrote: Unloading a core is the known way to unregister a Solr node in ZooKeeper (and not use it for further querying). It works for me. If you didn't do it like this, unused nodes may remain in the cluster state and Solr may try to use them without success. I'd suggest starting some machine with the old name, running Solr, joining the cluster for a while, unloading a core to unregister it from the cluster, and shutting down the host at the end. This way you could have a clear cluster state. On 16 July 2013 14:41, Luis Carlos Guerrero Covo lcguerreroc...@gmail.com wrote: Thanks, I was actually asking about deleting nodes from the cluster state, not cores, unless you can unload cores specific to an already offline node from ZooKeeper. -- André Bois-Crettez Search technology, Kelkoo http://www.kelkoo.com/ Kelkoo SAS Société par Actions Simplifiée Au capital de € 4.168.964,30 Siège social : 8, rue du Sentier 75002 Paris 425 093 069 RCS Paris This message and its attachments are confidential and intended exclusively for their addressees. If you are not the intended recipient of this message, please delete it and notify the sender. -- Luis Carlos Guerrero Covo M.S. 
Computer Engineering (57) 3183542047
Two-steps queries with different sorting criteria
Hi all, I need to execute a Solr query in two steps: in the first step, a generic limited-results query ordered by relevance, and in the second step, the ordering of the results of the first step according to a given sorting criterion (different from relevance). This two-step query is meaningful when the query terms are so generic that the number of matched results exceeds the wanted number of results. In such circumstances, using single-step queries with different sorting criteria has a very confusing effect on the user experience, because at each change of sorting criterion the user gets different results even if the search query and the filtering conditions have not changed. On the contrary, using a two-step query where the sorting order of the first step is always relevance is more acceptable in the case of a large number of matched results, because the result set would not change with the sorting criterion of the second step. I am wondering if such a two-step query is achievable with a single Solr query, or if I am obliged to execute the sorting step of my two-step query outside of Solr (i.e., in my application). Another possibility could be the development of a Solr plugin, but I am afraid of the possible effects on performance. I am using Solr 3.4.0. Thanks in advance for your kind help. Fabio
Re: How can I learn the total count of how many documents indexed and how many documents updated?
Hi Markus; It doesn't give me how many documents were updated since the last commit. 2013/7/18 Markus Jelsma markus.jel...@openindex.io Not your updateHandler, that only shows numbers about what it's doing, and it can be restarted. Check your cores: host:port/solr/admin/cores
RE: How can I learn the total count of how many documents indexed and how many documents updated?
No, nothing will. If you must know, you'll have to do it on the client side and make sure autocommit is disabled. -Original message- From:Furkan KAMACI furkankam...@gmail.com Sent: Thursday 18th July 2013 17:01 To: solr-user@lucene.apache.org Subject: Re: How can I learn the total count of how many documents indexed and how many documents updated? Hi Markus; It doesn't give me how many documents updated from last commit.
Re: Getting a large number of documents by id
You could start from doing id:(12345 23456) to reduce the query length and possibly speed up parsing. You could also move the query from 'q' parameter to 'fq' parameter, since you probably don't care about ranking ('fq' does not rank). If these are unique every time, you could probably look at not caching (can't remember exact syntax). That's all I can think of at the moment without digging deep into why you need to do this at all. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Jul 18, 2013 at 10:46 AM, Brian Hurt bhur...@gmail.com wrote: I have a situation which is common in our current use case, where I need to get a large number (many hundreds) of documents by id. What I'm doing currently is creating a large query of the form id:12345 OR id:23456 OR ... and sending it off. Unfortunately, this query is taking a long time, especially the first time it's executed. I'm seeing times of like 4+ seconds for this query to return, to get 847 documents. So, my question is: what should I be looking at to improve the performance here? Brian
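A minimal sketch of Alex's batching idea (the helper class below is mine, not a Solr API; using {!cache=false} as the local param to skip the filter cache is my assumption of the "exact syntax" he couldn't recall): split the ids into modest chunks and build one fq clause per request, relying on the default OR between terms inside id:(...).

```java
import java.util.ArrayList;
import java.util.List;

public class IdBatcher {
    /** Build one filter-query string per batch of ids, e.g. "{!cache=false}id:(1 2 3)". */
    static List<String> buildFilterQueries(List<String> ids, int batchSize) {
        List<String> fqs = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            List<String> batch = ids.subList(i, Math.min(i + batchSize, ids.size()));
            // with the default OR operator inside id:(...), spaces between terms suffice
            fqs.add("{!cache=false}id:(" + String.join(" ", batch) + ")");
        }
        return fqs;
    }

    public static void main(String[] args) {
        for (String fq : buildFilterQueries(List.of("12345", "23456", "34567"), 2)) {
            System.out.println(fq);
        }
    }
}
```

Each resulting string would go into one request's fq parameter (with q=*:* or similar), so ranking work is skipped entirely, per Alex's suggestion.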
Re: Getting a large number of documents by id
Solr really isn't designed for that kind of use case. If it happens to work well for your particular situation, great, but don't complain when you are well outside the normal usage for a search engine (10, 20, 50, 100 results paged at a time, with modest sized query strings.) If you must get these 847 documents, do them in reasonable size batches, like 20, 50, or 100 at a time. That said, there may be something else going on here, since a query for 847 results should not take 4 seconds anyway. Check QTime - is it 4 seconds? Add debugQuery=true to your query and check the individual module times - which ones are the biggest hogs? Or, maybe it is none of them and the problem is elsewhere, like formatting the response, network problems, etc. Hmmm... I wonder if the new real-time Get API would be better for your case. It takes a comma-separated list of document IDs (keys). Check it out: http://wiki.apache.org/solr/RealTimeGet -- Jack Krupansky -Original Message- From: Brian Hurt Sent: Thursday, July 18, 2013 10:46 AM To: solr-user@lucene.apache.org Subject: Getting a large number of documents by id I have a situation which is common in our current use case, where I need to get a large number (many hundreds) of documents by id. What I'm doing currently is creating a large query of the form id:12345 OR id:23456 OR ... and sending it off. Unfortunately, this query is taking a long time, especially the first time it's executed. I'm seeing times of like 4+ seconds for this query to return, to get 847 documents. So, my question is: what should I be looking at to improve the performance here? Brian
Re: Getting a large number of documents by id
Brian, Have you tried the realtime get handler? It supports multiple documents. http://wiki.apache.org/solr/RealTimeGet Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Thu, Jul 18, 2013 at 10:46 AM, Brian Hurt bhur...@gmail.com wrote: I have a situation which is common in our current use case, where I need to get a large number (many hundreds) of documents by id. What I'm doing currently is creating a large query of the form id:12345 OR id:23456 OR ... and sending it off. Unfortunately, this query is taking a long time, especially the first time it's executed. I'm seeing times of like 4+ seconds for this query to return, to get 847 documents. So, my question is: what should I be looking at to improve the performance here? Brian
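Per the RealTimeGet wiki page referenced above, the /get handler accepts a comma-separated ids parameter, so building the request path is just a string join. A small sketch (the base URL is hypothetical, and real-world ids would need URL-encoding if they contain reserved characters):

```java
import java.util.List;

public class RealTimeGetUrl {
    /** Join document ids into a /get request, per http://wiki.apache.org/solr/RealTimeGet */
    static String buildGetUrl(String baseUrl, List<String> ids) {
        // note: ids containing commas or other reserved characters would need URL-encoding
        return baseUrl + "/get?ids=" + String.join(",", ids);
    }

    public static void main(String[] args) {
        String url = buildGetUrl("http://localhost:8983/solr/collection1",
                                 List.of("12345", "23456", "34567"));
        System.out.println(url);
    }
}
```

Because real-time get looks documents up by unique key (including uncommitted ones from the update log) rather than running a scored search, it avoids the query-parsing and ranking costs of a huge boolean OR query.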
Re: Solr with Hadoop
Rajesh, If you require an integration between Solr and Hadoop or NoSQL, I would recommend using a commercial distribution. I think most are free to use as long as you don't require support. I inquired about the Cloudera Search capability, but it seems like so far it is just preliminary: there is no tight integration yet between HBase and Solr, for example, other than full text search on the HDFS data (I believe enabled in Hue). I am not too familiar with what MapR's M7 has to offer. However, DataStax does a good job of tightly integrating Solr with Cassandra, and lets you query over the data ingested from Solr in Hive for example, which is pretty nice. Solr would not trigger Hadoop jobs, though. Cheers, Matt On 7/17/13 7:37 PM, Rajesh Jain rjai...@gmail.com wrote: I have a newbie question on integrating Solr with Hadoop. There are some vendors like Cloudera/MapR who have announced Solr Search for Hadoop. If I use the Apache distro, how can I use Solr Search on docs in HDFS/Hadoop? Is there a tutorial on how to use it or getting started? I am using Flume to sink CSV docs into Hadoop/HDFS and I would like to use Solr to provide Search. Does Solr Search trigger MapReduce jobs (like Splunk/Hunk does)? Thanks, Rajesh NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
Re: Getting a large number of documents by id
Look at the speed of reading the data - likely, it takes a long time to assemble a big response, especially if there are many long fields - you may want to try SSD disks, if you have that option. Also, to gain better understanding: start your Solr, start jvisualvm and attach it to your running Solr. Start sending queries and observe where the most time is spent - it is very easy, you don't have to be a programmer to do it. The crucial parts (but they will show up under different names) are: 1. query parsing 2. search execution 3. response assembly Quite likely, your query is a huge boolean OR clause, which may not be as efficient as some filter query. Your use case is actually not at all exotic. There will soon be a JIRA ticket that makes the scenario of sending/querying with a large number of IDs less painful. http://lucene.472066.n3.nabble.com/Solr-large-boolean-filter-td4070747.html#a4070964 http://lucene.472066.n3.nabble.com/ACL-implementation-Pseudo-join-performance-amp-Atomic-Updates-td4077894.html But I would really recommend you to do the jvisualvm measurement - that's like bringing light into darkness. roman On Thu, Jul 18, 2013 at 10:46 AM, Brian Hurt bhur...@gmail.com wrote: I have a situation which is common in our current use case, where I need to get a large number (many hundreds) of documents by id. What I'm doing currently is creating a large query of the form id:12345 OR id:23456 OR ... and sending it off. Unfortunately, this query is taking a long time, especially the first time it's executed. I'm seeing times of like 4+ seconds for this query to return, to get 847 documents. So, my question is: what should I be looking at to improve the performance here? Brian
RE: Solr with Hadoop
I'm familiar with and have used both the DSE cluster as well as am in the process of evaluating Cloudera Search. In general, Cloudera Search has tight integration with HDFS and takes care of replication and sharding transparently by using the pre-existing HDFS replication and sharding; however, Cloudera Search actually uses SolrCloud underneath, and you would need to install ZooKeeper to enable coordination between each of the Solr nodes. DataStax allows you to talk to Solr, however their model scales around the data model and architecture of Cassandra; release 3.1 allows for some additional Solr admin functionality and removes the need to write Cassandra-specific code. If you go the open source route you have a few options: 1) You can build a custom plugin inside Solr that would internally query HDFS and return data; you would need to figure out how to scale this, potentially using a solution very similar to Cloudera Search (i.e. leverage SolrCloud), and if using SolrCloud you would need to install ZooKeeper for node coordination 2) You could create a Flume channel that accumulates specific events from HDFS and create a sink to write data directly to Solr 3) I would look at Cloudera Search if you need tight integration into Hadoop, it might save you some time and effort I don't think you want to have Solr trigger map-reduce jobs if you're looking at having very fast throughput through your search service. Hope this helps, ping me offline if you have more questions. Regards From: mlie...@impetus.com To: solr-user@lucene.apache.org Subject: Re: Solr with Hadoop Date: Thu, 18 Jul 2013 15:41:36 +
Re: Getting a large number of documents by id
And I guess, if only a subset of fields is being requested but there are other large fields present, there could be the cost of loading those extra fields into memory before discarding them. In which case, using enableLazyFieldLoading may help. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Jul 18, 2013 at 11:47 AM, Roman Chyla roman.ch...@gmail.com wrote: Look at speed of reading the data - likely, it takes long time to assemble a big response, especially if there are many long fields - you may want to try SSD disks, if you have that option. Also, to gain better understanding: Start your solr, start jvisualvm and attach to your running solr. Start sending queries and observe where the most time is spent - it is very easy, you don't have to be a programmer to do it. The crucial parts are (but they will show up under different names) are: 1. query parsing 2. search execution 3. response assembly quite likely, your query is a huge boolean OR clause, that may not be as efficient as some filter query. Your use case is actually not at all exotic. There will soon be a JIRA ticket that makes the scenario of sending/querying with large number of IDs less painful. http://lucene.472066.n3.nabble.com/Solr-large-boolean-filter-td4070747.html#a4070964 http://lucene.472066.n3.nabble.com/ACL-implementation-Pseudo-join-performance-amp-Atomic-Updates-td4077894.html But I would really recommend you to do the jvisualvm measurement - that's like bringing the light into darkness. roman On Thu, Jul 18, 2013 at 10:46 AM, Brian Hurt bhur...@gmail.com wrote: I have a situation which is common in our current use case, where I need to get a large number (many hundreds) of documents by id. 
What I'm doing currently is creating a large query of the form id:12345 OR id:23456 OR ... and sending it off. Unfortunately, this query is taking a long time, especially the first time it's executed. I'm seeing times of like 4+ seconds for this query to return, to get 847 documents. So, my question is: what should I be looking at to improve the performance here? Brian
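For reference, the enableLazyFieldLoading flag suggested above is a solrconfig.xml setting; a minimal fragment is sketched here (the exact placement inside your own `<query>` section is an assumption — check your config):

```xml
<config>
  <query>
    <!-- Load stored fields on demand: a request with a small fl subset
         then avoids materializing large fields that were not asked for. -->
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
  </query>
</config>
```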
XInclude and Document Entity not working on schema.xml
Hello, I am using the solr nightly version 4.5-2013-07-18_06-04-44 and I want to use a Document Entity in schema.xml, but I get this exception:

java.lang.RuntimeException: schema fieldtype string(org.apache.solr.schema.StrField) invalid arguments:{xml:base=solrres:/commonschema_types.xml}
at org.apache.solr.schema.FieldType.setArgs(FieldType.java:187)
at org.apache.solr.schema.FieldTypePluginLoader.init(FieldTypePluginLoader.java:141)
at org.apache.solr.schema.FieldTypePluginLoader.init(FieldTypePluginLoader.java:43)
at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:190)
... 16 more

schema.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE schema [
  <!ENTITY commonschema_types SYSTEM "commonschema_types.xml">
]>
<schema name="searchSolrSchema" version="1.5">
  <types>
    <!-- Stuff -->
    &commonschema_types;
  </types>
  <!-- Stuff -->
</schema>

commonschema_types.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<!-- Stuff -->

The same error appears in this bug (fixed?): https://issues.apache.org/jira/browse/SOLR-3087 It works with solr-4.2.1. //- I also tried to use the XML XInclude mechanism (http://en.wikipedia.org/wiki/XInclude) to include parts of schema.xml. When I try to include a fieldType, I get this exception:

org.apache.solr.common.SolrException: Unknown fieldType 'long' specified on field _version_
at org.apache.solr.schema.IndexSchema.loadFields(IndexSchema.java:644)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:470)
at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:164)
at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:267)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:622)
...
10 more The type is not found. I include 'schema_integration.xml' like this in 'schema.xml':

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="default" version="1.5">
  <types>
    <!-- Stuff -->
    <xi:include href="commonschema_types.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
  </types>
  <!-- Stuff -->
  <fields>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
    <!-- Stuff -->
  </fields>
</schema>

Is this a bug in the nightly version? Elodie Sannier, Kelkoo
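One general XML constraint worth noting here (standard XInclude behavior, not something confirmed in this thread): a file pulled in via xi:include must itself be a well-formed document with a single root element, whereas the DOCTYPE-entity mechanism can splice in several sibling elements. A sketch of a common workaround — untested against this nightly — is to give the include file one root and include the whole element:

```xml
<!-- commonschema_types.xml: a single root element, well-formed on its own -->
<types>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
</types>
```

```xml
<!-- schema.xml: include the whole <types> element rather than nesting
     the include inside an existing <types> wrapper -->
<schema name="default" version="1.5">
  <xi:include href="commonschema_types.xml"
              xmlns:xi="http://www.w3.org/2001/XInclude"/>
  <fields>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
  </fields>
</schema>
```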
Re: solr autodetectparser tikaconfig dataimporter error
I have now changed some things and the import runs without error. In schema.xml I haven't got the field text but contentsExact. Unfortunately the text (from the file) isn't indexed even though I mapped it to the proper field. What am I doing wrong? data-config.xml:

<dataConfig>
  <dataSource type="BinFileDataSource" name="data"/>
  <dataSource type="BinURLDataSource" name="dataUrl"/>
  <dataSource type="URLDataSource" baseUrl="http://127.0.0.1/tkb/internet/" name="main"/>
  <document>
    <entity name="rec" processor="XPathEntityProcessor" url="docImport.xml" forEach="/albums/album" dataSource="main">
      <!-- transformer="script:GenerateId" -->
      <field column="title" xpath="//title"/>
      <field column="id" xpath="//file"/>
      <field column="path" xpath="//path"/>
      <field column="Author" xpath="//author"/>
      <!-- <field column="tstamp">2013-07-05T14:59:46.889Z</field> -->
      <entity name="f" processor="FileListEntityProcessor" baseDir="C:\web\development\tkb\internet\public" fileName="${rec.id}" dataSource="data" onError="skip">
        <entity name="tika" processor="TikaEntityProcessor" url="${f.fileAbsolutePath}">
          <field column="text" name="contentsExact"/>
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

I noticed that when I move the field author into the tika entity it isn't indexed. Can this have something to do with why the text from the file isn't indexed? Do I have to do something special about the entity levels in document? PS: how do I import tstamp, it's a static value? On 14. Jul 2013, at 10:30 PM, Jack Krupansky wrote: Caused by: java.lang.NoSuchMethodError: That means you have some out-of-date jars or some newer jars mixed in with the old ones. -- Jack Krupansky -----Original Message----- From: Andreas Owen Sent: Sunday, July 14, 2013 3:07 PM To: solr-user@lucene.apache.org Subject: Re: solr autodetectparser tikaconfig dataimporter error Hi, is there no one with an idea what this error is, or who can give me a pointer where to look? If not, is there an alternative way to import documents from an XML file with metadata and the filename to parse? Thanks for any help. On 12.
Jul 2013, at 10:38 PM, Andreas Owen wrote: I am using solr 3.5, tika-app-1.4 and tagcloud 1.2.1. When I try to import a file via xml I get this error; it doesn't matter what file format I try to index - txt, cfm, pdf - all give the same error:

SEVERE: Exception while processing: rec document : SolrInputDocument[{id=id(1.0)={myTest.txt}, title=title(1.0)={Beratungsseminar kundenbrief}, contents=contents(1.0)={wie kommuniziert man}, author=author(1.0)={Peter Z.}, path=path(1.0)={download/online}}]: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: org.apache.tika.parser.AutoDetectParser.setConfig(Lorg/apache/tika/config/TikaConfig;)V
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.NoSuchMethodError: org.apache.tika.parser.AutoDetectParser.setConfig(Lorg/apache/tika/config/TikaConfig;)V
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:122)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
... 6 more

Jul 11, 2013 5:23:36 PM org.apache.solr.common.SolrException log SEVERE: Full Import failed: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: org.apache.tika.parser.AutoDetectParser.setConfig(Lorg/apache/tika/config/TikaConfig;)V
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at =
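As Jack's reply in the quoted thread says, a NoSuchMethodError like this almost always means mixed jar versions on the classpath. One quick sanity check is to look for artifacts present in more than one version; a small sketch (the jar names below are illustrative placeholders — in practice, list your real solr/lib directory):

```python
import re

def duplicate_artifacts(jar_names):
    """Strip the trailing -<version>.jar suffix and report artifacts
    that appear in more than one version on the classpath."""
    seen = {}
    for name in jar_names:
        artifact = re.sub(r'-\d[\d.]*\.jar$', '', name)
        seen.setdefault(artifact, []).append(name)
    return sorted(a for a, jars in seen.items() if len(jars) > 1)

# Hypothetical lib listing, e.g. from os.listdir("solr/lib"):
jars = ["tika-core-1.2.jar", "tika-core-1.4.jar", "tika-parsers-1.4.jar"]
print(duplicate_artifacts(jars))  # ['tika-core']
```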
Luke's analysis of Trie Dates
I have a TrieDateField dynamic field setup in my schema, pretty standard...

<dynamicField name="*_tdt" type="tdate" indexed="true" stored="false"/>
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>

In my code I only set one field, creation_tdt, and I round it to the nearest second before storing it. However when I analyze it with Luke I get:

<lst name="fields">
  <lst name="creation_tdt">
    <str name="type">tdate</str>
    <str name="schema">IT--OF--</str>
    <str name="dynamicBase">*_tdt</str>
    <str name="index">(unstored field)</str>
    <int name="docs">22404</int>
    <int name="distinct">-1</int>
    <lst name="topTerms">
      <int name="2013-07-18T13:37:33.696Z">22404</int>
      <int name="1970-01-01T00:00:00Z">22404</int>
      <int name="1970-01-01T00:00:00Z">22404</int>
      <int name="2013-07-08T20:36:32.896Z">22404</int>
      <int name="1970-01-01T00:00:00Z">22404</int>
      <int name="2011-05-17T22:07:37.984Z">22404</int>
      <int name="1970-01-01T00:00:00Z">22404</int>
      <int name="2013-07-18T15:09:18.72Z">16014</int>
      <int name="2013-07-18T15:04:56.576Z">6390</int>
      <int name="2013-07-18T15:09:10.528Z">1535</int>
      <int name="2013-07-18T15:09:55.584Z">1459</int>
      <int name="2013-07-18T15:09:14.624Z">1268</int>
      <int name="2013-07-18T15:09:06.432Z">1193</int>
      <int name="2013-07-18T15:09:18.72Z">1187</int>
      <int name="2013-07-18T15:09:51.488Z">1152</int>
      <int name="2013-07-18T15:09:59.68Z">1129</int>
      <int name="2013-07-18T15:09:02.336Z">1089</int>
      ...

So my question is: where are all these entries coming from? They are not the dates I specified, because they have millis; and my field isn't multivalued, so the term counts don't add up (how could I have more than 22404 terms if I only have 22404 documents?). Why multiple 1970-01-01T00:00:00Z entries? Is this somehow related to Trie fields and how they are indexed? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Luke-s-analysis-of-Trie-Dates-tp4078885.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Luke's analysis of Trie Dates
On Thu, Jul 18, 2013 at 12:53 PM, JohnRodey timothydd...@yahoo.com wrote: I have a TrieDateField dynamic field setup in my schema, pretty standard... [schema and Luke topTerms output snipped - see the previous message] So my question is, where are all these entries coming from? They are not the dates I specified, my field isn't multivalued, and the term counts don't add up (how could I have more than 22404 terms if I only have 22404 documents?). Why multiple 1970-01-01T00:00:00Z entries? Is this somehow related to Trie fields and how they are indexed?

Yes, it's due to how trie fields are indexed (they can have multiple indexed tokens per logical value to speed up range queries). If you want counts of values (as opposed to tokens), use faceting. -Yonik http://lucidworks.com
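Yonik's point can be sketched numerically. Lucene's trie encoding indexes each numeric value several times, once per precisionStep shift, with the low bits zeroed out — so Luke, decoding every indexed term as a date, shows truncated variants of the stored timestamps and, at the high shifts, the zero value 1970-01-01T00:00:00Z. A rough model of the idea (not Lucene's exact term format, which also carries a shift prefix byte):

```python
def trie_tokens(value, precision_step=6, bits=64):
    """One indexed token per shift: the value with its low bits zeroed."""
    return [(shift, (value >> shift) << shift)
            for shift in range(0, bits, precision_step)]

# A date stored as millis-since-epoch becomes ~11 indexed terms.
millis = 1374154653000  # an illustrative 2013-07-18-ish timestamp
toks = trie_tokens(millis)
print(len(toks))             # 11 terms for a single stored value
print(toks[0][1] == millis)  # True: shift 0 keeps full precision
print(toks[-1][1])           # 0 -> decoded by Luke as 1970-01-01T00:00:00Z
```

Term counts therefore exceed the document count, which matches the Luke output above; faceting counts logical values instead of raw index terms.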
Re: JVM Crashed - SOLR deployed in Tomcat
Thanks for your reply. Yes, it worked. No more crashes after switching to 1.6.0_30 -- View this message in context: http://lucene.472066.n3.nabble.com/JVM-Crashed-SOLR-deployed-in-Tomcat-tp4078439p4078906.html Sent from the Solr - User mailing list archive at Nabble.com.
Indexing into SolrCloud
Hey folks, I've been migrating an application which indexes about 15M documents from straight-up Lucene into SolrCloud. We've set up 5 Solr instances with a 3-zookeeper ensemble using HAProxy for load balancing. The documents are processed on a quad-core machine with 6 threads and indexed into SolrCloud through HAProxy using ConcurrentUpdateSolrServer in order to batch the updates. The indexing box is heavily loaded during indexing but I don't think it is so bad that it would cause issues. I'm using Solr 4.3.1 on client and server side, zookeeper 3.4.5 and HAProxy 1.4.22. I've been accepting the default HttpClient with 50K buffered docs and 2 threads, i.e.,

int solrMaxBufferedDocs = 5;
int solrThreadCount = 2;
solrServer = new ConcurrentUpdateSolrServer(solrHttpIPAddress, solrMaxBufferedDocs, solrThreadCount);

autoCommit is configured in the solrconfig as follows:

<autoCommit>
  <maxTime>60</maxTime>
  <maxDocs>50</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

I'm getting the following errors on the client and server sides respectively:

Client side:
2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught when processing request: Software caused connection abort: socket write error
2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO SystemDefaultHttpClient - Retrying request
2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught when processing request: Software caused connection abort: socket write error
2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO SystemDefaultHttpClient - Retrying request

Server side:
7988753 [qtp1956653918-23] ERROR org.apache.solr.core.SolrCore - java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF
at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at
com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)

When I disabled autoCommit on the server side, I didn't see any errors there, but I still get the issue client-side after about 2 million documents - which is about 45 minutes. Has anyone seen this issue before? I couldn't find anything useful in the usual places. I suppose I could set up wireshark to see what is happening, but I'm hoping that someone has a better suggestion. Thanks in advance for any help! Best regards, Jim Beale hibu.com 2201 Renaissance Boulevard, King of Prussia, PA, 19406 Office: 610-879-3864 Mobile: 610-220-3067
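Not a diagnosis of the EOF errors above, but one knob in this setup is simply how many documents go into each update request. The batching idea behind a queueing client like ConcurrentUpdateSolrServer can be sketched generically (a hypothetical helper, not the SolrJ API):

```python
def batches(docs, size):
    """Group a document stream into fixed-size chunks so each update
    request stays bounded, regardless of total corpus size."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

sizes = [len(b) for b in batches(range(10), 4)]
print(sizes)  # [4, 4, 2]
```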
Re: Getting a large number of documents by id
Thanks everyone for the responses. On Thu, Jul 18, 2013 at 11:22 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: You could start from doing id:(12345 23456) to reduce the query length and possibly speed up parsing. I didn't know about this syntax - it looks useful. You could also move the query from the 'q' parameter to the 'fq' parameter, since you probably don't care about ranking ('fq' does not rank). Yes, I don't care about rank, so this helps. If these are unique every time, you could probably look at not caching (can't remember exact syntax). That's all I can think of at the moment without digging deep into why you need to do this at all. Short version of a long story: I'm implementing a graph database on top of solr. Which is not what solr is designed for, I know. This is a case where I'm following a set of edges from a given node to its 847 children, and I need to get the children. And yes, I've looked at neo4j - it doesn't help. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Jul 18, 2013 at 10:46 AM, Brian Hurt bhur...@gmail.com wrote: I have a situation which is common in our current use case, where I need to get a large number (many hundreds) of documents by id. What I'm doing currently is creating a large query of the form id:12345 OR id:23456 OR ... and sending it off. Unfortunately, this query is taking a long time, especially the first time it's executed. I'm seeing times of like 4+ seconds for this query to return, to get 847 documents. So, my question is: what should I be looking at to improve the performance here? Brian
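Putting Alex's two suggestions together — a parenthesized id list sent as a filter query, with caching disabled since each id set is unique — the filter string looks roughly like this (a sketch assuming the default OR operator; `{!cache=false}` is Solr's local-param syntax for skipping the filter cache):

```python
def id_filter(ids):
    """Build an uncached fq string for a one-off batch of ids."""
    return "{!cache=false}id:(" + " ".join(ids) + ")"

fq = id_filter(["12345", "23456", "34567"])
print(fq)  # {!cache=false}id:(12345 23456 34567)
```

The string would then be passed as the fq parameter instead of q, so no ranking work is done.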
Auto-sharding and numShard parameter
Hi to all, Probably this question has a simple answer but I just want to be sure of the potential drawbacks... when I run SolrCloud I run the main solr instance with the -numShard option (e.g. 2). Then as data grows, shards could potentially become a huge number. If I had to restart all nodes and re-run the master with numShard=2, what would happen? Would it just be ignored, or would Solr try to reduce the shards...? Another question... in SolrCloud, how do I restart the whole cloud at once? Is it possible? Best, Flavio
Need ideas to perform historical search
I am trying to implement historical search using SOLR. Ex: if I search on address 800 5th Ave and provide a time range, it should list the name of the person who was living at that address during the time period. I am trying to figure out a way to store the data without redundancy. I can do a join in the database to return all the names of people who were living at a particular address during a particular time, but I know it's difficult to do that in SOLR, and SOLR is not a database (it works best when the data is denormalized)... Is there any other way / idea by which I can reduce the redundancy of creating multiple records for a particular person again and again? -- View this message in context: http://lucene.472066.n3.nabble.com/Need-ideas-to-perform-historical-search-tp4078980.html Sent from the Solr - User mailing list archive at Nabble.com.
Spellcheck questions
Exploring various spellcheckers in Solr, I have a few questions: 1. Which algorithm is used for generating suggestions when using IndexBasedSpellChecker? I know it's Levenshtein (with edit distance=2, the default) in DirectSolrSpellChecker. 2. If I have 2 indices, can I set up multiple IndexBasedSpellCheckers pointing to different spellcheck dictionaries to generate suggestions from both? 3. Can I use IndexBasedSpellChecker and FileBasedSpellChecker together? I tried doing it and ran into an exception: All checkers need to use the same StringDistance. Any help will be much appreciated. Thanks, -Manasi -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-questions-tp4078985.html Sent from the Solr - User mailing list archive at Nabble.com.
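Regarding question 3: the "All checkers need to use the same StringDistance" error comes from combining checkers whose distance implementations differ. A sketch of one thing to try — untested, and the field name and dictionary file below are placeholders — is declaring both checkers with the same distanceMeasure explicitly:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">index</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">spell</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
  </lst>
</searchComponent>
```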
Re: Spellcheck questions
check the below link to get more info on IndexBasedSpellCheckers http://searchhub.org/2010/08/31/getting-started-spell-checking-with-apache-lucene-and-solr/ -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-questions-tp4078985p4079000.html Sent from the Solr - User mailing list archive at Nabble.com.
additional requests sent to solr
Hello, I send to solr (to server1 in the cluster of two servers) the following request:

http://server1:8983/solr/mycollection/select?q=alex&wt=xml&defType=edismax&facet.field=school&facet.field=company&facet=true&facet.limit=10&facet.mincount=1&qf=school_txt+company_txt+name&shards=server1:8983/solr/mycollection,server2.com:8983/solr/mycollection

I see in the logs 2 additional requests:

INFO: [mycollection] webapp=/solr path=/select params={facet=true&f.company.facet.limit=25&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&f.school_facet.facet.limit=25&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&fl=id,score&start=0&q=alex&facet.field=school&facet.field=company&isShard=true&fsv=true} hits=9118 status=0 QTime=72
Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select params={facet=true&facet.mincount=1&company__terms=Google&ids=957642543183429632,957841245982425088,67612781366,56659036467,50875569066,957707339232706560,465078975511&facet.limit=10&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&school__terms=Michigan+State+University,Brigham+Young+University,Northeastern+University&q=alex&facet.field={!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company&isShard=true} status=0 QTime=6
Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select params={facet=true&shards=server1.prod.mylife.com:8983/solr/mycollection,server2:8983/solr/mycollection&facet.mincount=1&q=alex&facet.limit=10&qf=school_txt+company_txt+name&facet.field=school&facet.field=company&wt=xml&defType=edismax} hits=97262 status=0 QTime=168

I can understand that the first and the third log records are related to the above request, but I cannot understand where the second one comes from. I see in it company__terms and facet.field={!terms%3D$school__terms}school / facet.field={!terms%3D$company__terms}company, which seem to have nothing to do with the initial request. This is solr-4.2.0. Any ideas about it are welcome. Thanks in advance. Alex.
Solr 4.3 open a lot more files than solr 3.6
Hi, After upgrading solr from 3.6 to 4.3, we found that solr opens a lot more files compared to solr 3.6 (when a core is open). Since we have many cores (more than 2K, and still growing), we would like to reduce the number of open files. We already use shareSchema and sharedLib, we also share the SolrConfig across all cores, and we commented out autoSoftCommit in solrconfig.xml. In solr 3.6, it seems that the IndexWriter was opened only when an indexing request came in and was closed immediately after the request was done, but in solr 4.3 the IndexWriter is kept open. Is there an easy way to go back to the 3.6 behavior (we do not need Near Real Time Search)? Can we change the code to disable keeping the IndexWriter open (if there is no better way)? Any guidance on reducing open files would be very helpful. Thanks very much, Lisheng
Re: add to ContributorsGroup - Instructions for setting up SolrCloud on jboss
Thank you for adding to the wiki! It's always appreciated... On Wed, Jul 17, 2013 at 5:18 PM, Ali, Saqib docbook@gmail.com wrote: Thanks Erick! I have added the instructions for running SolrCloud on Jboss: http://wiki.apache.org/solr/SolrCloud%20using%20Jboss I will refine the instructions further, and also post some screenshots. Thanks. On Sun, Jul 14, 2013 at 5:05 AM, Erick Erickson erickerick...@gmail.com wrote: Done, sorry it took so long, hadn't looked at the list in a couple of days. Erick On Fri, Jul 12, 2013 at 5:46 PM, Ali, Saqib docbook@gmail.com wrote: username: saqib On Fri, Jul 12, 2013 at 2:35 PM, Ali, Saqib docbook@gmail.com wrote: Hello, Can you please add me to the ContributorsGroup? I would like to add instructions for setting up SolrCloud using Jboss. thanks.
Re: Need ideas to perform historical search
Why do you care about redundancy? That's the search engine's architectural tradeoff (as far as I understand). And the tokens are all normalized under the covers, so it does not take as much space as you expect. Specifically regarding your issue, maybe you should store 'occupancy' as the record. That's similar to what they do at Gilt: http://www.slideshare.net/trenaman/personalized-search-on-the-largest-flash-sale-site-in-america (slide 36+). The other option is to use location as spans with some clever queries: http://wiki.apache.org/solr/SpatialForTimeDurations (follow the links). Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Jul 18, 2013 at 5:58 PM, SolrLover bbar...@gmail.com wrote: I am trying to implement Historical search using SOLR. Ex: If I search on address 800 5th Ave and provide a time range, it should list the name of the person who was living at the address during the time period. I am trying to figure out a way to store the data without redundancy. I can do a join in the database to return all the names who were living in a particular address during a particular time but I know it's difficult to do that in SOLR and SOLR is not a database (it works best when the data is denormalized)... Is there any other way / idea by which I can reduce the redundancy of creating multiple records for a particular person again and again? -- View this message in context: http://lucene.472066.n3.nabble.com/Need-ideas-to-perform-historical-search-tp4078980.html Sent from the Solr - User mailing list archive at Nabble.com.
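The 'occupancy as the record' suggestion amounts to denormalizing one address-plus-date-range per Solr document: the person's fields repeat across documents, but each stay becomes independently searchable. A sketch with hypothetical field names:

```python
def occupancy_docs(person, occupancies):
    """One Solr doc per occupancy: person fields repeat across docs,
    but each address + date range is its own searchable record."""
    return [
        {
            "id": "%s-%d" % (person["id"], i),
            "name": person["name"],
            "address": occ["address"],
            "from_date": occ["from"],
            "to_date": occ["to"],
        }
        for i, occ in enumerate(occupancies)
    ]

docs = occupancy_docs(
    {"id": "p1", "name": "Jane Doe"},
    [{"address": "800 5th Ave", "from": "2001-01-01", "to": "2005-06-30"},
     {"address": "12 Main St", "from": "2005-07-01", "to": "2010-12-31"}],
)
print(len(docs))           # 2
print(docs[0]["address"])  # 800 5th Ave
```

A range filter such as from_date:[* TO 2003-01-01] AND to_date:[2003-01-01 TO *] on these documents would then find who lived at a given address at a given time.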