where should I check for solrj SolrServerException
Hi, I am using SolrJ and sometimes get connection problems like:

org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at
Caused by: java.net.SocketException: Connection reset
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to 10.10.68.183:8983 timed out

It seems to be a resource problem, but the system load is not high. Which parameter or log should I check to find the reason? Thanks, Chunki.
Re: Modify/add/remove params at search component
Hi Umesh Prasad, it makes my code simpler than before. Thank you, Chunki.

On Aug 4, 2014, at 9:48 PM, Umesh Prasad umesh.i...@gmail.com wrote:

Use ModifiableSolrParams:

SolrParams params = rb.req.getParams();
ModifiableSolrParams modifiableSolrParams = new ModifiableSolrParams(params);
modifiableSolrParams.set(paramName, paramValue);
rb.req.setParams(modifiableSolrParams);

On 4 August 2014 12:47, Lee Chunki lck7...@coupang.com wrote:

Hi, I am building a new search component that runs after QueryComponent. What I want to do is set params like start, rows, and query in the new search component. I can set/get the query using setQueryString() and getQueryString() on ResponseBuilder, and get params using rb.req.getParams(), but how can I set params in a search component? Thanks, Chunki.

-- Umesh Prasad
Re: where should I check for solrj SolrServerException
Does it by any chance happen after a period of inactivity? And are you holding on to the client? If so, check that you don't have a firewall in between that times out and drops the assumed-dead connection. Regards, Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Tue, Aug 5, 2014 at 8:15 AM, Lee Chunki lck7...@coupang.com wrote: [...]
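If a firewall is dropping idle connections, fixing the firewall or keep-alive settings is the real cure; as a stopgap, some clients simply retry once when a request fails on a stale connection. A stdlib-only sketch of that pattern (the wrapper is my own illustration, not part of the SolrJ API):

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryOnce {
    // Run the call; if the first attempt dies on an I/O error (e.g. a
    // connection the firewall silently dropped), retry exactly once on
    // what should now be a fresh connection.
    static <T> T withOneRetry(Callable<T> call) throws Exception {
        try {
            return call.call();
        } catch (IOException firstAttemptFailed) {
            return call.call();
        }
    }
}
```

In a SolrJ client you might wrap a call like server.query(params) this way, catching SolrServerException rather than IOException.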
Re: where should I check for solrj SolrServerException
The system architecture is: SolrJ client —— L4 ——— three Solr servers. It works well most of the time, but the error occurs fewer than 20 times a day. Thanks, Chunki

On Aug 5, 2014, at 3:35 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: [...]
Re: Paging bug in ReRankingQParserPlugin?
The comment in the code reads slightly differently:

// This enusres that reRankDocs >= docs needed to satisfy the result set.
reRankDocs = Math.max(start+rows, reRankDocs);

I think you're right, though, that this is confusing. The way the ReRankingQParserPlugin works is that it grabs the top X documents (reRankDocs) and re-ranks them. If the top X (reRankDocs) isn't large enough to satisfy the page, the result won't have enough documents. The intended use was actually to stop using query re-ranking once you page past the re-ranked results. So if you re-rank the top 200 documents, you would drop the re-ranking parameter when you page to documents 201-220. So the line:

reRankDocs = Math.max(start+rows, reRankDocs);

saves you from an unexpected shortfall in documents if you do page beyond reRankDocs. At the very least the expected use should be documented, and if we can figure out better behavior here, that would be great. Joel Bernstein, Search Engineer at Heliosearch

On Mon, Aug 4, 2014 at 7:56 PM, Adair Kovac adairko...@gmail.com wrote:

Looking at this line in the code:

// This enusres that reRankDocs >= docs needed to satisfy the result set.
reRankDocs = Math.max(start+rows, reRankDocs);

This looks like it would cause skips and duplicates while paging through the results: if you exceed the reRankDocs parameter and keep finding things that match the re-ranking query, they'll get boosted earlier (skipped), pushing down items you already saw (causing duplicates). It's obviously intentional behavior, but there's no documentation I can see of why, if you request fewer documents to be re-ranked than you're asking to view, it goes ahead and ignores the number you asked for. What if I only want the top 10 out of 50 rows to be re-ranked? Wouldn't it be better to make the client choose whether to increase reRankDocs or leave it the same?

If no one replies and I have time, I might check out 4.9 and see if I can confirm or disprove the bug, but I figured I'd bring it up now in case I don't end up having time. It would be good to document the reason for this behavior if it turns out it's necessary. Thanks. I'm excited about this feature, btw. --Adair
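The adjustment Joel quotes is easy to see with concrete numbers. A minimal sketch of the widening logic (just the quoted line, not the plugin itself):

```java
public class ReRankWindow {
    // Mirrors reRankDocs = Math.max(start + rows, reRankDocs):
    // the re-rank window silently widens to cover the requested page.
    static int effectiveReRankDocs(int start, int rows, int reRankDocs) {
        return Math.max(start + rows, reRankDocs);
    }
}
```

With reRankDocs=200, page one (start=0, rows=10) re-ranks 200 documents, but the page at start=200, rows=20 re-ranks 220, which can reorder documents already seen on earlier pages; this is the skip/duplicate effect Adair describes.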
no of request count in solr
Is there any way to get the request count per hour or per day in Solr? Thanks, RR
solr over hdfs for accessing/ changing indexes outside solr
Dear all, Hi, I changed Solr 4.9 to write its index and data on HDFS. Now I want to connect to that data from outside Solr to change some of the values. Could somebody please tell me how that is possible? Suppose I am using HBase over HDFS to make these changes. Best regards. -- A.Nazemian
Re: Auto Complete
Hello, did you find any solution to this problem? Regards

2014-08-04 16:16 GMT+02:00 Michael Della Bitta-2 [via Lucene]:

How are you implementing autosuggest? I'm assuming you're querying an indexed field and getting a stored value back. But there are a wide variety of ways of doing it.

Michael Della Bitta, Applications Developer, appinions inc., 18 East 41st Street, New York, NY 10017, w: appinions.com http://www.appinions.com/

On Mon, Aug 4, 2014 at 10:10 AM, benjelloun wrote:

Hello, you didn't understand my problem well; let me give an example. I have a document containing "genève" (with the accent). When I query q=gene, autosuggest returns "geneve", because of ASCIIFoldingFilterFactory with preserveOriginal=true. When I query q=genè, autosuggest returns "genève". But what I need is: query q=gene (without the accent) and get the result "genève" (with the accent).
Re: Auto Complete
Unless I'm mistaken, it seems like you've created this index specifically for autocomplete? Or is this index used for general search also? The easy way to understand this question: Is there one entry in your index for each term you want to autocomplete? Or are there multiple entries that might contain the same term?

Michael Della Bitta

On Tue, Aug 5, 2014 at 9:10 AM, benjelloun anass@gmail.com wrote: [...]
Delta Import - Cleaning Index
Hi everyone, I have a Solr index with 20+ million products; the core is about 70GB. What I would like to do is a weekly delta-import, but the core seems to grow in size each week. (Currently it runs a full-import with clean=false.) Shouldn't a delta-import with clean=true import the new records and update the old records in the core, resulting in roughly the same size? When I do a delta-import with clean=true via the Solr dashboard, it wipes out the whole 20+ million and only the updated records are left. Any ideas? Thank you. -- Jako de Wet, SAPnet
Re: solr over hdfs for accessing/ changing indexes outside solr
On 8/5/2014 7:04 AM, Ali Nazemian wrote: [...]

I don't know how you could safely modify the index without a Lucene application or another instance of Solr, but if you do manage to modify the index, simply reloading the core or restarting Solr should cause it to pick up the changes. Either you would need to make sure that Solr never modifies the index, or you would need some way of coordinating updates so that Solr and the other application would never try to modify the index at the same time. Thanks, Shawn
Re: Delta Import - Cleaning Index
On 8/5/2014 7:20 AM, Jako de Wet wrote: [...]

The clean parameter refers to the whole index. You asked it to clean the index, so it did -- it deleted all documents. Deleted documents are not actually removed; they are marked as deleted and still take up disk space. To actually get rid of them, they need to be merged out: when segments are merged, only the non-deleted documents are copied to the new segment. A full optimize (a forced merge down to one segment) is the only way to be absolutely sure that all deleted documents are gone. A full optimize completely rewrites the index, which is a lot of disk I/O; that can cause query performance issues while the optimize is happening and for a short time afterwards. Note that when you index a document with the same value in the uniqueKey field as an existing document, the old document is deleted before the new one is indexed. Thanks, Shawn
Re: Delta Import - Cleaning Index
Hi Shawn, thanks for the insight. Why does the size increase when I don't specify the clean parameter, then? The PK for the documents remains the same throughout the whole import process. Would a full optimize combine all the segments into one and decrease the physical size of the core?

On Tue, Aug 5, 2014 at 3:28 PM, Shawn Heisey s...@elyograg.org wrote: [...]

-- Jako de Wet, SAPnet
Re: no of request count in solr
On 8/5/2014 6:06 AM, rockstar007 wrote: [...]

There is no built-in per-hour or per-day statistic, but the cumulative number of requests is available; if you sample it yourself on an hourly basis, you can calculate the rate. It's in the admin UI under Plugins/Stats, or you can use the same handler the admin UI does at the following URL: /solr/corename/admin/mbeans?stats=true Thanks, Shawn
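Shawn's suggestion amounts to sampling the cumulative requests counter periodically and differencing. A sketch of the arithmetic (fetching and parsing the mbeans response is up to you; the method name is my own):

```java
public class RequestRate {
    // Given two samples of Solr's cumulative "requests" counter (e.g. from
    // /solr/corename/admin/mbeans?stats=true) and their timestamps in
    // milliseconds, compute the requests-per-hour rate over that interval.
    static double perHour(long earlierCount, long earlierMillis,
                          long laterCount, long laterMillis) {
        double hours = (laterMillis - earlierMillis) / 3_600_000.0;
        return (laterCount - earlierCount) / hours;
    }
}
```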
Re: Auto Complete
Yeah, that's true, I created this index just for autocomplete. Here is my schema:

<dynamicField name="*_en" type="text_en" indexed="true" stored="false" required="false" multiValued="true"/>
<dynamicField name="*_fr" type="text_fr" indexed="true" stored="false" required="false" multiValued="true"/>
<dynamicField name="*_ar" type="text_ar" indexed="true" stored="false" required="false" multiValued="true"/>
<copyField source="*_en" dest="suggestField"/>
<copyField source="*_fr" dest="suggestField"/>
<copyField source="*_ar" dest="suggestField"/>

Then I use suggestField for autocomplete, as I mentioned above. Do you have any other configuration that can do what I need?

2014-08-05 15:19 GMT+02:00 Michael Della Bitta-2 [via Lucene]: [...]
ExternalFileFieldReloader and commit
When there are multiple 'external file field' files available, Solr will reload the last one (lexicographically) on commit, but only if changes were made to the index. Otherwise, it skips the reload and logs: No uncommitted changes. Skipping IW.commit. Has anyone else noticed this? It seems like a bug to me. (Yes, I do have firstSearcher and newSearcher event listeners in solrconfig.xml.) Peter
Re: Getting Solr 4 to index the simple names of files
Solution found: I was using the SimplePostTool utility to crawl and post documents to Solr with the default example settings (except for having added a few file types to be indexed). Instead of finding a field that passed through the exact name of the document, I used the resourcename text field that was already being parsed. In the JavaScript of my AJAX interface I then cut resourcename down to a simple file name (and linked it to the corresponding file) with something like:

var fullFileName = doc.resourcename;
var output = '<div>' + fullFileName.substring(afterLastBackslash) + '</div>';

Thank you for the help!
Re: Delta Import - Cleaning Index
On 8/5/2014 7:31 AM, Jako de Wet wrote: [...]

When you delete all documents, all of the original segments have no undeleted documents in them, so Lucene knows it can completely remove those segments even without merging. I don't know exactly what triggers such automatic removal, but Lucene is smart enough to know that it can do it. If you simply rely on uniqueKey replacement, the space taken up by deleted documents cannot be automatically recovered, because there are good documents in those segments. Only a merge can recover the space, and only an optimize can guarantee that any specific document's segment will be merged. Thanks, Shawn
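For reference, a forced merge can be triggered by posting an optimize message to the /update handler; this is the standard XML update message, shown here with the optional maxSegments attribute:

```xml
<optimize maxSegments="1"/>
```

As Shawn notes, this rewrites the whole index, so run it off-peak.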
Re: Implementing custom analyzer for multi-language stemming
I've started a GitHub project to try out some cross-lingual analysis ideas (https://github.com/whateverdood/cross-lingual-search). I haven't worked on it for about 3 months, but plan to restart shortly. In a nutshell, the interesting component (SimplePolyGlotStemmingTokenFilter) relies on ICU4J ScriptAttributes: each token is inspected for its script, e.g. Latin or Arabic, and then a ScriptStemmer recruits the appropriate stemmer to handle the token. Of course this is extremely primitive and basic, but I think it would be possible to write a CharFilter or TokenFilter that inspects the entire TokenStream to guess the language(s), perhaps even noting where languages change. Language and position information could be tracked, the TokenStream rewound, and then tokens emitted with LanguageAttributes for downstream token stemmers to deal with. Or is that a crazy idea?

On Tue, Aug 5, 2014 at 12:10 AM, TK kuros...@sonic.net wrote:

On 7/30/14, 10:47 AM, Eugene wrote:

Hello, fellow Solr and Lucene users and developers! In our project we receive text from users in different languages. We detect the language automatically and use the Google Translate APIs a lot (so having an arbitrary number of languages in our system doesn't concern us). However, we need to be able to search using stemming. Having nearly a hundred fields (several fields for each language, with language-specific stemmers) listed in our search query is not an option. So we need a way to have a single index which has stemmed tokens for different languages.

Do you mean a Tokenizer that switches among supported languages depending on the lang field? This is something I thought about when I started working on Solr/Lucene, and I soon realized it is not possible because of the way Lucene is designed: the Tokenizer in an analyzer chain cannot peek at another field's value, and there is no way to control which field is processed first. If that's not what you are trying to achieve, could you tell us what it is?

If you have text in different languages in a single field, and someone searches for a word common to many languages, such as "sports" (or "Lucene", for that matter), Solr will return documents in different languages, most of which the user doesn't understand. Would that be useful? If you have a special use case, would you like to share it? -- Kuro
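The per-token script dispatch described above can be toy-modeled with the JDK's built-in Character.UnicodeScript (the real filter uses ICU4J script attributes and real Lucene stemmers; the regex "stemmer" below is a deliberate stand-in, and all names here are my own):

```java
import java.util.Map;
import java.util.function.UnaryOperator;

public class ScriptDispatch {
    // Hypothetical per-script "stemmers"; real code would delegate to
    // language-specific Lucene stemmers instead of these placeholders.
    static final Map<Character.UnicodeScript, UnaryOperator<String>> STEMMERS = Map.of(
        Character.UnicodeScript.LATIN, t -> t.replaceAll("(ing|s)$", ""),
        Character.UnicodeScript.ARABIC, t -> t // placeholder: no-op
    );

    // Use the first code point's script as a crude guess for the whole token.
    static Character.UnicodeScript scriptOf(String token) {
        return Character.UnicodeScript.of(token.codePointAt(0));
    }

    // Inspect the token's script and recruit the matching stemmer.
    static String stem(String token) {
        return STEMMERS.getOrDefault(scriptOf(token), UnaryOperator.identity())
                       .apply(token);
    }
}
```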
Re: Auto Complete
I found this solution, but when I test it I get no suggestions:

<searchComponent class="solr.SpellCheckComponent" name="fuzzySuggest">
  <lst name="spellchecker">
    <str name="name">fuzzySuggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FuzzyLookupFactory</str>
    <str name="field">suggestField</str>
    <str name="storeDir">suggestFolders</str>
    <str name="buildOnCommit">true</str>
    <bool name="exactMatchFirst">true</bool>
    <str name="suggestAnalyzerFieldType">texts</str>
    <bool name="preserveSep">false</bool>
    <int name="maxEdits">2</int>
    <str name="sourceLocation">suggestFolders/fuzzysuggest.txt</str>
  </lst>
  <str name="queryAnalyzerFieldType">phrase_suggest</str>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/fuzzySuggest">
  <lst name="defaults">
    <str name="name">fuzzySuggest</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">fuzzySuggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">10</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="components">
    <str>fuzzySuggest</str>
  </arr>
</requestHandler>

2014-08-05 15:32 GMT+02:00 anass benjelloun anass@gmail.com: [...]
Re: Auto Complete
In this case, I recommend the approach that this tutorial uses: http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/ Basically, the idea is that you index the data a few different ways and then use edismax to query them all with different boosts. You'd use the stored version of your field for display, so your accented characters would not get stripped. Michael Della Bitta Applications Developer appinions inc.

On Tue, Aug 5, 2014 at 9:32 AM, benjelloun anass@gmail.com wrote: […]
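The multi-way indexing that Michael describes can be sketched as a solrconfig fragment. Every field name and boost below is illustrative, not taken from the thread; it only shows the shape of the edismax approach from the linked tutorial:

```xml
<!-- Hypothetical sketch of the edismax autocomplete approach: the same source
     text is copied into several differently-analyzed fields, and edismax
     queries them all, boosting the strictest match kinds highest. -->
<requestHandler name="/autocomplete" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- exact > edge-ngram > folded (accent-stripped) variants -->
    <str name="qf">suggest_exact^10 suggest_edge^5 suggest_folded^2</str>
    <!-- return the stored, accent-preserving field for display -->
    <str name="fl">suggest_display</str>
    <str name="rows">10</str>
  </lst>
</requestHandler>
```

This way the folded field lets q=gene match, while the stored display field still returns "genève" with the accent intact.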
Re: Paging bug in ReRankingQParserPlugin?
Thanks, great explanation! Yeah, if it keeps the current behavior, added documentation would be great. Are there any other features that expect parameters to change as one pages? If not, I'm concerned that it might be hard to support for clients that assume only the index params will change. It also makes it harder to work if we want to add re-ranking on a strict small set of results on the first page, because then we'd have to stitch together two result sets. We don't currently want to do that, though. For what it's worth, what my colleague who linked me the feature and I both assumed the behavior would be is that it would get all the results and return the ones past the re-ranking point as-is. Is that possible? Thanks, Adair

On Tue, Aug 5, 2014 at 5:53 AM, Joel Bernstein joels...@gmail.com wrote: The comment in the code reads slightly different:

// This enusres that reRankDocs >= docs needed to satisfy the result set.
reRankDocs = Math.max(start+rows, reRankDocs);

I think you're right, though, that this is confusing. The way the ReRankingQParserPlugin works is that it grabs the top X documents (reRankDocs) and re-ranks them. If the top X (reRankDocs) isn't large enough to satisfy the page, then the result won't have enough documents. The intended use of this was actually to stop using query re-ranking when you paged past the re-ranked results. So if you re-rank the top 200 documents, you would drop the re-ranking parameter when you page to documents 201-220. So the line:

reRankDocs = Math.max(start+rows, reRankDocs);

saves you from an unexpected shortfall in documents if you do page beyond the reRankDocs. At the very least, the expected use should be documented, and if we can figure out better behavior here, that would be great. Joel Bernstein Search Engineer at Heliosearch

On Mon, Aug 4, 2014 at 7:56 PM, Adair Kovac adairko...@gmail.com wrote: Looking at this line in the code:

// This enusres that reRankDocs >= docs needed to satisfy the result set.
reRankDocs = Math.max(start+rows, reRankDocs);

This looks like it would cause skips and duplicates while paging through the results, since if you exceed the reRankDocs parameter and keep finding things that match the re-ranking query, they'll get boosted earlier (skipped), thus pushing down items you already saw (causing duplicates). It's obviously intentional behavior, but there's no documentation I can see of why, if you request fewer documents to be re-ranked than you're asking to view, it goes ahead and ignores the number you asked for. What if I only want the top 10 out of 50 rows to be re-ranked? Wouldn't it be better to make the client choose whether to increase the reRankDocs or leave it the same? If no one replies and I have time, I might check out 4.9 and see if I can confirm or disprove the bug, but figured I'd bring it up now in case I don't end up having time. It would be good to document the reason for this behavior if it turns out it's necessary. Thanks. I'm excited about this feature, btw. --Adair
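The paging effect Adair describes follows directly from that line. A tiny standalone sketch (not Solr source) of how the effective re-rank window grows as the client pages:

```java
// Standalone sketch (not Solr code) of the window computation discussed above:
// reRankDocs = Math.max(start + rows, reRankDocs).
public class ReRankWindow {
    static int effectiveReRankDocs(int start, int rows, int reRankDocs) {
        return Math.max(start + rows, reRankDocs);
    }

    public static void main(String[] args) {
        // Client asks to re-rank the top 50 and pages 20 rows at a time.
        System.out.println(effectiveReRankDocs(0, 20, 50));  // page 1: window is 50
        System.out.println(effectiveReRankDocs(40, 20, 50)); // page 3: window grows to 60,
        // so newly re-ranked documents can shift the ordering between pages,
        // producing the skips and duplicates described above.
    }
}
```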
Re: solr over hdfs for accessing/ changing indexes outside solr
Probably the most correct way to modify the index would be to use the Solr REST API to push your changes out. Another thing you might want to look at is Lily. Basically, it's a way to set up a Solr collection as an HBase replication target, so changes to your HBase table would automatically propagate over to Solr. http://www.ngdata.com/on-lily-hbase-hadoop-and-solr/ Michael Della Bitta Applications Developer appinions inc.

On Tue, Aug 5, 2014 at 9:04 AM, Ali Nazemian alinazem...@gmail.com wrote: Dear all, Hi, I changed Solr 4.9 to write its index and data on HDFS. Now I want to connect to that data from outside Solr to change some of the values. Could somebody please tell me how that is possible? Suppose I am using HBase over HDFS to make these changes. Best regards. -- A.Nazemian
Re: Paging bug in ReRankingQParserPlugin?
I updated the docs for now. But I agree this paging issue needs to be handled transparently. Feel free to create a jira issue for this or I can create one when I have time to start looking into it. Joel Bernstein Search Engineer at Heliosearch

On Tue, Aug 5, 2014 at 12:04 PM, Adair Kovac adairko...@gmail.com wrote: […]
Re: Paging bug in ReRankingQParserPlugin?
You can also have a sliding re-ranking horizon. That is how we did it in Ultraseek. http://observer.wunderwood.org/2007/04/04/progressive-reranking/ wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/

On Aug 5, 2014, at 9:38 AM, Joel Bernstein joels...@gmail.com wrote: […]
Re: solr over hdfs for accessing/ changing indexes outside solr
Actually, I am going to do some analysis on the Solr data using MapReduce. For this purpose it might be necessary to change some parts of the data or add new fields from outside Solr.

On Tue, Aug 5, 2014 at 5:51 PM, Shawn Heisey s...@elyograg.org wrote: On 8/5/2014 7:04 AM, Ali Nazemian wrote: […] I don't know how you could safely modify the index without a Lucene application or another instance of Solr, but if you do manage to modify the index, simply reloading the core or restarting Solr should cause it to pick up the changes. Either you would need to make sure that Solr never modifies the index, or you would need some way of coordinating updates so that Solr and the other application would never try to modify the index at the same time. Thanks, Shawn -- A.Nazemian
Re: solr update dynamic field generates multiValued error
Hey Erick, I think you were right; there was a mix-up in the schemas, and that was generating the error on some of the documents. Thanks for the help, guys!

2014-08-05 1:28 GMT-03:00 Erick Erickson erickerick...@gmail.com: Hmmm, I just tried this with a 4.x build and I can update the document multiple times without a problem. I just indexed the standard exampledocs and then updated a doc like this (vidcard.xml was the base):

<add>
  <doc>
    <field name="id">EN7800GTX/2DHTV/256M</field>
    <field name="manu_id_s" update="set">eoe changed this puppy</field>
  </doc>
  <!-- yes, you can add more than one document at a time -->
</add>

I'm not getting any multiple values in the _coordinate fields. However, I _do_ get the error if my dynamic *_coordinate field is set to stored="true". Did you perhaps change this at some point? Whenever I change the schema, I try to 'rm -rf solr/collection/data' just to be sure I've purged all traces of the former schema definition. Best, Erick

On Mon, Aug 4, 2014 at 7:04 PM, Franco Giacosa fgiac...@gmail.com wrote: No, they are not declared explicitly. This is how they are created:

<field name="latLong" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

2014-08-04 22:28 GMT-03:00 Michael Ryan mr...@moreover.com: Are the latLong_0_coordinate and latLong_1_coordinate fields populated using copyField? If so, this sounds like it could be https://issues.apache.org/jira/browse/SOLR-3502. -Michael

-----Original Message----- From: Franco Giacosa [mailto:fgiac...@gmail.com] Sent: Monday, August 04, 2014 9:05 PM To: solr-user@lucene.apache.org Subject: solr update dynamic field generates multiValued error

Hello everyone, this is my first time posting a question, so forgive me if I'm missing something. This is my problem: I have a schema.xml with the latLong configuration shown above. The dynamicField generates two dynamic fields that hold the lat and the long (latLong_0_coordinate and latLong_1_coordinate). So, for example, a document will have latLong_0_coordinate: 40.4114, latLong_1_coordinate: -74.1031, latLong: 40.4114,-74.1031. Now, when I try to update a document (I don't update the latLong field; I just update other parts of the document using atomic update), Solr re-creates the dynamicField and adds the same value again, as if it were using "add" instead of "set". So after an update the fields of the doc look like this: latLong_0_coordinate: [40.4114, 40.4114], latLong_1_coordinate: [-74.1031, -74.1031], latLong: 40.4114,-74.1031. The dynamicFields now have two values, so the next time I try to update the document a schema error is thrown, because I'm trying to store a collection in a non-multiValued field. Thanks in advance.
Re: Paging bug in ReRankingQParserPlugin?
Thanks, Joel. I created SOLR-6323.

On Tue, Aug 5, 2014 at 10:38 AM, Joel Bernstein joels...@gmail.com wrote: […]
Re: solr over hdfs for accessing/ changing indexes outside solr
What you haven't told us is what you mean by "modify the index outside Solr". SolrJ? Using raw Lucene? Trying to modify things by writing your own codec? Standard Java I/O operations? Other? You could use SolrJ to connect to an existing Solr server and both read and modify at will from your M/R jobs. But if you're thinking of trying to write/modify the segment files by raw I/O operations, good luck! I'm 99.99% certain that's going to cause you endless grief. Best, Erick

On Tue, Aug 5, 2014 at 9:55 AM, Ali Nazemian alinazem...@gmail.com wrote: […]
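For the SolrJ route Erick recommends, the change itself is an ordinary atomic update. A hypothetical example of the update message (the id and field name are invented for illustration) that writes an analysis result back to an existing document without touching its other fields:

```xml
<!-- Hypothetical atomic update: "set" replaces only this one field;
     Solr leaves all other fields of the document intact. -->
<add>
  <doc>
    <field name="id">doc-42</field>
    <field name="analysis_score_f" update="set">0.87</field>
  </doc>
</add>
```

An M/R job could emit documents of this shape and post them to the running Solr server, which also sidesteps the concurrent-writer problem Shawn describes, since Solr remains the only process touching the index files.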
Re: ExternalFileFieldReloader and commit
Hi Peter, It seems like a bug to me, too. Please file a JIRA ticket if you can so that someone can take it. Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html (2014/08/05 22:34), Peter Keegan wrote: When there are multiple 'external file field' files available, Solr will reload the last one (lexicographically) with a commit, but only if changes were made to the index. Otherwise, it skips the reload and logs: No uncommitted changes. Skipping IW.commit. Has anyone else noticed this? It seems like a bug to me. (yes, I do have firstSearcher and newSearcher event listeners in solrconfig.xml) Peter
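For reference, the firstSearcher and newSearcher listeners Peter mentions are typically wired up like this in solrconfig.xml (a minimal sketch of the usual configuration):

```xml
<!-- Reload external file fields whenever a new searcher opens;
     the firstSearcher listener covers the initial startup case. -->
<listener event="newSearcher" class="org.apache.solr.schema.ExternalFileFieldReloader"/>
<listener event="firstSearcher" class="org.apache.solr.schema.ExternalFileFieldReloader"/>
```

The bug described above is that the newSearcher event never fires when the commit is skipped for lack of index changes, so the reloader never runs even though a newer external file exists.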
Re: Implementing custom analyzer for multi-language stemming
On 8/5/14, 8:36 AM, Rich Cariens wrote: Of course this is extremely primitive and basic, but I think it would be possible to write a CharFilter or TokenFilter that inspects the entire TokenStream to guess the language(s), perhaps even noting where languages change. Language and position information could be tracked, the TokenStream rewound, and then tokens emitted with LanguageAttributes for downstream token stemmers to deal with.

I'm curious how you are planning to handle the LanguageAttribute. Would each token have this attribute, denoting a span of tokens with a language? But then how would you search English documents that include the term "die" while skipping all the German documents, which are most likely to contain "die"? Automatic language detection works OK for long text with typical content, but it doesn't work well with short text. What strategy would you use to deal with short text? -- TK
Re: solr over hdfs for accessing/ changing indexes outside solr
Dear Erick, Hi, Thank you for your reply. Yeah, I am aware that SolrJ is my last option. I was thinking about raw I/O operations, so according to your reply that is probably not feasible. What about the Lily project that Michael mentioned? Does that count as SolrJ too? Are you aware of Cloudera Search? I know they provide an integrated Hadoop ecosystem. Do you know what their suggestion is? Best regards.

On Wed, Aug 6, 2014 at 12:28 AM, Erick Erickson erickerick...@gmail.com wrote: […] -- A.Nazemian
Re: solr over hdfs for accessing/ changing indexes outside solr
Dear Erick, I remember that some time ago somebody asked what the point is of modifying Solr to use HDFS for storing indexes. As far as I remember, the answer was that integrating Solr with HDFS has two advantages: 1) getting Hadoop replication and HA, and 2) using the indexes and Solr documents for other purposes such as analysis. So why would we go with HDFS for the analysis case if we have to use SolrJ for that purpose anyway? What is the point? Regards.

On Wed, Aug 6, 2014 at 8:59 AM, Ali Nazemian alinazem...@gmail.com wrote: […] -- A.Nazemian