issue in launching SolrCloud windows/cygwin
Here is the issue I am facing when using the 'solr' script on Windows from a Cygwin terminal:

    $ bin/solr -e cloud
    bin/solr: line 16: $'\r': command not found
    bin/solr: line 17: $'\r': command not found
    bin/solr: line 46: $'\r': command not found
    which: no lsof in (/usr/local/bin:/usr/bin:/cygdrive/c/Windows/system32:/cygdrive/c/Windows:/cygdrive/c/Windows/System32/Wbem:/cygdrive/c/Windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/Program Files/TortoiseSVN/bin:/cygdrive/c/Program Files/Java/jdk1.7.0_51/bin:/cygdrive/c/Program Files/apache-ant-1.9.3/bin:/cygdrive/c/Program Files (x86)/Python-27:/cygdrive/c/Program Files (x86)/Python-27/Scripts)
    bin/solr: line 52: $'\r': command not found
    bin/solr: line 87: syntax error near unexpected token `$HOME/.solr.in.sh'
    'in/solr: line 87: ` $HOME/.solr.in.sh \

Further:

    $ bin/solr start -cloud -d node1 -p 8983
    bin/solr: line 16: $'\r': command not found
    bin/solr: line 17: $'\r': command not found
    bin/solr: line 46: $'\r': command not found
    which: no lsof in (/usr/local/bin:/usr/bin:/cygdrive/c/Windows/system32:/cygdrive/c/Windows:/cygdrive/c/Windows/System32/Wbem:/cygdrive/c/Windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/Program Files/TortoiseSVN/bin:/cygdrive/c/Program Files/Java/jdk1.7.0_51/bin:/cygdrive/c/Program Files/apache-ant-1.9.3/bin:/cygdrive/c/Program Files (x86)/Python-27:/cygdrive/c/Program Files (x86)/Python-27/Scripts)
    bin/solr: line 52: $'\r': command not found
    bin/solr: line 87: syntax error near unexpected token `$HOME/.solr.in.sh'
    'in/solr: line 87: ` $HOME/.solr.in.sh \

Is there any other way I can run SolrCloud, using java -jar start.jar options?
Re: issue in launching SolrCloud windows/cygwin
Hello Anurag,

the CRLF problem with Cygwin can be cured by running all the scripts through this filter:

    tr -d '\r' < $script > $script.new ; mv $script.new $script

with $script holding the path of the script to be massaged.

Generally, however, I would advise using the standard scripts only for testing or demonstration purposes, as you are very likely to have to change parameters or settings for your production environment anyway. Using the latest Jetty is one such example.

Best regards,
--Jürgen

On 19.10.2014 08:51, Anurag Sharma wrote:
> Is there any other way I can run the SolrCloud using java -jar start.jar options?

--
Kind regards
i.A. Jürgen Wagner
Head of Competence Center Intelligence, Senior Cloud Consultant
Devoteam GmbH
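The filter above can be wrapped in a small helper. This is only a sketch: the function name strip_cr is made up for illustration, and it assumes a POSIX shell with tr, cat and rm on the PATH, which Cygwin provides. It writes the cleaned text back with cat instead of mv, so the original file keeps its executable bit.

```shell
#!/usr/bin/env bash
# strip_cr FILE...
# Hypothetical helper (not part of Solr): remove carriage returns from
# each FILE in place.
strip_cr() {
  local script tmp
  for script in "$@"; do
    tmp="$script.new"
    # tr only reads stdin and writes stdout, so both redirections are needed.
    # cat overwrites the contents of the original file, which preserves its
    # permissions; the temporary copy is removed only if both steps succeed.
    tr -d '\r' < "$script" > "$tmp" &&
      cat "$tmp" > "$script" &&
      rm -f "$tmp"
  done
}
```

Usage would be along the lines of `strip_cr bin/solr`, run from the Solr installation directory.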
Re: issue in launching SolrCloud windows/cygwin
Hello Jürgen,

Thanks a lot for your prompt response. It solved the CRLF problem, but the script still does not work under Cygwin because of Cygwin's limitations: lsof is missing, and curl and some ps options are missing or behave differently.

I found there is a native solr.cmd script for Windows, which works without any issue in the Windows shell. This solves the problem for now.

Regards,
Anurag

On Sun, Oct 19, 2014 at 12:39 PM, Jürgen Wagner (DVT) <juergen.wag...@devoteam.com> wrote:
> the CRLF problem with Cygwin can be cured by running the scripts all through this filter: tr -d '\r' < $script > $script.new ; mv $script.new $script
Re: issue in launching SolrCloud windows/cygwin
Run Solr straight from the Windows cmd if Cygwin isn't a requirement. For example, running java -jar start.jar from the example directory will start a single Solr instance. To run SolrCloud, follow the instructions in "Simple Two-Shard Cluster on the Same Machine" at this link: http://bit.ly/1rlmYvF

@nazik_huq

On Sun, Oct 19, 2014 at 2:51 AM, Anurag Sharma <anura...@gmail.com> wrote:
> Is there any other way I can run the SolrCloud using java -jar start.jar options?
CopyField from text to multi value
Hi,

I would like to copy the content of a text field into a multi-valued field. For example, let's say my field "text" contains:

    "I am a solr user"

I would like to have a multi-valued copyField with the following content:

    ["I", "am", "a", "solr", "user"]

Thanks,
Tomer Levi
Software Engineer, Big Data Group, Product Technology Unit
Re: issue in launching SolrCloud windows/cygwin
Hi Nazik,

Thanks for the response. The link you mentioned is very useful. I used the Windows cmd and started the cloud using the solr.cmd script. The script accepts a rich set of options.

Anurag

On Sun, Oct 19, 2014 at 5:01 PM, Nazik Huq <nazik...@gmail.com> wrote:
> Run Solr straight from the Windows cmd if CygWin isn't a requirement.
Re: CopyField from text to multi value
Not quite sure what you're asking here. If you do a copyField, the raw input is, well, copied to the destination field and _then_ the analysis chain is applied. That seems to be what you want: the destination field would be a text-based field, perhaps text_general or some such from the distro.

And perhaps there's some confusion about what multiValued means here. It does _not_ mean tokenized, i.e. broken up into words. Non-multiValued fields can be tokenized. multiValued means that more than one entry for the field can be in a doc. I.e. (using the XML form of an input doc as an example)

    <add>
      <doc>
        <field name="multi">some text</field>
        <field name="multi">and now for something completely different</field>
      </doc>
    </add>

will succeed with a field defined as multiValued="true", but fail with one defined as multiValued="false".

In either case, though, whether the input is broken up into multiple, independently searchable tokens (words) is orthogonal to whether the field is multiValued, and depends entirely on the analysis chain in the fieldType for the field in question.

Best,
Erick

On Sun, Oct 19, 2014 at 9:07 AM, Tomer Levi <tomer.l...@nice.com> wrote:
> I would like to copy a textual field content into a multivalue field.
RE: CopyField from text to multi value
Hi Erick,

Thanks for the explanation. I understand that the analysis chain is applied after the raw input is copied. I need to store the output of the analysis chain as a new multi-valued field, and I think ShingleFilterFactory might do that, wouldn't it?

Tomer

-----Original Message-----
From: Erick Erickson
Sent: Sunday, October 19, 2014 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: CopyField from text to multi value

> If you do a copyField, the raw input is, well, copied to the destination field and _then_ the analysis chain is applied.
Re: CopyField from text to multi value
As always, you need to first examine how you intend to query the fields before you dive into data modeling. In this case, is there any particular reason that you need the individual terms as separate values, as opposed to simply using a tokenized text field?

-- Jack Krupansky

From: Tomer Levi
Sent: Sunday, October 19, 2014 9:07 AM
To: solr-user@lucene.apache.org
Subject: CopyField from text to multi value

> I would like to copy a textual field content into a multivalue field.
Re: CopyField from text to multi value
This really feels like an XY problem, which I think Jack is alluding to.

bq: I understand that the analysis chain is applied after the raw input was copied. I need to store the output of the analysis chain as a new multi-value field

This statement is really confusing. You can't use the output of the analysis chain as input to a copyField, which is what your second sentence seems to ask for; it just doesn't work that way. Then you bring shingles into the picture...

So let's take Jack's suggestion and back up: tell us what use case you're trying to support rather than leaving us to guess what problem you're trying to solve.

Best,
Erick

On Sun, Oct 19, 2014 at 9:43 AM, Jack Krupansky <j...@basetechnology.com> wrote:
> In this case, is there any particular reason that you need the individual terms as separate values, as opposed to simply using a tokenized text field?
RE: CopyField from text to multi value
Thanks again for the help.

The use case is this. In my UI I would like to indicate which words led to each document in the response. It actually looks like a simple highlighting case, but instead of getting the highlight result as

    this is a <br>long</br> string <br>with</br> text

our UI team wants a list of words, i.e. [long, with].

So I assumed that I could just tokenize the original text -> copy the tokens into a new multi-valued field -> ask Solr to highlight the multi-valued field.

That is my use case.

Thanks again,
Tomer

-----Original Message-----
From: Erick Erickson
Sent: Sunday, October 19, 2014 5:18 PM
To: solr-user@lucene.apache.org
Subject: Re: CopyField from text to multi value

> So let's take Jack's suggestion and back up and tell us what the use-case you're trying to support is rather than leaving us to guess what problem you're trying to solve.
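One option that is not raised in the thread is to leave the index as it is and derive the word list from the highlighted snippet on the client side. The sketch below assumes the snippet wraps each match in a known tag: the example above uses <br>, while Solr's default highlighter uses <em> unless hl.simple.pre and hl.simple.post are changed. The function name highlighted_terms is made up for illustration.

```shell
#!/usr/bin/env bash
# highlighted_terms TAG
# Hypothetical helper: read a highlighted snippet on stdin and print the
# text inside each <TAG>...</TAG> pair, one term per line, dropping repeats.
highlighted_terms() {
  local tag=$1
  grep -o "<$tag>[^<]*</$tag>" |                      # one match per line
    sed -e "s|^<$tag>||" -e "s|</$tag>\$||" |        # strip the surrounding tags
    awk '!seen[$0]++'                                 # keep first occurrence only
}

# Prints "long" and "with" on separate lines.
echo 'this is a <br>long</br> string <br>with</br> text' | highlighted_terms br
```

This only post-processes what the highlighter already returns; whether it fits depends on how the UI consumes the response.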
Query parsing - difference between Analysis and parsedquery_toString output
Hi,

I use Solr 4.9 and imported about 20K documents from CSV data. The schema has the following definition for the text_general field type, which I want to process with tokenization, stop word removal and stemming:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" enablePositionIncrements="true"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" enablePositionIncrements="true"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Using the Solr Admin Analysis page for that field type, I see that both index and query values are processed as expected:

    Hershey's -> hershey
    The Hershey's Company -> the hershey compani

I expected the same processing for a select query, but it doesn't seem to happen, and no results are found in the example below:

    q: manufacture_t:The Hershey Company^100 OR title_t:The Hershey Company^1000
    parsedquery_toString: manufacture_t:the text:Hershey text:Company^100.0 title_t:the text:Hershey text:Company^1000.0

Indexed document:

    "docs": [
      {
        "id": "00010700501806",
        "description_t": ["Hershey's Whoppers Carton - 12 Pack"],
        "title_t": ["Whoppers Carton - 12 Pack"],
        "manufacture_t": ["Hershey's"],

What am I missing?

Thanks in advance,
Tanya

--
View this message in context: http://lucene.472066.n3.nabble.com/Query-parsing-difference-between-Analysis-and-parsedquery-toString-output-tp4164851.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: CopyField from text to multi value
I think that info is available with termvectors. That should give a list of the query terms that matched each document, if I understand it correctly.

wunder
Walter Underwood

On Oct 19, 2014, at 7:37 AM, Tomer Levi <tomer.l...@nice.com> wrote:
> In my UI I would like to indicate which words led to every document in the response.
Re: Query parsing - difference between Analysis and parsedquery_toString output
This trips _everybody_ up. Analysis doesn't happen until things get through the query parser. So, let's assume your query is

    q=manufacture_t:The Hershey Company^100 OR title_t:The Hershey Company^1000

The problem is that the query _parser_ doesn't understand that your intent is for "the hershey company" to be evaluated against the manufacture_t field and the title_t field. All it sees is manufacture_t:the and then, as naked tokens, hershey and company. So it does the best it can and assumes that hershey and company should be evaluated against your default text field, in this case "text".

You have two choices here: form your query as manufacture_t:"The Hershey Company", or as manufacture_t:(The Hershey Company). The first form requires that the words The, Hershey, and Company appear in sequence. The second form searches all three terms in the field, in any order; with a default q.op of OR, it requires only that one of them appears. If you require all three, either define the default operator to be AND or enter the query as manufacture_t:(The AND Hershey AND Company).

Best,
Erick

On Sun, Oct 19, 2014 at 4:49 PM, tinush <tanya.karpin...@gmail.com> wrote:
> I expected the same processing for the select query, but it doesn't seem to happen and no results are found.
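The query forms discussed in this thread can be compared by sending each one with debugQuery=true and reading parsedquery_toString in the response. The following is a sketch only: the URL (host, port and a core named collection1) and the function name debug_query are assumptions, not something stated in the thread.

```shell
#!/usr/bin/env bash
# Assumed location of the select handler; adjust host, port and core name.
SOLR_SELECT_URL="${SOLR_SELECT_URL:-http://localhost:8983/solr/collection1/select}"

# Only "The" is bound to manufacture_t; Hershey and Company fall through
# to the default field.
unscoped='manufacture_t:The Hershey Company'
# Phrase query: the three terms must appear in sequence in manufacture_t.
phrase='manufacture_t:"The Hershey Company"'
# Grouped terms: all are searched in manufacture_t, and the explicit AND
# makes every term required regardless of the default q.op.
grouped='manufacture_t:(The AND Hershey AND Company)'

# debug_query QUERY
# Hypothetical helper: run QUERY with debugQuery=true. curl URL-encodes the
# parameters and, because of -G, sends them as a GET query string.
debug_query() {
  curl -sG "$SOLR_SELECT_URL" \
    --data-urlencode "q=$1" \
    --data-urlencode "wt=json" \
    --data-urlencode "debugQuery=true"
}

# Example, with Solr running: debug_query "$phrase"
```

Comparing the parsedquery_toString of the unscoped form with the other two should show the text: clauses disappear once the terms are quoted or grouped.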
Re: Recovering from Out of Mem
I assume you will have to write a script to restart the service as well?

On Fri, Oct 17, 2014 at 7:17 PM, Tim Potter tim.pot...@lucidworks.com wrote:

You'd still want to kill it ... so you'll need to register a cmd script with the JVM using -XX:OnOutOfMemoryError=kill.cmd and then you could either

1) trap the PID at startup using something like:

    title SolrCloud
    for /F "tokens=2 delims= " %%A in ('TASKLIST /FI ^"WINDOWTITLE eq SolrCloud^" /NH') do (
      set /A SOLR_PID=%%A
      echo !SOLR_PID!>solr.pid
    )

or 2) if you keep track of the port (which all my Windows scripts do), then you can do:

    For /f "tokens=5" %%j in ('netstat -aon ^| find /i "listening" ^| find ":%SOLR_PORT%"') do (
      taskkill /t /f /pid %%j > nul 2>&1
    )

On Fri, Oct 17, 2014 at 1:11 AM, Salman Akram salman.ak...@northbaysolutions.net wrote:

I know this might sound weird but any easy way to do it in Windows?

On Tue, Oct 14, 2014 at 7:51 PM, Boogie Shafer boogie.sha...@proquest.com wrote:

yago, you can put more complex restart logic as shown in the examples below or just do something similar to the java_oom.sh i posted earlier where you just spit out an email alert and deal with service restarts and troubleshooting manually

e.g. something like the following for a java_error.sh will drop an email with a timestamp

    echo `date` | mail -s "Java Error: General - $HOSTNAME" not...@domain.com

From: Tim Potter tim.pot...@lucidworks.com
Sent: Tuesday, October 14, 2014 07:35
To: solr-user@lucene.apache.org
Subject: Re: Recovering from Out of Mem

jfyi - the bin/solr script does the following:

    -XX:OnOutOfMemoryError=$SOLR_TIP/bin/oom_solr.sh $SOLR_PORT

where $SOLR_PORT is the port Solr is bound to, e.g. 8983

The oom_solr.sh script looks like:

    SOLR_PORT=$1
    SOLR_PID=`ps waux | grep start.jar | grep $SOLR_PORT | grep -v grep | awk '{print $2}' | sort -r`
    if [ "$SOLR_PID" == "" ]; then
      echo "Couldn't find Solr process running on port $SOLR_PORT!"
      exit
    fi
    NOW=$(date +%F%T)
    (
    echo "Running OOM killer script for process $SOLR_PID for Solr on port $SOLR_PORT"
    kill -9 $SOLR_PID
    echo "Killed process $SOLR_PID"
    ) | tee solr_oom_killer-$SOLR_PORT-$NOW.log

I usually run Solr behind a supervisor type process (supervisord or upstart) that will restart it if the process dies.

On Tue, Oct 14, 2014 at 8:09 AM, Markus Jelsma mar...@openindex.io wrote:

This will do:

    kill -9 `ps aux | grep -v grep | grep tomcat6 | awk '{print $2}'`

pkill should also work

On Tuesday 14 October 2014 07:02:03 Yago Riveiro wrote:

Boogie, Any example for java_error.sh script?
-- /Yago Riveiro

On Tue, Oct 14, 2014 at 2:48 PM, Boogie Shafer boogie.sha...@proquest.com wrote:

a really simple approach is to have the OOM generate an email e.g.

1) create a simple script (call it java_oom.sh) and drop it in your tomcat bin dir

    echo `date` | mail -s "Java Error: OutOfMemory - $HOSTNAME" not...@domain.com

2) configure your java options (in setenv.sh or similar) to trigger heap dump and the email script when OOM occurs

    # config error behaviors
    CATALINA_OPTS="$CATALINA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$TOMCAT_DIR/temp/tomcat-dump.hprof -XX:OnError=$TOMCAT_DIR/bin/java_error.sh -XX:OnOutOfMemoryError=$TOMCAT_DIR/bin/java_oom.sh -XX:ErrorFile=$TOMCAT_DIR/temp/java_error%p.log"

From: Mark Miller markrmil...@gmail.com
Sent: Tuesday, October 14, 2014 06:30
To: solr-user@lucene.apache.org
Subject: Re: Recovering from Out of Mem

Best is to pass the Java cmd line option that kills the process on OOM and set up a supervisor on the process to restart it. You need a somewhat recent release for this to work properly though.
- Mark

On Oct 14, 2014, at 9:06 AM, Salman Akram salman.ak...@northbaysolutions.net wrote:

I know there are some suggestions to avoid the OOM issue, e.g. setting an appropriate max heap size etc. However, what's the best way to recover from it once it goes into a non-responding state? We are using Tomcat on the back end.

The scenario is that once we hit the OOM issue, it keeps on taking queries (doesn't give any error) but they just time out. So even though we have a failover system implemented, we don't have a way to distinguish whether these are real query timeouts or due to OOM.

-- Regards, Salman Akram
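For the Windows question in the thread above, the port-based lookup (option 2 in Tim Potter's reply) can also be sketched in Python. This is only an illustration under stated assumptions: it assumes the English-language column layout that netstat -aon prints on Windows (proto, local address, foreign address, state, PID), and the function names listening_pids and kill_solr_on_port are made up for this sketch.

```python
import subprocess


def listening_pids(netstat_output, port):
    """Return the PIDs of sockets in LISTENING state on `port`.

    Expects the text printed by `netstat -aon` on Windows, where the
    columns are: proto, local address, foreign address, state, PID.
    """
    pids = set()
    suffix = ":%d" % port
    for line in netstat_output.splitlines():
        cols = line.split()
        # TCP rows have five columns; UDP rows have no state column.
        if (len(cols) == 5 and cols[3].upper() == "LISTENING"
                and cols[1].endswith(suffix)):
            pids.add(int(cols[4]))
    return sorted(pids)


def kill_solr_on_port(port):
    """Force-kill whatever listens on `port` (Windows only, not run here)."""
    out = subprocess.check_output(["netstat", "-aon"]).decode("ascii", "replace")
    for pid in listening_pids(out, port):
        # Same flags as the batch snippet: /t kills child processes, /f forces.
        subprocess.call(["taskkill", "/t", "/f", "/pid", str(pid)])


# Made-up sample output, only to show the parsing.
sample = """\
  TCP    0.0.0.0:8983     0.0.0.0:0      LISTENING    4321
  TCP    [::]:8983        [::]:0         LISTENING    4321
  TCP    0.0.0.0:18983    0.0.0.0:0      LISTENING    5000
  TCP    10.0.0.5:50123   10.0.0.9:8983  ESTABLISHED  777
"""
print(listening_pids(sample, 8983))
```

Matching on the local-address column only avoids killing a client process that merely has a connection open to the Solr port.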
Re: Recovering from Out of Mem
You can create a script that pings Solr every 10 seconds; if there is no response, restart it (kill the process ID and start Solr again). This is the fastest and easiest way to do that on Windows. -- View this message in context: http://lucene.472066.n3.nabble.com/Recovering-from-Out-of-Mem-tp4164167p4164882.html Sent from the Solr - User mailing list archive at Nabble.com.
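The ping-and-restart idea above could look roughly like the Python sketch below. Everything specific in it is an assumption rather than something from the thread: the ping URL, the restart command, the timeout, and the choice to require a few consecutive failures before restarting (added so that one slow query is not mistaken for a hung JVM).

```python
import http.client
import time
import urllib.request


def is_alive(url, timeout=5.0):
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, socket timeouts and refused connections
        # are all OSError subclasses.
        return False


def watch(probe, restart, interval=10.0, failures_allowed=2, max_checks=None):
    """Call `probe()` every `interval` seconds and call `restart()` after
    more than `failures_allowed` consecutive failures.

    `max_checks` bounds the loop (None means run forever).
    Returns the number of restarts triggered.
    """
    failures = restarts = checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            failures = 0
        else:
            failures += 1
            if failures > failures_allowed:
                restart()
                restarts += 1
                failures = 0
        time.sleep(interval)
    return restarts


# Example wiring (not run here); the URL and the script name are assumptions:
# import subprocess
# watch(lambda: is_alive("http://localhost:8983/solr/collection1/admin/ping"),
#       lambda: subprocess.call(["restart_solr.cmd"]))
```

The timeout on the probe matters for the case described earlier in the thread, where Solr keeps accepting requests after an OOM but never answers them.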
Re: How to properly use Levenstein distance with ~ in Java
You can use the Levenshtein distance algorithm inside Solr without writing code by specifying the source of terms in solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="field">content</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

This example shows the results of a simple query that defines a query using the spellcheck.q parameter. The query also includes a spellcheck.build=true parameter, which needs to be called only once in order to build the index. spellcheck.build should not be specified for each request.

http://localhost:8983/solr/spellCheckCompRH?q=*:*&spellcheck.q=hell%20ultrashar&spellcheck=true&spellcheck.build=true

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="hell">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">4</int>
      <arr name="suggestion">
        <str>dell</str>
      </arr>
    </lst>
    <lst name="ultrashar">
      <int name="numFound">1</int>
      <int name="startOffset">5</int>
      <int name="endOffset">14</int>
      <arr name="suggestion">
        <str>ultrasharp</str>
      </arr>
    </lst>
  </lst>
</lst>

Once the suggestions are collected, they are ranked by the configured distance measure (Levenshtein distance by default) and then by aggregate frequency.

-- View this message in context: http://lucene.472066.n3.nabble.com/How-to-properly-use-Levenstein-distance-with-in-Java-tp4164793p4164883.html Sent from the Solr - User mailing list archive at Nabble.com.
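To make the distance measure mentioned above concrete, here is a small Python sketch of the textbook Levenshtein edit distance applied to the two suggestions from the example response. It is not Solr's or Lucene's code; in particular the 0..1 scaling in similarity is an assumption about how such a score might be normalized, and the distance class that Solr actually uses may compute its score differently.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Edit distance scaled to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


for typed, suggestion in [("hell", "dell"), ("ultrashar", "ultrasharp")]:
    print(typed, suggestion, edit_distance(typed, suggestion),
          round(similarity(typed, suggestion), 2))
```

Both suggestions are one edit away from what was typed: a substitution for hell/dell and an insertion for ultrashar/ultrasharp.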