issue in launching SolrCloud windows/cygwin

2014-10-19 Thread Anurag Sharma
Here is the issue I am facing when using the 'solr' script on Windows
with a Cygwin terminal:

$ bin/solr -e cloud
bin/solr: line 16: $'\r': command not found
bin/solr: line 17: $'\r': command not found
bin/solr: line 46: $'\r': command not found
which: no lsof in
(/usr/local/bin:/usr/bin:/cygdrive/c/Windows/system32:/cygdrive/c/Windows:/cygdrive/c/Windows/System32/Wbem:/cygdrive/c/Windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/Program
Files/TortoiseSVN/bin:/cygdrive/c/Program
Files/Java/jdk1.7.0_51/bin:/cygdrive/c/Program
Files/apache-ant-1.9.3/bin:/cygdrive/c/Program Files
(x86)/Python-27:/cygdrive/c/Program Files (x86)/Python-27/Scripts)
bin/solr: line 52: $'\r': command not found
bin/solr: line 87: syntax error near unexpected token `$HOME/.solr.in.sh'
'in/solr: line 87: `   $HOME/.solr.in.sh \


Further:
$ bin/solr start -cloud -d node1 -p 8983
bin/solr: line 16: $'\r': command not found
bin/solr: line 17: $'\r': command not found
bin/solr: line 46: $'\r': command not found
which: no lsof in
(/usr/local/bin:/usr/bin:/cygdrive/c/Windows/system32:/cygdrive/c/Windows:/cygdrive/c/Windows/System32/Wbem:/cygdrive/c/Windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/Program
Files/TortoiseSVN/bin:/cygdrive/c/Program
Files/Java/jdk1.7.0_51/bin:/cygdrive/c/Program
Files/apache-ant-1.9.3/bin:/cygdrive/c/Program Files
(x86)/Python-27:/cygdrive/c/Program Files (x86)/Python-27/Scripts)
bin/solr: line 52: $'\r': command not found
bin/solr: line 87: syntax error near unexpected token `$HOME/.solr.in.sh'
'in/solr: line 87: `   $HOME/.solr.in.sh \

Is there another way to run SolrCloud, for example using java -jar start.jar
options?


Re: issue in launching SolrCloud windows/cygwin

2014-10-19 Thread Jürgen Wagner (DVT)
Hello Anurag,
  the CRLF problem with Cygwin can be cured by running each script
through this filter:

tr -d '\r' < "$script" > "$script.new" ; mv "$script.new" "$script"

with $script holding the path of the script to be massaged.
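
For anyone who prefers not to juggle a temporary file, the same filter can be sketched in Python. This is only a minimal sketch: the function name strip_carriage_returns is made up for illustration, and it assumes the script is small enough to read into memory. Like tr -d '\r', it removes every carriage return, not only those at line ends.

```python
import sys
from pathlib import Path


def strip_carriage_returns(path):
    """Rewrite the file at `path` with every CR byte removed.

    Returns the number of CR bytes that were removed. The file is
    rewritten in place, so its permissions are left untouched.
    """
    p = Path(path)
    data = p.read_bytes()
    cleaned = data.replace(b"\r", b"")
    if cleaned != data:
        p.write_bytes(cleaned)
    return len(data) - len(cleaned)


if __name__ == "__main__":
    # Usage (hypothetical file name): python strip_cr.py bin/solr
    for name in sys.argv[1:]:
        removed = strip_carriage_returns(name)
        print(f"{name}: removed {removed} carriage returns")
```

Because the file is overwritten rather than replaced by a new one, the executable bit on the script should survive, which is not guaranteed with the redirect-and-mv approach.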

Generally, however, I would advise using the standard scripts only for
testing or demonstration purposes, as you're very likely to have to
change parameters or settings for your production environment anyway.
Using the latest Jetty is one such example.

Best regards,
--Jürgen


Re: issue in launching SolrCloud windows/cygwin

2014-10-19 Thread Anurag Sharma
Hello Jurgen,

Thanks a lot for your prompt response.

It solved the CRLF problem, but the script still does not work on Cygwin
because tools and options it relies on are missing or limited there, such
as lsof, curl, and some ps options.

I found there is a native solr.cmd script for Windows, which works without
any issue in the Windows shell. This solves the problem for now.

Regards,
Anurag





Re: issue in launching SolrCloud windows/cygwin

2014-10-19 Thread Nazik Huq
Run Solr straight from the Windows cmd shell if Cygwin isn't a requirement.
For example, running java -jar start.jar from the example directory will
start a single Solr instance.

To run SolrCloud, follow the instructions in "Simple Two-Shard Cluster on
the Same Machine" from this link: http://bit.ly/1rlmYvF .

@nazik_huq


CopyField from text to multi value

2014-10-19 Thread Tomer Levi
Hi,
I would like to copy the content of a text field into a multivalued field.
For example,
let's say my field text contains: "I am a solr user"
I would like to have a multivalued copyField destination with the following
content: ["I", "am", "a", "solr", "user"]

Thanks,
Tomer Levi


Re: issue in launching SolrCloud windows/cygwin

2014-10-19 Thread Anurag Sharma
Hi Nazik,

Thanks for the response. The link you mentioned is very useful. I used
the Windows cmd shell and started the cloud using the solr.cmd script. The
script supports a rich set of options.

Anurag


Re: CopyField from text to multi value

2014-10-19 Thread Erick Erickson
Not quite sure what you're asking here. If you do a copyField, the raw
input is, well, copied to the destination field and _then_ the analysis
chain is applied. That seems to be what you want: the destination field
would be a text-based field, perhaps text_general or some such from the
distro.

And perhaps there's some confusion about what multiValued means here. It
does _not_ mean tokenized, i.e. broken up into words. Non-multiValued
fields can be tokenized.

multiValued means that more than one entry for the field can be in a doc.
I.e. (using the XML form of an input doc as an example)

<add>
  <doc>
    <field name="multi">some text</field>
    <field name="multi">and now for something completely different</field>
  </doc>
</add>

will succeed with a field defined as multiValued="true", but fail with
a field defined as multiValued="false".

In either case, though, whether the input was broken up into multiple,
independently-searchable tokens (words) is orthogonal to whether it's
multiValued or not, and is entirely dependent on the analysis chain in the
fieldType for the field in question.
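
To make the distinction concrete, here is a toy model in Python. It is a sketch only, not how Solr is implemented: analyze stands in for an analysis chain (lowercase, split on non-word characters), and index_field is a hypothetical helper that rejects more than one entry for a single-valued field. The number of entries and the tokenization of each entry vary independently.

```python
import re


def analyze(value):
    """Toy analysis chain: lowercase, then split on non-word characters."""
    return [tok for tok in re.split(r"\W+", value.lower()) if tok]


def index_field(values, multi_valued):
    """Return the tokens produced for each raw entry of one field.

    `values` is the list of raw entries supplied for the field in one
    document. A single-valued field rejects more than one entry, but
    each entry is tokenized the same way in both cases.
    """
    if not multi_valued and len(values) > 1:
        raise ValueError("multiple values supplied for a single-valued field")
    return [analyze(v) for v in values]


# One entry, still tokenized into five searchable terms.
single = index_field(["I am a solr user"], multi_valued=False)
print(single)  # [['i', 'am', 'a', 'solr', 'user']]

# Two entries, each tokenized on its own.
multi = index_field(
    ["some text", "and now for something completely different"],
    multi_valued=True,
)
print(multi)
```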

Best,
Erick


RE: CopyField from text to multi value

2014-10-19 Thread Tomer Levi

Hi Erick,
Thanks for the explanation. I understand that the analysis chain is applied
after the raw input is copied.
I need to store the output of the analysis chain as a new multivalued field,
and I think that ShingleFilterFactory might do that. Is that right?

Tomer


Re: CopyField from text to multi value

2014-10-19 Thread Jack Krupansky
As always, you need to first examine how you intend to query the fields before 
you dive into data modeling. In this case, is there any particular reason that 
you need the individual terms as separate values, as opposed to simply using a 
tokenized text field?

-- Jack Krupansky


Re: CopyField from text to multi value

2014-10-19 Thread Erick Erickson
This really feels like an XY problem, which I think Jack is alluding to.

bq: I understand that the analysis chain is applied after the raw
input was copied.
I need to store the output of the analysis chain as a new multi-value field

This statement is really confusing. The second sentence seems to want the
output of the analysis chain used as input to a copyField, and it just
doesn't work that way. Then you bring shingles into the picture...

So let's take Jack's suggestion and back up: tell us what use case
you're trying to support rather than leaving us to guess what problem
you're trying to solve.

Best,
Erick



RE: CopyField from text to multi value

2014-10-19 Thread Tomer Levi
Thanks again for the help.

The use case is this.

In my UI I would like to indicate which words led to each document appearing
in the response.

It actually seems like a simple highlighting case, but instead of getting the
highlight result as "this is a <br>long</br> string <br>with</br> text",
our UI team wants a list of words, i.e. [long, with].

So, I assumed that I can just tokenize the original text -> copy the tokens
into a new multivalued field -> ask Solr to highlight the multivalued field.
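
One alternative that needs no second field is to post-process the highlight snippets on the client. The following Python sketch assumes the snippet wraps each match in a known pair of markers; highlighted_words is a hypothetical helper, and the default markers are an assumption that has to match whatever the highlighter is configured to emit.

```python
import re


def highlighted_words(snippet, pre="<em>", post="</em>"):
    """Return the distinct highlighted words in `snippet`.

    Words are returned in order of first appearance. `pre` and `post`
    are the markers that surround each highlighted match.
    """
    pattern = re.escape(pre) + r"(.*?)" + re.escape(post)
    seen = []
    for word in re.findall(pattern, snippet):
        if word not in seen:
            seen.append(word)
    return seen


# The snippet from the message above, with its markers passed explicitly.
words = highlighted_words(
    "this is a <br>long</br> string <br>with</br> text",
    pre="<br>",
    post="</br>",
)
print(words)  # ['long', 'with']
```

The UI then receives a plain list, and the index keeps a single tokenized text field.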



That is my use case.

Thanks again,
Tomer


Query parsing - difference between Analysis and parsedquery_toString output

2014-10-19 Thread tinush
Hi, 

I use Solr 4.9 and imported about 20K documents from CSV data.

In the schema there is the following definition for the text_general field
type, which I want to process with tokenization, stop-word removal, and
stemming.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            enablePositionIncrements="true"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Using Solr Admin Analysis for that field type I see that both index and
query values are processed as expected: Hershey's -> *hershey*, The Hershey's
Company -> the *hershey* compani

I expected the same processing for a select query, but it doesn't seem to
happen, and no results are found in the example below:
 q: manufacture_t:The Hershey Company^100 OR title_t:The Hershey Company^1000
 parsedquery_toString: manufacture_t:the text:Hershey text:Company^100.0 title_t:the text:Hershey text:Company^1000.0,

indexed document:
   "docs": [
      {
        "id": "00010700501806",
        "description_t": [
          "Hershey's Whoppers Carton - 12 Pack "
        ],
        "title_t": [
          "Whoppers Carton - 12 Pack"
        ],
        "manufacture_t": [
          "Hershey's"
        ],

What am I missing?

Thanks in advance,
Tanya
 


 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-parsing-difference-between-Analysis-and-parsedquery-toString-output-tp4164851.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: CopyField from text to multi value

2014-10-19 Thread Walter Underwood
I think that info is available with termvectors. That should give a list of the 
query terms that matched each document, if I understand it correctly.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/



Re: Query parsing - difference between Analysis and parsedquery_toString output

2014-10-19 Thread Erick Erickson
This trips _everybody_ up. Analysis doesn't happen until things get
through the query parser. So,
let's assume your query is
q=manufacture_t:The Hershey Company^100 OR title_t:The Hershey Company^1000

The problem is that the query _parser_ doesn't understand that
your intent is that "the hershey company" be evaluated against
the manufacture_t field and the title_t field. All it sees is
manufacture_t:the and then, as naked tokens, hershey and company.
So it does the best it can and assumes that hershey and company
should be evaluated against your default text field, in this case text.

You have two choices here: form your query like
manufacture_t:"The Hershey Company", or like
manufacture_t:(The Hershey Company).

The first form requires that the words The, Hershey, and Company
appear in sequence, and the second form just requires that all three
appear somewhere in the field in any order.

Actually, the second form requires that only one of the terms appears
in the field, assuming your default q.op is OR. If you require all three,
either define the default operator to be AND or enter it as
manufacture_t:(The AND Hershey AND Company).
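
The field assignment described above can be mimicked with a few lines of Python. This is a toy model under loud assumptions, not the Solr query parser: assign_fields, phrase, and grouped are hypothetical helpers, the default field name text is taken from the parsedquery output in the question, and quotes and parentheses are not handled by the toy at all.

```python
def assign_fields(query, default_field="text"):
    """Pair each whitespace-separated term with the field it is searched in.

    Only a term written as field:term carries its own field; every other
    bare term falls back to `default_field`. Boolean operators are
    skipped and a trailing boost such as ^100 is dropped.
    """
    pairs = []
    for token in query.split():
        if token in ("AND", "OR", "NOT"):
            continue
        token = token.split("^")[0]
        if ":" in token:
            field, term = token.split(":", 1)
        else:
            field, term = default_field, token
        pairs.append((field, term))
    return pairs


def phrase(field, words):
    """Build a phrase query: the words must appear in sequence."""
    return f'{field}:"{words}"'


def grouped(field, words, op=None):
    """Build a grouped query: every term is applied to `field`."""
    joiner = f" {op} " if op else " "
    return f"{field}:({joiner.join(words.split())})"


pairs = assign_fields(
    "manufacture_t:The Hershey Company^100 OR title_t:The Hershey Company^1000"
)
print(pairs)
print(phrase("manufacture_t", "The Hershey Company"))
print(grouped("manufacture_t", "The Hershey Company", op="AND"))
```

The printed pairs line up with the parsedquery_toString in the question: only The stays attached to manufacture_t or title_t, while Hershey and Company land on the default field.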

Best,
Erick


Re: Recovering from Out of Mem

2014-10-19 Thread Salman Akram
I assume you will have to write a script to restart the service as well?

On Fri, Oct 17, 2014 at 7:17 PM, Tim Potter tim.pot...@lucidworks.com
wrote:

 You'd still want to kill it ... so you'll need to register a cmd script
 with the JVM using -XX:OnOutOfMemoryError=kill.cmd and then you could
 either

 1) trap the PID at startup using something like:

 title SolrCloud

 for /F "tokens=2 delims= " %%A in ('TASKLIST /FI ^"WINDOWTITLE eq SolrCloud^" /NH') do (
   set /A SOLR_PID=%%A
   echo !SOLR_PID!>solr.pid
 )

 or

 2) if you keep track of the port (which all my Windows scripts do), then
 you can do:

 For /f "tokens=5" %%j in ('netstat -aon ^| find /i "listening" ^| find ":%SOLR_PORT%"') do (
   taskkill /t /f /pid %%j > nul 2>&1
 )
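
 For anyone who would rather not do this in batch, the port lookup in
 option 2 can be sketched in Python as below. This is only an illustration
 under assumptions, not part of any Solr script: the function names are made
 up, and it expects the usual five-column TCP rows of "netstat -aon"
 (Proto, Local Address, Foreign Address, State, PID).

```python
import subprocess
from typing import List


def pids_listening_on(netstat_output: str, port: int) -> List[int]:
    """Return the PIDs of sockets in LISTENING state on exactly this port."""
    pids: List[int] = []
    for line in netstat_output.splitlines():
        cols = line.split()
        # Header rows and UDP rows do not have exactly five columns.
        if len(cols) != 5 or cols[3].upper() != "LISTENING":
            continue
        # Compare the port at the end of the local address,
        # e.g. 0.0.0.0:8983 or [::]:8983.
        local_port = cols[1].rsplit(":", 1)[-1]
        if local_port == str(port) and cols[4].isdigit():
            pid = int(cols[4])
            if pid not in pids:
                pids.append(pid)
    return pids


def kill_listeners(port: int) -> None:
    """Windows only: kill the process tree of whatever listens on the port."""
    out = subprocess.run(["netstat", "-aon"], capture_output=True,
                         text=True, check=True).stdout
    for pid in pids_listening_on(out, port):
        subprocess.run(["taskkill", "/t", "/f", "/pid", str(pid)], check=False)
```

 Matching on the local-address column is a deliberate difference from the
 find ":%SOLR_PORT%" filter, which is a substring match and would also hit a
 longer port such as 89830 or a matching foreign address.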


 On Fri, Oct 17, 2014 at 1:11 AM, Salman Akram 
 salman.ak...@northbaysolutions.net wrote:

  I know this might sound weird but any easy way to do it in Windows?
 
  On Tue, Oct 14, 2014 at 7:51 PM, Boogie Shafer 
 boogie.sha...@proquest.com
  
  wrote:
 
   yago,
  
   you can put more complex restart logic as shown in the examples below or
   just do something similar to the java_oom.sh i posted earlier where you
   just spit out an email alert and deal with service restarts and
   troubleshooting manually
  
  
   e.g. something like the following for a java_error.sh will drop an email
   with a timestamp
  
  
   echo `date` | mail -s "Java Error: General - $HOSTNAME" not...@domain.com
  
  
   
   From: Tim Potter tim.pot...@lucidworks.com
   Sent: Tuesday, October 14, 2014 07:35
   To: solr-user@lucene.apache.org
   Subject: Re: Recovering from Out of Mem
  
   jfyi - the bin/solr script does the following:
  
   -XX:OnOutOfMemoryError="$SOLR_TIP/bin/oom_solr.sh $SOLR_PORT" where
   $SOLR_PORT is the port Solr is bound to, e.g. 8983
  
   The oom_solr.sh script looks like:
  
   SOLR_PORT=$1
  
   SOLR_PID=`ps waux | grep start.jar | grep $SOLR_PORT | grep -v grep | awk '{print $2}' | sort -r`
  
   if [ "$SOLR_PID" == "" ]; then
     echo "Couldn't find Solr process running on port $SOLR_PORT!"
     exit
   fi
  
   NOW=$(date +"%F%T")
  
   (
   echo "Running OOM killer script for process $SOLR_PID for Solr on port $SOLR_PORT"
   kill -9 $SOLR_PID
   echo "Killed process $SOLR_PID"
   ) | tee solr_oom_killer-$SOLR_PORT-$NOW.log
  
  
   I usually run Solr behind a supervisor type process (supervisord or
   upstart) that will restart it if the process dies.
  
  
   On Tue, Oct 14, 2014 at 8:09 AM, Markus Jelsma mar...@openindex.io
   wrote:
  
This will do:
kill -9 `ps aux | grep -v grep | grep tomcat6 | awk '{print $2}'`
   
pkill should also work
   
On Tuesday 14 October 2014 07:02:03 Yago Riveiro wrote:
 Boogie,




 Any example for java_error.sh script?


 —
 /Yago Riveiro

 On Tue, Oct 14, 2014 at 2:48 PM, Boogie Shafer 
boogie.sha...@proquest.com

 wrote:
  a really simple approach is to have the OOM generate an email

  e.g.

  1) create a simple script (call it java_oom.sh) and drop it in your
  tomcat bin dir

  echo `date` | mail -s "Java Error: OutOfMemory - $HOSTNAME" not...@domain.com

  2) configure your java options (in setenv.sh or similar) to trigger
  heap dump and the email script when OOM occurs

  # config error behaviors
  CATALINA_OPTS="$CATALINA_OPTS -XX:+HeapDumpOnOutOfMemoryError
  -XX:HeapDumpPath=$TOMCAT_DIR/temp/tomcat-dump.hprof
  -XX:OnError=$TOMCAT_DIR/bin/java_error.sh
  -XX:OnOutOfMemoryError=$TOMCAT_DIR/bin/java_oom.sh
  -XX:ErrorFile=$TOMCAT_DIR/temp/java_error%p.log"
  
  From: Mark Miller markrmil...@gmail.com
  Sent: Tuesday, October 14, 2014 06:30
  To: solr-user@lucene.apache.org
  Subject: Re: Recovering from Out of Mem
  Best is to pass the Java cmd line option that kills the process on OOM
  and set up a supervisor on the process to restart it.  You need a somewhat
  recent release for this to work properly though. - Mark
 
  On Oct 14, 2014, at 9:06 AM, Salman Akram
  salman.ak...@northbaysolutions.net wrote:
 
  I know there are some suggestions to avoid the OOM issue, e.g. setting an
  appropriate Max Heap size etc. However, what's the best way to recover
  from it, as it goes into a non-responding state? We are using Tomcat on
  the back end.
 
  The scenario is that once we face an OOM issue it keeps on taking queries
  (doesn't give any error) but they just time out. So even though we have a
  failover system implemented, we don't have a way to distinguish whether
  these are real query timeouts or due to OOM.
 
  --
  Regards,
 
  Salman Akram
   
   
  
 
 
 
  --
  Regards,
 
  Salman Akram
 




-- 
Regards,

Salman Akram


Re: Recovering from Out of Mem

2014-10-19 Thread Ramzi Alqrainy
You can create a script that pings Solr every 10 seconds; if there is no
response, restart it (kill the process ID and start Solr again).
This is the fastest and easiest way to do that on Windows.
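
A rough sketch of such a watchdog in Python, which runs the same way on
Windows and Linux, is below. Everything specific in it is an assumption to
adapt: the ping URL (the core name collection1 and the /admin/ping handler
depend on your setup), the restart_solr.cmd script, and the choice to wait
for three consecutive failed pings so that one slow response does not
trigger a restart.

```python
import http.client
import subprocess
import time
import urllib.request
from typing import Callable, Optional

# Assumptions: adjust the core name, port and restart script to your install.
PING_URL = "http://localhost:8983/solr/collection1/admin/ping"
RESTART_CMD = ["cmd", "/c", "restart_solr.cmd"]  # hypothetical script


def is_alive(url: str = PING_URL, timeout: float = 5.0) -> bool:
    """True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def restart_solr() -> None:
    """Run the (hypothetical) script that kills and restarts Solr."""
    subprocess.run(RESTART_CMD, check=False)


def watchdog(check: Callable[[], bool], restart: Callable[[], None],
             interval: float = 10.0, max_failures: int = 3,
             rounds: Optional[int] = None,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll check(); call restart() after max_failures misses in a row.

    Runs forever when rounds is None. Returns the number of restarts.
    """
    failures = 0
    restarts = 0
    done = 0
    while rounds is None or done < rounds:
        if check():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0
        done += 1
        sleep(interval)
    return restarts
```

To run it for real, call watchdog(is_alive, restart_solr). Note that a ping
can still succeed while queries time out, so this complements rather than
replaces the -XX:OnOutOfMemoryError approach discussed earlier in the thread.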



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Recovering-from-Out-of-Mem-tp4164167p4164882.html


Re: How to properly use Levenstein distance with ~ in Java

2014-10-19 Thread Ramzi Alqrainy
You can use the Levenshtein distance algorithm inside Solr without writing
code by specifying the source of terms in solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="field">content</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

This example shows the results of a simple query that defines a query using
the spellcheck.q parameter. The query also includes a spellcheck.build=true
parameter, which needs to be called only once in order to build the
index. spellcheck.build should not be specified for each request.

http://localhost:8983/solr/spellCheckCompRH?q=*:*&spellcheck.q=hell%20ultrashar&spellcheck=true&spellcheck.build=true

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="hell">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">4</int>
      <arr name="suggestion">
        <str>dell</str>
      </arr>
    </lst>
    <lst name="ultrashar">
      <int name="numFound">1</int>
      <int name="startOffset">5</int>
      <int name="endOffset">14</int>
      <arr name="suggestion">
        <str>ultrasharp</str>
      </arr>
    </lst>
  </lst>
</lst>



Once the suggestions are collected, they are ranked by the configured
distance measure (Levenshtein distance by default) and then by aggregate
frequency.
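
To make the ranking concrete, here is a small Python illustration of plain
Levenshtein distance and a "distance first, then frequency" sort. It is not
Solr's code (Solr works with a normalized similarity score internally), and
the candidate terms and their frequencies are invented for the example.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rank_suggestions(word: str, candidates: Dict[str, int]) -> List[str]:
    """Sort candidates by edit distance, then by descending frequency.

    candidates maps each term to an (invented) document frequency.
    """
    return sorted(candidates,
                  key=lambda c: (levenshtein(word, c), -candidates[c], c))


# Invented index terms and frequencies, for illustration only.
ranked = rank_suggestions("hell", {"dell": 10, "help": 3, "halls": 50})
```

Here "dell" and "help" are both one edit away from "hell", so the more
frequent "dell" comes first; "halls" is two edits away and comes last
despite being the most frequent.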



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-properly-use-Levenstein-distance-with-in-Java-tp4164793p4164883.html