Re: KeywordTokenizerFactory splits the string for the exclamation mark

2014-05-16 Thread Shawn Heisey
> Also could you please tell me the difference between searching a text in
> the
> following ways
>
> q=Exact_Word:"samplestring"
>
> q=samplestring&qf=Exact_Word
>
> I am trying to understand how enclosing the full term in "" is resolving
> this problem ? What does it tell to solr  ?

The quotes tell Solr to do a phrase query. A phrase query must have the
same relative position increments in the index as are found in the query,
or the entire string must be an exact match for a single token in the
index. Basically, the index must have the same words as the query, next to
each other, and in the same order.
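As a rough illustration of that rule, here is a toy Python model. It is not how Lucene implements phrase matching; the function name phrase_matches and the sample tokens are invented for this sketch. It only checks whether the query tokens occur consecutively and in order among the indexed tokens.

```python
def phrase_matches(indexed_tokens, query_tokens):
    """Return True if query_tokens appear consecutively, in order, in indexed_tokens."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        indexed_tokens[i:i + n] == query_tokens
        for i in range(len(indexed_tokens) - n + 1)
    )

# A whitespace-tokenized field holds one token per word.
words = ["quick", "brown", "fox"]
print(phrase_matches(words, ["brown", "fox"]))   # adjacent and in order
print(phrase_matches(words, ["fox", "brown"]))   # same words, wrong order

# A keyword-tokenized field holds the whole value as a single token,
# so only the complete string can match.
keyword = ["samplestring"]
print(phrase_matches(keyword, ["samplestring"]))
```

With a keyword-tokenized field like the one in this thread, the single-token case is the relevant one: the quoted string either equals the indexed token or it does not.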

> Other than the exclamation mark are there any other characters which tells
> specific things to solr

A number of characters are special to Solr's standard query parser.
The bottom of this page shows them all:

http://lucene.apache.org/core/4_2_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html?is-external=true

It lists && as a special character. It's not the combination of two &
characters that is special, it is each & character.
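If the search string comes from user input, one option is to escape it on the client before sending it. The following is a minimal Python sketch, assuming the character list from the Lucene page linked above; the function name escape_query_chars and the sample address are made up here. It is similar in spirit to SolrJ's ClientUtils.escapeQueryChars, but it is not a port of it.

```python
# Characters listed as special in the Lucene classic query parser docs.
# "&&" and "||" are handled by escaping each "&" and "|" individually.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')

def escape_query_chars(text):
    """Backslash-escape every special character in a user-supplied term."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

# Made-up address for illustration only.
print(escape_query_chars("d!sample@example.edu"))
```

The escaped value goes into the q parameter; it still has to be URL-encoded like any other request parameter.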

Thanks,
Shawn




Re: KeywordTokenizerFactory splits the string for the exclamation mark

2014-05-16 Thread Erick Erickson
Have you looked at the results after adding &debug=query? That often
gives you valuable insights into such questions. Admittedly, the debug
syntax can be "interesting" to get used to...
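For example, a request with debug=query could be built as in the Python sketch below. The host, port, and core name are hypothetical, and defType=edismax is an assumption based on the qf parameter used elsewhere in this thread.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; adjust to your installation.
base_url = "http://localhost:8983/solr/collection1/select"

params = {
    "q": "samplestring",
    "defType": "edismax",   # assumed parser, since the thread uses qf
    "qf": "Exact_Word",
    "debug": "query",       # ask Solr to include the parsed query in the response
    "wt": "json",
}

url = base_url + "?" + urlencode(params)
print(url)
```

The parsedquery entry in the debug section of the response shows how the parser interpreted the q string.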

Best,
Erick

On Tue, May 13, 2014 at 9:11 PM, nativecoder  wrote:
> Yes that happens due to the ! mark.
>
> Also can someone please tell me the difference between searching a text in
> the following ways
>
> 1. q=Exact_Word:"samplestring"
>
> 2. q=samplestring&qf=Exact_Word
>
> 3. q="samplestring"&qf=Exact_Word
>
> I think the first and the third one are the same.  is it correct ? How does
> it differ from the second one.
>
> I am trying to understand how enclosing the full term in "" is resolving
> this problem ? What does it tell to solr  ?
>
> Other than the exclamation mark are there any other characters which tells
> specific things to solr
> On May 14, 2014 1:54 AM, "nativecoder [via Lucene]" <
> ml-node+s472066n4135493...@n3.nabble.com> wrote:
>
>> Also could you please tell me the difference between searching a text in
>> the following ways
>>
>> q=Exact_Word:"samplestring"
>>
>> q=samplestring&qf=Exact_Word
>>
>> I am trying to understand how enclosing the full term in "" is resolving
>> this problem ? What does it tell to solr  ?
>>
>> Other than the exclamation mark are there any other characters which tells
>> specific things to solr
>>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-splits-the-string-for-the-exclamation-mark-tp4135460p4135525.html
> Sent from the Solr - User mailing list archive at Nabble.com.


KeywordTokenizerFactory splits the string for the exclamation mark

2014-05-15 Thread Romani Rupasinghe
Hi All

I have the following field settings in my Solr schema:









As you can see Exact_Word has the KeywordTokenizerFactory and that should
treat the string as it is.

Following is my responseHeader. As you can see, I am searching for my string
only in the field Exact_Word and expecting it to return the Word field and
the score.

"responseHeader":{
"status":0,
"QTime":14,
"params":{
  "explainOther":"",
  "fl":"Word,score",
  "debugQuery":"on",
  "indent":"on",
  "start":"0",
  "q":"d!sdasdsdwasd!a...@dsadsadas.edu",
  "qf":"Exact_Word",
  "wt":"json",
  "fq":"",
  "version":"2.2",
  "rows":"10"}},


But when I enter an email address such as
"d!sdasdsdwasd...@dsadsadas.edu", it splits the string in two. I was under
the impression that KeywordTokenizerFactory would treat the string as it is.

Following is the query debug result. There you can see it has split the word
 "parsedquery":"+((DisjunctionMaxQuery((Exact_Word:d))
-DisjunctionMaxQuery((Exact_Word:sdasdsdwasd...@dsadsadas.edu)))~1)",

Can someone please tell me why it produces this parsed query?

If I use a string with "_" in place of the "!" sign, the parsed query is
as below:
 "parsedquery":"+DisjunctionMaxQuery((
Exact_Word:d_sdasdsdwasd_...@dsadsadas.edu))",
This is what I expected Solr to produce even with the "!" mark. With the "_"
mark it does not split the string and treats it as it is.

I thought if the KeywordTokenizerFactory is applied then it should return
the exact string as it is

Please help me to understand what is going wrong here


Re: KeywordTokenizerFactory splits the string for the exclamation mark

2014-05-15 Thread Ahmet Arslan
Hi,

It is not KeywordTokenizer. The ! character has a special meaning to the
edismax and lucene query parsers: it is the NOT operator.
If you want to search for strings that could contain !, use another query
parser, dismax for example.





On Wednesday, May 14, 2014 12:02 AM, nativecoder  wrote:
Hi All

I have a following field settings in solr schema

Exact_Word*" omitPositions="true"
termVectors="false" omitTermFreqAndPositions="true" compressed="true"
type="string_ci" multiValued="false" indexed="true" stored="true"
required="false" omitNorms="true"/>







As you can see Exact_Email has the KeywordTokenizerFactory and that should
treat the string as it is.

But when I enter email with the following string
"d!sdasdsdwasd...@dsadsadas.edu" it splits the string to two. I was under
the impression that KeywordTokenizerFactory will treat the string as it is.
*!*
Following is the query debug result. There you can see it has split the word 
"parsedquery":"+((DisjunctionMaxQuery((Exact_Email:d))
-DisjunctionMaxQuery((Exact_Email:sdasdsdwasd...@dsadsadas.edu)))~1)",

can someone please tell why it produce the query result as this 

If I put a string without the "!" sign as below, the produced query will be
as below

"parsedquery":"+DisjunctionMaxQuery((Exact_Email:testresu...@testdomain.com))",

I thought if the KeywordTokenizerFactory is applied then it should return
the exact string as it is

Please help me to understand what is going wrong here




--
View this message in context: 
http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-splits-the-string-for-the-exclamation-mark-tp4135460.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: KeywordTokenizerFactory splits the string for the exclamation mark

2014-05-14 Thread Ahmet Arslan
Hi,

They are all the same. For single-term queries it makes no difference whether
you use quotes. Quotes are proximity operators and need at least two
clauses/terms.

Actually I couldn't reproduce your problem with 4.8.0. With
q=d!sdasdsdwasd...@dsadsadas.edu the ! is not treated as the NOT operator.
What version are you using?

For other characters that have a special meaning, please see the Escaping
Special Characters section:
http://lucene.apache.org/core/4_8_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description

Ahmet





On Wednesday, May 14, 2014 7:18 AM, nativecoder  wrote:
Yes that happens due to the ! mark.

Also can someone please tell me the difference between searching a text in
the following ways

1. q=Exact_Word:"samplestring"

2. q=samplestring&qf=Exact_Word

3. q="samplestring"&qf=Exact_Word

I think the first and the third one are the same.  is it correct ? How does
it differ from the second one.

I am trying to understand how enclosing the full term in "" is resolving
this problem ? What does it tell to solr  ?

Other than the exclamation mark are there any other characters which tells
specific things to solr
On May 14, 2014 1:54 AM, "nativecoder [via Lucene]" <
ml-node+s472066n4135493...@n3.nabble.com> wrote:

> Also could you please tell me the difference between searching a text in
> the following ways
>
> q=Exact_Word:"samplestring"
>
> q=samplestring&qf=Exact_Word
>
> I am trying to understand how enclosing the full term in "" is resolving
> this problem ? What does it tell to solr  ?
>
> Other than the exclamation mark are there any other characters which tells
> specific things to solr
>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-splits-the-string-for-the-exclamation-mark-tp4135460p4135525.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: KeywordTokenizerFactory splits the string for the exclamation mark

2014-05-14 Thread Jack Krupansky

Exclamation point is the shortcut for the "NOT" operator. See the minus in
front of the second generated term?

You need to escape it, either with a backslash or by enclosing the full term
in quotes. Or use the term query parser.
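The three options can be sketched in Python as follows. The helper names and the sample address are invented for this illustration. The {!term f=...} form uses Solr's local-params syntax for the term query parser; as far as I know that parser skips query-time analysis, so the value may need to match the indexed form (for example, already lowercased for a case-insensitive field type).

```python
def escape_bang(term):
    """Option 1: backslash-escape the exclamation mark."""
    return term.replace("!", "\\!")

def quote_term(term):
    """Option 2: wrap the whole term in double quotes.

    Embedded backslashes and quotes are escaped so the phrase stays intact.
    """
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'

def term_parser_query(field, term):
    """Option 3: hand the raw value to the term query parser via local params."""
    return "{!term f=" + field + "}" + term

value = "d!sample@example.edu"   # made-up address for illustration
print(escape_bang(value))
print(quote_term(value))
print(term_parser_query("Exact_Word", value))
```

Each result is the value of the q parameter before URL encoding.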

Here's a list of the special characters for the query parser:
http://lucene.apache.org/core/4_8_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters

-- Jack Krupansky

-----Original Message-----
From: Romani Rupasinghe

Sent: Tuesday, May 13, 2014 11:14 AM
To: solr-user@lucene.apache.org
Subject: KeywordTokenizerFactory splits the string for the exclamation mark

Hi All

I have a following field settings in solr schema









As you can see Exact_Word has the KeywordTokenizerFactory and that should
treat the string as it is.

Following is my responseHeader. As you can see I am searching my string
only in the filed Exact_Word and expecting it to return the Word field and
the score

"responseHeader":{
   "status":0,
   "QTime":14,
   "params":{
 "explainOther":"",
 "fl":"Word,score",
 "debugQuery":"on",
 "indent":"on",
 "start":"0",
 "q":"d!sdasdsdwasd!a...@dsadsadas.edu",
 "qf":"Exact_Word",
 "wt":"json",
 "fq":"",
 "version":"2.2",
 "rows":"10"}},


But when I enter email with the following string "d!
sdasdsdwasd...@dsadsadas.edu" it splits the string to two. I was under the
impression that KeywordTokenizerFactory will treat the string as it is.

Following is the query debug result. There you can see it has split the word
"parsedquery":"+((DisjunctionMaxQuery((Exact_Word:d))
-DisjunctionMaxQuery((Exact_Word:sdasdsdwasd...@dsadsadas.edu)))~1)",

can someone please tell why it produce the query result as this

If I put a string without the "!" sign as below, the produced query will be
as below
"parsedquery":"+DisjunctionMaxQuery((
Exact_Word:d_sdasdsdwasd_...@dsadsadas.edu))",. This is what I expected
solr to even with the "!" mark. with "_" mark it wont do a string split and
treats the string as it is

I thought if the KeywordTokenizerFactory is applied then it should return
the exact string as it is

Please help me to understand what is going wrong here 



Re: KeywordTokenizerFactory splits the string for the exclamation mark

2014-05-14 Thread nativecoder
Yes that happens due to the ! mark.

Also can someone please tell me the difference between searching a text in
the following ways

1. q=Exact_Word:"samplestring"

2. q=samplestring&qf=Exact_Word

3. q="samplestring"&qf=Exact_Word

I think the first and the third are the same. Is that correct? How do they
differ from the second one?

I am trying to understand how enclosing the full term in "" resolves this
problem. What does it tell Solr?

Other than the exclamation mark, are there any other characters that have a
special meaning to Solr?
On May 14, 2014 1:54 AM, "nativecoder [via Lucene]" <
ml-node+s472066n4135493...@n3.nabble.com> wrote:

> Also could you please tell me the difference between searching a text in
> the following ways
>
> q=Exact_Word:"samplestring"
>
> q=samplestring&qf=Exact_Word
>
> I am trying to understand how enclosing the full term in "" is resolving
> this problem ? What does it tell to solr  ?
>
> Other than the exclamation mark are there any other characters which tells
> specific things to solr
>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-splits-the-string-for-the-exclamation-mark-tp4135460p4135525.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: KeywordTokenizerFactory splits the string for the exclamation mark

2014-05-14 Thread nativecoder
Also could you please tell me the difference between searching a text in the
following ways

q=Exact_Word:"samplestring"

q=samplestring&qf=Exact_Word

I am trying to understand how enclosing the full term in "" resolves this
problem. What does it tell Solr?

Other than the exclamation mark, are there any other characters that have a
special meaning to Solr?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-splits-the-string-for-the-exclamation-mark-tp4135460p4135493.html
Sent from the Solr - User mailing list archive at Nabble.com.


KeywordTokenizerFactory splits the string for the exclamation mark

2014-05-13 Thread nativecoder
Hi Jack

Please have a look at this


I have the following field settings in my Solr schema:









As you can see Exact_Word has the KeywordTokenizerFactory and that should
treat the string as it is.

Following is my responseHeader. As you can see, I am searching for my string
only in the field Exact_Word and expecting it to return the Word field and
the score.

"responseHeader":{
"status":0,
"QTime":14,
"params":{
  "explainOther":"",
  "fl":"Word,score",
  "debugQuery":"on",
  "indent":"on",
  "start":"0",
  "q":"d!sdasdsdwasd!a...@dsadsadas.edu",
  "qf":"Exact_Word",
  "wt":"json",
  "fq":"",
  "version":"2.2",
  "rows":"10"}},


But when I enter an email address such as
"d!sdasdsdwasd...@dsadsadas.edu", it splits the string in two. I was under
the impression that KeywordTokenizerFactory would treat the string as it is.

Following is the query debug result. There you can see it has split the word
 "parsedquery":"+((DisjunctionMaxQuery((Exact_Word:d))
-DisjunctionMaxQuery((Exact_Word:sdasdsdwasd...@dsadsadas.edu)))~1)",

Can someone please tell me why it produces this parsed query?

If I use a string with "_" in place of the "!" sign, the parsed query is
as below:
 "parsedquery":"+DisjunctionMaxQuery((
Exact_Word:d_sdasdsdwasd_...@dsadsadas.edu))",
This is what I expected Solr to produce even with the "!" mark. With the "_"
mark it does not split the string and treats it as it is.

I thought if the KeywordTokenizerFactory is applied then it should return
the exact string as it is

Please help me to understand what is going wrong here




--
View this message in context: 
http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-splits-the-string-for-the-exclamation-mark-tp4135474.html
Sent from the Solr - User mailing list archive at Nabble.com.

KeywordTokenizerFactory splits the string for the exclamation mark

2014-05-13 Thread nativecoder
Hi All

I have the following field settings in my Solr schema:

Exact_Word*" omitPositions="true"
termVectors="false" omitTermFreqAndPositions="true" compressed="true"
type="string_ci" multiValued="false" indexed="true" stored="true"
required="false" omitNorms="true"/>







As you can see Exact_Email has the KeywordTokenizerFactory and that should
treat the string as it is.

But when I enter an email address such as
"d!sdasdsdwasd...@dsadsadas.edu", it splits the string in two. I was under
the impression that KeywordTokenizerFactory would treat the string as it is.
*!*
Following is the query debug result. There you can see it has split the word 
 "parsedquery":"+((DisjunctionMaxQuery((Exact_Email:d))
-DisjunctionMaxQuery((Exact_Email:sdasdsdwasd...@dsadsadas.edu)))~1)",

Can someone please tell me why it produces this parsed query?

If I use a string without the "!" sign, the parsed query is as below:

"parsedquery":"+DisjunctionMaxQuery((Exact_Email:testresu...@testdomain.com))",

I thought if the KeywordTokenizerFactory is applied then it should return
the exact string as it is

Please help me to understand what is going wrong here




--
View this message in context: 
http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-splits-the-string-for-the-exclamation-mark-tp4135460.html
Sent from the Solr - User mailing list archive at Nabble.com.