Re: KeywordTokenizerFactory splits the string for the exclamation mark
> Also could you please tell me the difference between searching a text in
> the following ways
>
> q=Exact_Word:"samplestring"
>
> q=samplestring&qf=Exact_Word
>
> I am trying to understand how enclosing the full term in "" resolves
> this problem. What does it tell Solr?

The quotes tell Solr to do a phrase query. A phrase query must have the same relative position increments in the index as are found in the query, or the entire string must be an exact match for a single token in the index. Basically, the index must have the same words as the query, next to each other, and in the same order. Inside the quotes, the ! is treated as literal text rather than as an operator, which is why the string is no longer split.

> Other than the exclamation mark, are there any other characters which tell
> specific things to Solr?

There are a number of characters that are special to Solr's standard query parser. The bottom of this page shows them all:

http://lucene.apache.org/core/4_2_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html?is-external=true

It lists && as a special character. It's not the combination of two & characters that is special, it is each & character.

Thanks,
Shawn
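To make the special-character list concrete, here is a minimal Python sketch that backslash-escapes each character the classic query parser page treats as special. The function name `escape_query_term` and the sample address are made up for illustration, and this is not Solr's own code; a client library may ship its own escaping helper, which would be the safer choice.

```python
# Characters listed as special on the Lucene classic query parser page.
# "&&" and "||" are handled by escaping each "&" and "|" on its own.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_query_term(term: str) -> str:
    """Return term with a backslash in front of every special character."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


# Hypothetical address, not one from this thread.
print(escape_query_term("a!b@example.com"))  # prints a\!b@example.com
```

The escaped value still has to be URL-encoded before it goes into the q parameter.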
Re: KeywordTokenizerFactory splits the string for the exclamation mark
Have you looked at the results after adding &debug=query? That often gives you valuable insights into such questions. Admittedly, the debug syntax can be "interesting" to get used to...

Best,
Erick

On Tue, May 13, 2014 at 9:11 PM, nativecoder wrote:
> Yes that happens due to the ! mark.
>
> Also can someone please tell me the difference between searching a text in
> the following ways
>
> 1. q=Exact_Word:"samplestring"
>
> 2. q=samplestring&qf=Exact_Word
>
> 3. q="samplestring"&qf=Exact_Word
>
> I think the first and the third one are the same. Is that correct? How do
> they differ from the second one?
>
> I am trying to understand how enclosing the full term in "" resolves
> this problem. What does it tell Solr?
>
> Other than the exclamation mark, are there any other characters which tell
> specific things to Solr?
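For anyone wanting to try the &debug=query suggestion from a script, a minimal Python sketch follows. The host, the core name (`collection1`) and the sample address are assumptions, not taken from this thread; only `q`, `qf`, `wt` and `debug=query` come from the messages above. The query value has to be URL-encoded so that characters such as ! and & survive the trip.

```python
from urllib.parse import urlencode


def debug_url(base: str, q: str, qf: str) -> str:
    """Build a select URL that asks Solr to include the parsed query."""
    params = {"q": q, "qf": qf, "debug": "query", "wt": "json"}
    return f"{base}?{urlencode(params)}"


# Hypothetical endpoint and address, for illustration only.
url = debug_url(
    "http://localhost:8983/solr/collection1/select",
    "a!b@example.com",
    "Exact_Word",
)
print(url)
```

The "parsedquery" entry in the debug section of the response is the part to compare between the variants of the query.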
KeywordTokenizerFactory splits the string for the exclamation mark
Hi All

I have the following field settings in the Solr schema.

As you can see, Exact_Word has the KeywordTokenizerFactory, and that should treat the string as it is.

Following is my responseHeader. As you can see, I am searching for my string only in the field Exact_Word and expecting it to return the Word field and the score.

"responseHeader":{
  "status":0,
  "QTime":14,
  "params":{
    "explainOther":"",
    "fl":"Word,score",
    "debugQuery":"on",
    "indent":"on",
    "start":"0",
    "q":"d!sdasdsdwasd!a...@dsadsadas.edu",
    "qf":"Exact_Word",
    "wt":"json",
    "fq":"",
    "version":"2.2",
    "rows":"10"}},

But when I enter an email with the following string, "d! sdasdsdwasd...@dsadsadas.edu", it splits the string in two. I was under the impression that KeywordTokenizerFactory would treat the string as it is.

Following is the query debug result. There you can see it has split the word:

"parsedquery":"+((DisjunctionMaxQuery((Exact_Word:d)) -DisjunctionMaxQuery((Exact_Word:sdasdsdwasd...@dsadsadas.edu)))~1)",

Can someone please tell me why it produces this query result?

If I put in a string without the "!" sign, the produced query is as below:

"parsedquery":"+DisjunctionMaxQuery(( Exact_Word:d_sdasdsdwasd_...@dsadsadas.edu))",

This is what I expected Solr to do even with the "!" mark. With the "_" mark it won't split the string and treats it as it is.

I thought that if the KeywordTokenizerFactory is applied then it should return the exact string as it is.

Please help me to understand what is going wrong here.
Re: KeywordTokenizerFactory splits the string for the exclamation mark
Hi,

It is not KeywordTokenizer. The ! character has a special meaning to the edismax and lucene query parsers: it is the NOT operator. If you want to search strings that could contain !, then use another query parser, dismax for example.

On Wednesday, May 14, 2014 12:02 AM, nativecoder wrote:

Hi All

I have the following field settings in the Solr schema:

Exact_Word*" omitPositions="true" termVectors="false" omitTermFreqAndPositions="true" compressed="true" type="string_ci" multiValued="false" indexed="true" stored="true" required="false" omitNorms="true"/>

As you can see, Exact_Email has the KeywordTokenizerFactory, and that should treat the string as it is.

But when I enter an email with the following string, "d!sdasdsdwasd...@dsadsadas.edu", it splits the string in two. I was under the impression that KeywordTokenizerFactory would treat the string as it is. *!*

Following is the query debug result. There you can see it has split the word:

"parsedquery":"+((DisjunctionMaxQuery((Exact_Email:d)) -DisjunctionMaxQuery((Exact_Email:sdasdsdwasd...@dsadsadas.edu)))~1)",

Can someone please tell me why it produces this query result?

If I put in a string without the "!" sign, the produced query is as below:

"parsedquery":"+DisjunctionMaxQuery((Exact_Email:testresu...@testdomain.com))",

I thought that if the KeywordTokenizerFactory is applied then it should return the exact string as it is.

Please help me to understand what is going wrong here.
Re: KeywordTokenizerFactory splits the string for the exclamation mark
Hi,

They are all the same; for single-term queries it makes no difference whether you use quotes. Quotes are proximity operators, and they require at least two clauses/terms.

Actually, I couldn't reproduce your problem with 4.8.0. With q=d!sdasdsdwasd...@dsadsadas.edu the ! is not treated as the NOT operator. What version are you using?

For other characters that have special meaning, please see the Escaping Special Characters section:

http://lucene.apache.org/core/4_8_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description

Ahmet

On Wednesday, May 14, 2014 7:18 AM, nativecoder wrote:

Yes that happens due to the ! mark.

Also can someone please tell me the difference between searching a text in the following ways

1. q=Exact_Word:"samplestring"
2. q=samplestring&qf=Exact_Word
3. q="samplestring"&qf=Exact_Word

I think the first and the third one are the same. Is that correct? How do they differ from the second one?

I am trying to understand how enclosing the full term in "" resolves this problem. What does it tell Solr?

Other than the exclamation mark, are there any other characters which tell specific things to Solr?
Re: KeywordTokenizerFactory splits the string for the exclamation mark
Exclamation point is the shortcut for the "NOT" operator. See the minus in front of the second generated term? You need to escape it, either with a backslash or by enclosing the full term in quotes. Or use the term query parser.

Here's a list of the special characters for the query parser:

http://lucene.apache.org/core/4_8_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters

-- Jack Krupansky

-----Original Message-----
From: Romani Rupasinghe
Sent: Tuesday, May 13, 2014 11:14 AM
To: solr-user@lucene.apache.org
Subject: KeywordTokenizerFactory splits the string for the exclamation mark

Hi All

I have the following field settings in the Solr schema.

As you can see, Exact_Word has the KeywordTokenizerFactory, and that should treat the string as it is.

Following is my responseHeader. As you can see, I am searching for my string only in the field Exact_Word and expecting it to return the Word field and the score.

"responseHeader":{
  "status":0,
  "QTime":14,
  "params":{
    "explainOther":"",
    "fl":"Word,score",
    "debugQuery":"on",
    "indent":"on",
    "start":"0",
    "q":"d!sdasdsdwasd!a...@dsadsadas.edu",
    "qf":"Exact_Word",
    "wt":"json",
    "fq":"",
    "version":"2.2",
    "rows":"10"}},

But when I enter an email with the following string, "d! sdasdsdwasd...@dsadsadas.edu", it splits the string in two. I was under the impression that KeywordTokenizerFactory would treat the string as it is.

Following is the query debug result. There you can see it has split the word:

"parsedquery":"+((DisjunctionMaxQuery((Exact_Word:d)) -DisjunctionMaxQuery((Exact_Word:sdasdsdwasd...@dsadsadas.edu)))~1)",

Can someone please tell me why it produces this query result?

If I put in a string without the "!" sign, the produced query is as below:

"parsedquery":"+DisjunctionMaxQuery(( Exact_Word:d_sdasdsdwasd_...@dsadsadas.edu))",

This is what I expected Solr to do even with the "!" mark. With the "_" mark it won't split the string and treats it as it is.

I thought that if the KeywordTokenizerFactory is applied then it should return the exact string as it is.

Please help me to understand what is going wrong here.
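The three options in Jack's reply (backslash escape, quoting the whole term, or the term query parser) could be written out roughly as below. This is only a sketch: the helper name and the sample address are hypothetical, the backslash variant escapes only `!` and `\` rather than the full special-character list, and as far as I know the `{!term}` parser skips analysis, so any lowercasing done by the field type would not be applied to the value.

```python
def workaround_queries(field: str, value: str) -> dict:
    """Return three q strings that keep '!' from being read as NOT."""
    # Option 1: backslash-escape the backslash itself, then the '!'.
    escaped = value.replace("\\", "\\\\").replace("!", "\\!")
    # Option 2: wrap the whole value in double quotes (a phrase query);
    # backslashes and embedded quotes must themselves be escaped.
    quoted = value.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "backslash": f"{field}:{escaped}",
        "quoted": f'{field}:"{quoted}"',
        # Option 3: local-params syntax selecting the term query parser.
        "term_parser": f"{{!term f={field}}}{value}",
    }


# Hypothetical address, not one from this thread.
queries = workaround_queries("Exact_Word", "a!b@example.com")
for name, q in queries.items():
    print(name, "->", q)
```

Each string is the raw value for the q parameter and still needs URL-encoding when sent over HTTP.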
Re: KeywordTokenizerFactory splits the string for the exclamation mark
Yes that happens due to the ! mark.

Also can someone please tell me the difference between searching a text in the following ways

1. q=Exact_Word:"samplestring"
2. q=samplestring&qf=Exact_Word
3. q="samplestring"&qf=Exact_Word

I think the first and the third one are the same. Is that correct? How do they differ from the second one?

I am trying to understand how enclosing the full term in "" resolves this problem. What does it tell Solr?

Other than the exclamation mark, are there any other characters which tell specific things to Solr?

--
View this message in context: http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-splits-the-string-for-the-exclamation-mark-tp4135460p4135525.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: KeywordTokenizerFactory splits the string for the exclamation mark
Also could you please tell me the difference between searching a text in the following ways

q=Exact_Word:"samplestring"

q=samplestring&qf=Exact_Word

I am trying to understand how enclosing the full term in "" resolves this problem. What does it tell Solr?

Other than the exclamation mark, are there any other characters which tell specific things to Solr?

--
View this message in context: http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-splits-the-string-for-the-exclamation-mark-tp4135460p4135493.html
Sent from the Solr - User mailing list archive at Nabble.com.
KeywordTokenizerFactory splits the string for the exclamation mark
Hi Jack,

Please have a look at this.

I have the following field settings in the Solr schema.

As you can see, Exact_Word has the KeywordTokenizerFactory, and that should treat the string as it is.

Following is my responseHeader. As you can see, I am searching for my string only in the field Exact_Word and expecting it to return the Word field and the score.

"responseHeader":{
  "status":0,
  "QTime":14,
  "params":{
    "explainOther":"",
    "fl":"Word,score",
    "debugQuery":"on",
    "indent":"on",
    "start":"0",
    "q":"d!sdasdsdwasd!a...@dsadsadas.edu",
    "qf":"Exact_Word",
    "wt":"json",
    "fq":"",
    "version":"2.2",
    "rows":"10"}},

But when I enter an email with the following string, "d! sdasdsdwasd...@dsadsadas.edu", it splits the string in two. I was under the impression that KeywordTokenizerFactory would treat the string as it is.

Following is the query debug result. There you can see it has split the word:

"parsedquery":"+((DisjunctionMaxQuery((Exact_Word:d)) -DisjunctionMaxQuery((Exact_Word:sdasdsdwasd...@dsadsadas.edu)))~1)",

Can someone please tell me why it produces this query result?

If I put in a string without the "!" sign, the produced query is as below:

"parsedquery":"+DisjunctionMaxQuery(( Exact_Word:d_sdasdsdwasd_...@dsadsadas.edu))",

This is what I expected Solr to do even with the "!" mark. With the "_" mark it won't split the string and treats it as it is.

I thought that if the KeywordTokenizerFactory is applied then it should return the exact string as it is.

Please help me to understand what is going wrong here.

--
View this message in context: http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-splits-the-string-for-the-exclamation-mark-tp4135474.html
Sent from the Solr - User mailing list archive at Nabble.com.
KeywordTokenizerFactory splits the string for the exclamation mark
Hi All

I have the following field settings in the Solr schema:

Exact_Word*" omitPositions="true" termVectors="false" omitTermFreqAndPositions="true" compressed="true" type="string_ci" multiValued="false" indexed="true" stored="true" required="false" omitNorms="true"/>

As you can see, Exact_Email has the KeywordTokenizerFactory, and that should treat the string as it is.

But when I enter an email with the following string, "d!sdasdsdwasd...@dsadsadas.edu", it splits the string in two. I was under the impression that KeywordTokenizerFactory would treat the string as it is. *!*

Following is the query debug result. There you can see it has split the word:

"parsedquery":"+((DisjunctionMaxQuery((Exact_Email:d)) -DisjunctionMaxQuery((Exact_Email:sdasdsdwasd...@dsadsadas.edu)))~1)",

Can someone please tell me why it produces this query result?

If I put in a string without the "!" sign, the produced query is as below:

"parsedquery":"+DisjunctionMaxQuery((Exact_Email:testresu...@testdomain.com))",

I thought that if the KeywordTokenizerFactory is applied then it should return the exact string as it is.

Please help me to understand what is going wrong here.

--
View this message in context: http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-splits-the-string-for-the-exclamation-mark-tp4135460.html
Sent from the Solr - User mailing list archive at Nabble.com.