Re: Question about wildcards
Hi.

In debug mode, the generated query was:

<str name="rawquerystring">field:*2231-7</str>
<str name="querystring">field:*2231-7</str>
<str name="parsedquery">field:*2231-7</str>
<str name="parsedquery_toString">field:*2231-7</str>

The analysis of indexing the text .2231-7 produces this result:

Index Analyzer: .22317 .22317 .22317 .22317 #1;1322. #1;7 .22317

And the analysis of searching for *2231-7 produces this result:

Query Analyzer: 22317 22317 22317 22317 22317

I don't understand why no results are found when I use field:*2231-7. When I use field:*2231, without the -7, the document is found. As Ahmet said, I think the -7 may be being used to exclude the document, but the debug query doesn't show this. Any idea how to solve this?

Thanks

2012/5/18 Ahmet Arslan iori...@yahoo.com

I have a field that was indexed with the string .2231-7. When I search using '*' or '?', as in *2231-7, the query returns no results. When I remove the -7 substring and search again using *2231, the query returns results. Finally, when I search using .2231-7, the query returns results too.

Maybe the standard tokenizer is splitting .2231-7 into multiple tokens? You can check that on the admin/analysis page. Maybe -7 is treated as a negative clause? You can check that with debugQuery=on
Re: Question about wildcards
Before Solr 3.6, which added MultiTermAwareComponent for analyzers, the presence of a wildcard completely short-circuited (prevented) the query-time analysis, so you have to manually emulate all steps of the query analyzer yourself if you want to do a wildcard. Even with 3.6, not all filters are multi-term aware.

See: http://wiki.apache.org/solr/MultitermQueryAnalysis

Do a query for .2231-7 and that will tell you which analyzer steps you will have to do manually.

-- Jack Krupansky

-----Original Message-----
From: Anderson vasconcelos
Sent: Monday, May 21, 2012 11:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Question about wildcards
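A minimal sketch of the effect described here, in Python rather than in Solr itself. It assumes, hypothetically, that the index analyzer stored the tokens 2231, 7 and 22317 for the value .2231-7 (the real tokens depend on the field type and are not fully recoverable from the analysis output in this thread), and it uses Python's fnmatch as a stand-in for Lucene's * and ? wildcards. Because the wildcard term skips query-time analysis, the pattern is compared with the indexed tokens literally, hyphen included.

```python
from fnmatch import fnmatchcase

# Hypothetical tokens assumed to be in the index for the value ".2231-7".
# The real tokens depend on the field's analyzer chain.
INDEXED_TOKENS = ["2231", "7", "22317"]


def wildcard_hits(pattern, tokens=INDEXED_TOKENS):
    """Return the indexed tokens matched by an unanalyzed wildcard pattern."""
    return [t for t in tokens if fnmatchcase(t, pattern)]


# The pattern is used exactly as typed: no analysis removes the hyphen.
for pattern in ("*2231-7", "*2231", "*22317"):
    print(pattern, "->", wildcard_hits(pattern))
```

Under these assumptions, only patterns that already look like an indexed token can match, which is why the query-analyzer steps would have to be applied to the wildcard term by hand.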
Re: Question about wildcards
I changed the field type of the field to the following:

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

As you can see, I kept only the WhitespaceTokenizerFactory. That works. Now I can find the document using *2231?7, *2231*7, *2231-7, *2231* and .2231-7. As far as I can see, with this tokenizer the text is not split. Is that the best way to solve this?

Thanks

2012/5/21 Anderson vasconcelos anderson.v...@gmail.com
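As a rough illustration of why the whitespace-only analyzer behaves this way, here is a small Python sketch. It is not Solr code: str.split() stands in for WhitespaceTokenizerFactory and fnmatch stands in for Lucene's * and ? wildcards, both of which are simplifying assumptions.

```python
from fnmatch import fnmatchcase


def whitespace_tokenize(text):
    """Split on whitespace only, leaving punctuation inside the tokens."""
    return text.split()


# The value contains no whitespace, so it stays a single token.
tokens = whitespace_tokenize(".2231-7")

# The patterns reported as working in the message above.
PATTERNS = ["*2231?7", "*2231*7", "*2231-7", "*2231*", ".2231-7"]
matches = {p: any(fnmatchcase(t, p) for t in tokens) for p in PATTERNS}

for pattern, found in matches.items():
    print(pattern, "->", found)
```

Since the indexed token is identical to the original text, the unanalyzed wildcard pattern and the index agree on every character, including the dot and the hyphen.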
Re: Question about wildcards
And, generally when I see a field that has values like .2231-7, it should be a string field rather than tokenized text. As a string, you can then do straight wildcards without surprises.

-- Jack Krupansky

-----Original Message-----
From: Jack Krupansky
Sent: Monday, May 21, 2012 11:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Question about wildcards
Re: Question about wildcards
Thanks all for the explanations.

Anderson

2012/5/21 Jack Krupansky j...@basetechnology.com
Re: Question about wildcards
I have a field that was indexed with the string .2231-7. When I search using '*' or '?', as in *2231-7, the query returns no results. When I remove the -7 substring and search again using *2231, the query returns results. Finally, when I search using .2231-7, the query returns results too.

Maybe the standard tokenizer is splitting .2231-7 into multiple tokens? You can check that on the admin/analysis page. Maybe -7 is treated as a negative clause? You can check that with debugQuery=on
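For the debugQuery=on check, a small Python sketch of building the request URL with the wildcard characters properly escaped. The base URL below is hypothetical (host, port and handler path vary by installation); only the q and debugQuery parameters come from this thread.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical base URL; adjust host, port and path for your installation.
BASE_URL = "http://localhost:8983/solr/select"


def debug_query_url(query, base_url=BASE_URL):
    """Build a select URL that asks Solr to include debug output."""
    params = {"q": query, "debugQuery": "on"}
    # urlencode percent-escapes characters such as ':' and '*'.
    return base_url + "?" + urlencode(params)


url = debug_query_url("field:*2231-7")
print(url)
```

The parsedquery entry in the debug section of the response shows how the query was interpreted, including whether -7 became a negative clause.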