Re: Question about wildcards

2012-05-21 Thread Anderson vasconcelos
Hi.

In debug mode, the generated query was:

<str name="rawquerystring">field:*2231-7</str>
<str name="querystring">field:*2231-7</str>
<str name="parsedquery">field:*2231-7</str>
<str name="parsedquery_toString">field:*2231-7</str>

Analysis of indexing the text .2231-7 produces this result:
Index Analyzer  .22317  .22317  .22317  .22317  #1;1322.
#1;7 .22317
And analysis of the search for *2231-7 produces this result:
Query Analyzer  22317  22317  22317  22317 22317

I don't understand why it doesn't find results when I use field:*2231-7.
When I use field:*2231, without the -7, the document is found.

As Ahmet suggested, I think the -7 may be treated as a negative clause that
excludes the document, but the debug query doesn't show this.

Any idea how to solve this?

Thanks


2012/5/18 Ahmet Arslan iori...@yahoo.com



  I have a field that was indexed with the string .2231-7. When I
  search using '*' or '?', as in *2231-7, the query returns no
  results. When I remove the -7 substring and search again using
  *2231, the query returns the document. Finally, when I search
  using .2231-7, the query returns it too.

 Maybe the standard tokenizer is splitting .2231-7 into multiple tokens?
 You can check that on the admin/analysis page.

 Maybe -7 is treated as a negative clause? You can check that with
 debugQuery=on




Re: Question about wildcards

2012-05-21 Thread Jack Krupansky
Before Solr 3.6, which added MultiTermAwareComponent for analyzers, the 
presence of a wildcard completely short-circuited (prevented) the query-time 
analysis, so you have to manually emulate all steps of the query analyzer 
yourself if you want to do a wildcard. Even with 3.6, not all filters are 
multi-term aware.


See:
http://wiki.apache.org/solr/MultitermQueryAnalysis

Do a query for .2231-7 and that will tell you which analyzer steps you 
will have to do manually.
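
To illustrate the point, here is a minimal Python sketch, not Solr code. It assumes the index analyzer breaks the value on the hyphen (the exact tokens in your index may differ, so check the analysis page), and it uses Python's fnmatch as a stand-in for wildcard matching, where '*' is any run of characters and '?' is one character. The function names are made up for the illustration.

```python
from fnmatch import fnmatchcase
import re


def split_tokens(text):
    # Hypothetical stand-in for an analyzer that breaks on hyphens and whitespace.
    return [t for t in re.split(r"[\s-]+", text) if t]


def whitespace_tokens(text):
    # Hypothetical stand-in for a whitespace-only tokenizer.
    return text.split()


def wildcard_hits(pattern, tokens):
    # The wildcard term is compared as typed against each indexed token;
    # no query-time analysis is applied to it.
    return [t for t in tokens if fnmatchcase(t, pattern)]


value = ".2231-7"
split = split_tokens(value)       # tokens as a hyphen-splitting analyzer might index them
whole = whitespace_tokens(value)  # the value kept as a single token

print(wildcard_hits("*2231-7", split))  # no token contains "2231-7"
print(wildcard_hits("*2231", split))    # matches the token before the hyphen
print(wildcard_hits("*2231-7", whole))  # matches the unsplit token
```

Under that assumption, the pattern with -7 finds nothing because no single indexed token contains the hyphen, while *2231 still matches the first token.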


-- Jack Krupansky

Re: Question about wildcards

2012-05-21 Thread Anderson vasconcelos
I changed the field type of the field to the following:

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

As you can see, I kept only the WhitespaceTokenizerFactory. That works. Now I
can find the document using *2231?7, *2231*7, *2231-7, *2231*, and .2231-7.

As far as I can see, with this tokenizer the text is not split. Is that the
best way to solve this?

Thanks



Re: Question about wildcards

2012-05-21 Thread Jack Krupansky
And, generally when I see a field that has values like .2231-7, it 
should be a string field rather than tokenized text. As a string, you can 
then do straight wildcards without surprises.
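
As a rough illustration of why a string field avoids surprises, the sketch below models an untokenized field as a single term holding the exact value and checks the wildcard patterns from this thread against it. It is plain Python, not Solr code: fnmatch is only a stand-in for wildcard matching, and string_field_terms is a made-up helper.

```python
from fnmatch import fnmatchcase


def string_field_terms(value):
    # Hypothetical model of an untokenized string field:
    # the whole value is indexed as one term, unchanged.
    return [value]


terms = string_field_terms(".2231-7")

# Patterns taken from the thread; each is compared against the single term.
patterns = ["*2231?7", "*2231*7", "*2231-7", "*2231*", ".2231-7"]
results = {p: any(fnmatchcase(t, p) for t in terms) for p in patterns}

for pattern, matched in results.items():
    print(pattern, matched)
```

Because the term is the exact value, the hyphen is still there for '?' or a literal '-' to match against.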



-- Jack Krupansky

Re: Question about wildcards

2012-05-21 Thread Anderson vasconcelos
Thanks all for the explanations.

Anderson


Re: Question about wildcards

2012-05-18 Thread Ahmet Arslan


 I have a field that was indexed with the string .2231-7. When I
 search using '*' or '?', as in *2231-7, the query returns no
 results. When I remove the -7 substring and search again using
 *2231, the query returns the document. Finally, when I search
 using .2231-7, the query returns it too.

Maybe the standard tokenizer is splitting .2231-7 into multiple tokens?
You can check that on the admin/analysis page.

Maybe -7 is treated as a negative clause? You can check that with debugQuery=on
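
For the debugQuery check, a small Python sketch of building the request URL is below. The base URL is an assumption (a local Solr with the default select handler), and build_debug_url is a made-up helper; only the q and debugQuery parameters come from this thread.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, query):
    # Encode the query and ask Solr to include debug output in the response.
    params = {"q": query, "debugQuery": "on"}
    return base_url + "?" + urlencode(params)


# Assumed base URL; adjust host, port, and path to your own installation.
url = build_debug_url("http://localhost:8983/solr/select", "field:*2231-7")
print(url)
```

Opening the resulting URL should show the parsedquery entry, which is where a negative clause would appear if -7 were being treated as one.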