Re: Wildcard searches with space in TextField/StrField

2016-11-25 Thread Ahmet Arslan
Hi,

You could try this:

Drop the wildcard approach altogether:
1) Apply an EdgeNGramFilter at index time.
2) Use plain (non-wildcard) searches at query time.
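
To illustrate the idea, here is a minimal Python sketch that simulates the analysis rather than calling Solr. It assumes a whitespace tokenizer, lowercasing, and edge n-grams of 1 to 15 characters at index time, and only lowercasing at query time; the function names and gram sizes are made up for the illustration. In Solr the index-time step would normally be configured with solr.EdgeNGramFilterFactory in the index analyzer of the field type. The sketch covers only the single-term cases (a) through (e) from the question, not the multi-word ones.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the leading substrings of token, min_gram to max_gram chars long."""
    longest = min(len(token), max_gram)
    return {token[:n] for n in range(min_gram, longest + 1)}


def index_terms(text):
    """Simulated index-time analysis: whitespace split, lowercase, edge n-grams."""
    terms = set()
    for token in text.lower().split():
        terms |= edge_ngrams(token)
    return terms


# The twelve sample records from the question, numbered from 1.
NAMES = [
    "John Doe", "John V. Doe", "Johnson Doe", "Johnson V. Doe",
    "John Smith", "Johnson V. Smith", "Matt Doe", "Matt V. Doe",
    "Matt Doe", "Matthew V. Doe", "Matthew Smith", "Matthew V. Smith",
]
INDEX = {rec_id: index_terms(name) for rec_id, name in enumerate(NAMES, start=1)}


def search(term):
    """Simulated query-time analysis: lowercase only, then exact term lookup."""
    q = term.lower()
    return [rec_id for rec_id, terms in INDEX.items() if q in terms]


print(search("Matt"))   # records whose name has a token starting with "matt"
print(search("V."))     # records containing the middle initial
```

Because every prefix of every token is already in the index, a plain query for Matt finds Matthew as well, with no wildcard needed at query time.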

Ahmet




Re: Wildcard searches with space in TextField/StrField

2016-11-25 Thread Sandeep Khanzode
Hi All,

Can someone please assist with this query?

My data consists of:
1.] John Doe
2.] John V. Doe
3.] Johnson Doe
4.] Johnson V. Doe
5.] John Smith
6.] Johnson V. Smith
7.] Matt Doe
8.] Matt V. Doe
9.] Matt Doe
10.] Matthew V. Doe
11.] Matthew Smith
12.] Matthew V. Smith

Querying ...
(a) Matt/Matt* should return records 7-12
(b) John/John* should return records 1-6
(c) Doe/Doe* should return records 1-4, 7-10
(d) Smith/Smith* should return records 5,6,11,12
(e) V/V./V.*/V* should return records 2,4,6,8,10,12
(f) V. Doe/V. Doe* should return records 2,4,8,10
(g) John V/John V./John V*/John V.* should return record 2
(h) V. Smith/V. Smith* should return records 6,12

Any guidance would be appreciated!
I have tried ComplexPhraseQueryParser, but with a single token like Doe*, there 
is an error that indicates that the query is being identified as a prefix 
query. I may be missing something in the syntax.
 SRK 


Re: Wildcard searches with space in TextField/StrField

2016-11-24 Thread Sandeep Khanzode
Hi All, Erick,
Please advise. I would like to use the ComplexPhraseQueryParser to search 
text (with wildcards) that may contain special characters.
For example:
John* should match John V. Doe
John* should match Johnson Smith
Bruce-Willis* should match Bruce-Willis
V.* should match John V. F. Doe
SRK 


Re: Wildcard searches with space in TextField/StrField

2016-11-24 Thread Sandeep Khanzode
Hi,
This is the typical TextField with ...             
            



SRK 


Re: Wildcard searches with space in TextField/StrField

2016-11-23 Thread Reth RM
what is the fieldType of those records?


Re: Wildcard searches with space in TextField/StrField

2016-11-22 Thread Sandeep Khanzode
Hi Erick,
I gave this a try. 
These are my results. There is a record with "John D. Smith", and another named 
"John Doe".

1.] {!complexphrase inOrder=true}name:"John D.*" ... does not fetch any 
results. 

2.] {!complexphrase inOrder=true}name:"John D*" ... fetches both results. 



Second observation: There is a record with "John D Smith"
1.] {!complexphrase inOrder=true}name:"John*" ... does not fetch any results. 

2.] {!complexphrase inOrder=true}name:"John D*" ... fetches that record. 

3.] {!complexphrase inOrder=true}name:"John D S*" ... fetches that record. 
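
For anyone who wants to reproduce these queries outside the admin UI, here is a small Python sketch that builds the request parameters. The host, port, and collection name in SELECT_URL are hypothetical, and the helper name is made up for the illustration; the local-params syntax is the one used above, and debugQuery=true asks Solr to include the parsed query in the response. The sketch only builds the URL and does not send anything.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical endpoint: adjust host, port, and collection to your setup.
SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def complexphrase_params(field, phrase, in_order=True, debug=True):
    """Build select parameters for a complexphrase query on a single field."""
    local_params = "{!complexphrase inOrder=%s}" % ("true" if in_order else "false")
    params = {"q": '%s%s:"%s"' % (local_params, field, phrase)}
    if debug:
        # Ask Solr to return debug output, including the parsed query.
        params["debugQuery"] = "true"
    return params


params = complexphrase_params("name", "John D*")
query_string = urlencode(params)   # percent-encodes braces, quotes, and spaces
print(SELECT_URL + "?" + query_string)
```

Comparing the parsed query in the debug output for "John D*" and "John D.*" should show how the trailing period is handled by the analyzer.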

SRK 

On Sunday, November 13, 2016 7:43 AM, Erick Erickson 
 wrote:
 

 Right, for that kind of use case you want complexPhraseQueryParser,
see: 
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser

Best,
Erick


Re: Wildcard searches with space in TextField/StrField

2016-11-12 Thread Sandeep Khanzode
Thanks, Erick.
I am actually not trying to use the String field (prefer a TextField here). 
But, in my comparisons with TextField, it seems that something like phrase 
matching with whitespace and wildcard (like, 'my do*' or say, 'my dog*', or 
say, 'my dog has*') can only be accomplished with a string type field, 
especially because, with a WhitespaceTokenizer in TextField, the space will be 
lost, and all tokens will be individually considered. Am I missing something? 
SRK 

On Friday, November 11, 2016 10:05 PM, Erick Erickson 
 wrote:
 

 You have to query text and string fields differently, that's just the
way it works. The problem is getting the query string through the
parser as a _single_ token or as multiple tokens.

Let's say you have a string field with the "a b" example. You have a
single token
a b that starts at offset 0.

But with a text field, you have two tokens,
a at position 0
b at position 1

But when the query parser sees "a b" (without quotes) it splits it
into two tokens, and only the text field has both tokens so the string
field won't match.

OTOH, when the query parser sees "a\ b" it passes this through as a
single token, which only matches the string field as there's no
_single_ token "a b" in the text field.

But a more interesting question is why you want to search this way.
String fields are intended for keywords, machine-generated IDs and the
like. They're pretty useless for searching anything except
1> exact tokens
2> prefixes

While if you have "my dog has fleas" in a string field, you _can_
search "*dog*" and get a hit but the performance is poor when you get
a large corpus. Performance for "my*" will be pretty good though.
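
The difference between the two cases can be sketched in Python. This uses a sorted list as a stand-in for a sorted term dictionary, which is an assumption made for the illustration and not a description of Lucene's actual data structures; the sample terms and function names are made up as well.

```python
from bisect import bisect_left

# Each string stands for one whole-value token of a string field.
TERMS = sorted([
    "my cat naps",
    "my dog has fleas",
    "our dog barks",
    "your dog sleeps",
])


def prefix_search(prefix):
    """Binary-search to the first candidate, then walk while the prefix holds."""
    start = bisect_left(TERMS, prefix)
    hits = []
    for term in TERMS[start:]:
        if not term.startswith(prefix):
            break   # sorted order: no later term can match
        hits.append(term)
    return hits


def infix_search(fragment):
    """A leading wildcard gives no starting point, so every term is examined."""
    return [term for term in TERMS if fragment in term]


print(prefix_search("my"))    # touches only the matching terms plus one more
print(infix_search("dog"))    # touches every term in the dictionary
```

The prefix lookup stops as soon as the sorted order rules out further matches, while the infix lookup has to examine the whole dictionary, which is why it gets slow as the corpus grows.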

In all this sounds like an XY problem, what's the use-case you're
trying to solve?

Best,
Erick



On Thu, Nov 10, 2016 at 10:11 PM, Sandeep Khanzode
 wrote:
> Hi Erick, Reth,
>
> The 'a\ b*' as well as the q.op=AND approach worked (successfully) only for 
> StrField for me.
>
> Any attempt at creating a 'a\ b*' for a TextField does not match any 
> documents. The parsedQuery in debug mode does show 'field:a b*'. I am sure 
> there are documents that should match.
> Another (maybe unrelated) observation is if I have 'field:a\ b', then the 
> parsedQuery is field:a field:b. Which does not match as expected (matches 
> individually).
>
> Can you please provide an example that I can use in Solr Query dashboard? 
> That will be helpful.
>
> I have also seen that wildcard queries work irrespective of field type, i.e. 
> StrField as well as TextField. That makes sense because a 
> WhitespaceTokenizer only creates word boundaries when we do not use an 
> EdgeNGramFilter. If I am not wrong, that is.
> SRK
>
>    On Friday, November 11, 2016 5:00 AM, Erick Erickson 
> wrote:
>
>
>  You can escape the space with a backslash as  'a\ b*'
>
> Best,
> Erick
>
> On Thu, Nov 10, 2016 at 2:37 PM, Reth RM  wrote:
>> I don't think you can do wildcard on StrField. For text field, if your
>> query is "category:(test m*)"  the parsed query will be  "category:test OR
>> category:m*"
>> You can add q.op=AND to make an AND between those terms.
>>
>> For phrase type wild card query support, as per docs, it
>> is ComplexPhraseQueryParser that supports it. (I haven't tested it myself)
>>
>> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>>
>> On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode <
>> sandeep_khanz...@yahoo.com.invalid> wrote:
>>
>>> Hi,
>>> How does a search like abc* work in StrField. Since the entire thing is
>>> stored as a single token, is it a type of a trie structure that allows such
>>> wildcard matching?
>>> How can searches with space like 'a b*' be executed for text fields
>>> (tokenized on whitespace)? If we specify this type of query, it is broken
>>> down into two queries with field:a and field:b*. I would like them to be
>>> contiguous, sort of, like a phrase search with wild card.
>>> SRK
>
>
>

   

Re: Wildcard searches with space in TextField/StrField

2016-11-11 Thread Erick Erickson
You have to query text and string fields differently; that's just the
way it works. The problem is getting the query string through the
parser as a _single_ token or as multiple tokens.

Let's say you have a string field with the "a b" example. You have a
single token
a b that starts at offset 0.

But with a text field, you have two tokens,
a at position 0
b at position 1

But when the query parser sees "a b" (without quotes) it splits it
into two tokens, and only the text field has both tokens so the string
field won't match.

OTOH, when the query parser sees "a\ b" it passes this through as a
single token, which only matches the string field as there's no
_single_ token "a b" in the text field.
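
To make the difference concrete, here is a toy model in Python. It is
not Solr's parser or analysis chain; the function names are made up for
illustration, and it assumes every query term must match an indexed
token exactly (AND semantics, no wildcards).

```python
def index_string_field(value):
    # A string (StrField-like) field keeps the whole value as one token.
    return [value]


def index_text_field(value):
    # A whitespace-tokenized text field yields one token per word.
    return value.split()


def parse_query(q):
    """Toy parser: split on unescaped spaces and drop the escaping backslash."""
    terms, current, i = [], "", 0
    while i < len(q):
        ch = q[i]
        if ch == "\\" and i + 1 < len(q):
            # Escaped character: keep it as part of the current term.
            current += q[i + 1]
            i += 2
        elif ch == " ":
            # Unescaped space ends the current term.
            if current:
                terms.append(current)
            current = ""
            i += 1
        else:
            current += ch
            i += 1
    if current:
        terms.append(current)
    return terms


def matches_all(query, tokens):
    # Every parsed query term must be present as an exact indexed token.
    return all(term in tokens for term in parse_query(query))


string_tokens = index_string_field("a b")  # ["a b"]
text_tokens = index_text_field("a b")      # ["a", "b"]

print(matches_all("a b", text_tokens), matches_all("a b", string_tokens))
print(matches_all("a\\ b", text_tokens), matches_all("a\\ b", string_tokens))
```

The unescaped query only finds its two terms in the text field, and the
escaped query only finds its single term in the string field.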

But a more interesting question is why you want to search this way.
String fields are intended for keywords, machine-generated IDs and the
like. They're pretty useless for searching anything except
1> exact tokens
2> prefixes

If you have "my dog has fleas" in a string field, you _can_
search "*dog*" and get a hit, but performance is poor on a large
corpus. Performance for "my*" will be pretty good, though.
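
A rough way to see why the two cases differ, using a plain sorted
Python list as a stand-in for the term dictionary. This is only an
illustration under that assumption; it is not how Lucene actually
stores or walks its terms.

```python
import bisect


def prefix_terms(sorted_terms, prefix):
    # Binary-search to the first term >= prefix, then walk forward only
    # while terms still start with the prefix.
    i = bisect.bisect_left(sorted_terms, prefix)
    hits = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        hits.append(sorted_terms[i])
        i += 1
    return hits


def infix_terms(sorted_terms, fragment):
    # Sort order does not help here: every term has to be inspected.
    return [t for t in sorted_terms if fragment in t]


terms = sorted([
    "my cat naps",
    "my dog has fleas",
    "our dog barks",
    "your bird sings",
])

print(prefix_terms(terms, "my"))   # like my*
print(infix_terms(terms, "dog"))   # like *dog*
```

The prefix lookup touches only the matching run of terms plus one
extra, while the infix lookup grows with the total number of distinct
terms in the field.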

All in all, this sounds like an XY problem. What's the use case you're
trying to solve?

Best,
Erick



On Thu, Nov 10, 2016 at 10:11 PM, Sandeep Khanzode wrote:
> Hi Erick, Reth,
>
> The 'a\ b*' as well as the q.op=AND approach worked (successfully) only for 
> StrField for me.
>
> Any attempt at creating a 'a\ b*' for a TextField does not match any 
> documents. The parsedQuery in debug mode does show 'field:a b*'. I am sure 
> there are documents that should match.
> Another (maybe unrelated) observation: if I have 'field:a\ b', then the
> parsedQuery is field:a field:b, which does not match as expected (the terms
> match individually).
>
> Can you please provide an example that I can use in Solr Query dashboard? 
> That will be helpful.
>
> I have also seen that wildcard queries work irrespective of field type, i.e.
> StrField as well as TextField. That makes sense, because a
> WhitespaceTokenizer only creates word boundaries when we do not use an
> EdgeNGramFilter. If I am not wrong, that is.
> SRK
>
> On Friday, November 11, 2016 5:00 AM, Erick Erickson 
>  wrote:
>
>
>  You can escape the space with a backslash as  'a\ b*'
>
> Best,
> Erick
>
> On Thu, Nov 10, 2016 at 2:37 PM, Reth RM  wrote:
>> I don't think you can do wildcard on StrField. For text field, if your
>> query is "category:(test m*)"  the parsed query will be  "category:test OR
>> category:m*"
>> You can add q.op=AND to make an AND between those terms.
>>
>> For phrase type wild card query support, as per docs, it
>> is ComplexPhraseQueryParser that supports it. (I haven't tested it myself)
>>
>> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>>
>> On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode <
>> sandeep_khanz...@yahoo.com.invalid> wrote:
>>
>>> Hi,
>>> How does a search like abc* work in StrField. Since the entire thing is
>>> stored as a single token, is it a type of a trie structure that allows such
>>> wildcard matching?
>>> How can searches with space like 'a b*' be executed for text fields
>>> (tokenized on whitespace)? If we specify this type of query, it is broken
>>> down into two queries with field:a and field:b*. I would like them to be
>>> contiguous, sort of, like a phrase search with wild card.
>>> SRK
>
>
>


Re: Wildcard searches with space in TextField/StrField

2016-11-10 Thread Sandeep Khanzode
Hi Erick, Reth,

The 'a\ b*' as well as the q.op=AND approach worked (successfully) only for 
StrField for me.

Any attempt at creating a 'a\ b*' for a TextField does not match any documents. 
The parsedQuery in debug mode does show 'field:a b*'. I am sure there are 
documents that should match.
Another (maybe unrelated) observation: if I have 'field:a\ b', then the
parsedQuery is field:a field:b, which does not match as expected (the terms
match individually).

Can you please provide an example that I can use in Solr Query dashboard? That 
will be helpful. 

I have also seen that wildcard queries work irrespective of field type, i.e.
StrField as well as TextField. That makes sense, because a
WhitespaceTokenizer only creates word boundaries when we do not use an
EdgeNGramFilter. If I am not wrong, that is.
SRK

On Friday, November 11, 2016 5:00 AM, Erick Erickson wrote:
 

 You can escape the space with a backslash as  'a\ b*'

Best,
Erick

On Thu, Nov 10, 2016 at 2:37 PM, Reth RM  wrote:
> I don't think you can do wildcard on StrField. For text field, if your
> query is "category:(test m*)"  the parsed query will be  "category:test OR
> category:m*"
> You can add q.op=AND to make an AND between those terms.
>
> For phrase type wild card query support, as per docs, it
> is ComplexPhraseQueryParser that supports it. (I haven't tested it myself)
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>
> On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode <
> sandeep_khanz...@yahoo.com.invalid> wrote:
>
>> Hi,
>> How does a search like abc* work in StrField. Since the entire thing is
>> stored as a single token, is it a type of a trie structure that allows such
>> wildcard matching?
>> How can searches with space like 'a b*' be executed for text fields
>> (tokenized on whitespace)? If we specify this type of query, it is broken
>> down into two queries with field:a and field:b*. I would like them to be
>> contiguous, sort of, like a phrase search with wild card.
>> SRK


   

Re: Wildcard searches with space in TextField/StrField

2016-11-10 Thread Erick Erickson
You can escape the space with a backslash as  'a\ b*'
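
If the query string is built by client code, the escaping can be done
with a small helper along these lines. This is a minimal sketch: the
helper names and the field name name_s are made up for illustration,
and it escapes only spaces, not the other characters the query parser
treats as special.

```python
def escape_spaces(term):
    # Put a backslash before each space so the parser keeps the term whole.
    return term.replace(" ", "\\ ")


def prefix_query(field, prefix):
    # Build a prefix query such as field:a\ b* for a single-token field.
    return f"{field}:{escape_spaces(prefix)}*"


print(prefix_query("name_s", "John V"))
```

As discussed elsewhere in this thread, a query built this way is only
expected to match a field that indexes the whole value as one token.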

Best,
Erick

On Thu, Nov 10, 2016 at 2:37 PM, Reth RM  wrote:
> I don't think you can do wildcard on StrField. For text field, if your
> query is "category:(test m*)"  the parsed query will be  "category:test OR
> category:m*"
> You can add q.op=AND to make an AND between those terms.
>
> For phrase type wild card query support, as per docs, it
> is ComplexPhraseQueryParser that supports it. (I haven't tested it myself)
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>
> On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode <
> sandeep_khanz...@yahoo.com.invalid> wrote:
>
>> Hi,
>> How does a search like abc* work in StrField. Since the entire thing is
>> stored as a single token, is it a type of a trie structure that allows such
>> wildcard matching?
>> How can searches with space like 'a b*' be executed for text fields
>> (tokenized on whitespace)? If we specify this type of query, it is broken
>> down into two queries with field:a and field:b*. I would like them to be
>> contiguous, sort of, like a phrase search with wild card.
>> SRK


Re: Wildcard searches with space in TextField/StrField

2016-11-10 Thread Reth RM
I don't think you can do wildcard on StrField. For text field, if your
query is "category:(test m*)"  the parsed query will be  "category:test OR
category:m*"
You can add q.op=AND to make an AND between those terms.

For phrase type wild card query support, as per docs, it
is ComplexPhraseQueryParser that supports it. (I haven't tested it myself)

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
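
For trying both suggestions from a script instead of the admin UI, the
request URLs could be built roughly like this. It is an untested
sketch: the host, the collection name mycollection and the field
category are assumptions, and the complexphrase line simply follows
the syntax shown on the page linked above.

```python
from urllib.parse import urlencode

# Hypothetical Solr base URL; adjust host and collection to your setup.
BASE = "http://localhost:8983/solr/mycollection"


def select_url(base, params):
    # urlencode takes care of spaces, quotes, braces and asterisks.
    return f"{base}/select?{urlencode(params)}"


# 1) Both terms required: category:test AND category:m*
and_url = select_url(BASE, {"q": "category:(test m*)", "q.op": "AND"})

# 2) Phrase with a trailing wildcard via the complexphrase parser.
phrase_url = select_url(
    BASE,
    {"q": '{!complexphrase inOrder=true}category:"test m*"'},
)

print(and_url)
print(phrase_url)
```

Adding debugQuery=true to the parameters should show the parsed query
for each request, which makes it easier to compare the two.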

On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode <
sandeep_khanz...@yahoo.com.invalid> wrote:

> Hi,
> How does a search like abc* work in StrField. Since the entire thing is
> stored as a single token, is it a type of a trie structure that allows such
> wildcard matching?
> How can searches with space like 'a b*' be executed for text fields
> (tokenized on whitespace)? If we specify this type of query, it is broken
> down into two queries with field:a and field:b*. I would like them to be
> contiguous, sort of, like a phrase search with wild card.
> SRK