Re: Wildcard searches with space in TextField/StrField
Hi,

You could try this: drop the wildcard approach altogether.

1) Apply an EdgeNGramFilter at index time.
2) Use plain (non-wildcard) searches at query time.

Ahmet

On Friday, November 25, 2016 4:59 PM, Sandeep Khanzode wrote:
> Hi All, Can someone please assist with this query? [...]
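Ahmet's suggestion at the top of this message (edge n-grams at index time, plain terms at query time) can be illustrated with a small toy model. This is only a sketch in Python, not Solr's implementation: the names `edge_ngrams`, `build_index`, and `search` are hypothetical, tokens are assumed to be lowercased and split on whitespace, and multi-term queries are assumed to be ANDed. The sample data is the twelve-name list from the question.

```python
# Toy model of "edge n-grams at index time, plain terms at query time".
# Illustration only; every name below is hypothetical, not a Solr API.

NAMES = [
    "John Doe", "John V. Doe", "Johnson Doe", "Johnson V. Doe",
    "John Smith", "Johnson V. Smith", "Matt Doe", "Matt V. Doe",
    "Matt Doe", "Matthew V. Doe", "Matthew Smith", "Matthew V. Smith",
]


def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return every prefix of token between min_gram and max_gram characters."""
    longest = min(len(token), max_gram)
    return {token[:n] for n in range(min_gram, longest + 1)}


def build_index(names):
    """Map each lowercased prefix to the 1-based record numbers containing it."""
    index = {}
    for record_id, name in enumerate(names, start=1):
        for token in name.lower().split():
            for gram in edge_ngrams(token):
                index.setdefault(gram, set()).add(record_id)
    return index


def search(index, query):
    """Look up each whitespace-separated term as-is and AND the results."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


INDEX = build_index(NAMES)
print(search(INDEX, "Matt"))
print(search(INDEX, "V. Doe"))
```

Because the prefixes are stored in the index, the query side needs no `*` at all. Note that this toy model ignores token adjacency: a query such as `John V` would also return the Johnson records that contain a `V.` token, so a requirement like case (g) would still need phrase or position handling on top of it.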
Re: Wildcard searches with space in TextField/StrField
Hi All,

Can someone please assist with this query?

My data consists of:

1.] John Doe
2.] John V. Doe
3.] Johnson Doe
4.] Johnson V. Doe
5.] John Smith
6.] Johnson V. Smith
7.] Matt Doe
8.] Matt V. Doe
9.] Matt Doe
10.] Matthew V. Doe
11.] Matthew Smith
12.] Matthew V. Smith

Querying ...

(a) Matt/Matt* should return records 7-12
(b) John/John* should return records 1-6
(c) Doe/Doe* should return records 1-4, 7-10
(d) Smith/Smith* should return records 5,6,11,12
(e) V/V./V.*/V* should return records 2,4,6,8,10,12
(f) V. Doe/V. Doe* should return records 2,4,8,10
(g) John V/John V./John V*/John V.* should return record 2
(h) V. Smith/V. Smith* should return records 6,12

Any guidance would be appreciated! I have tried ComplexPhraseQueryParser, but with a single token like Doe*, there is an error that indicates that the query is being identified as a prefix query. I may be missing something in the syntax.

SRK

On Thursday, November 24, 2016 11:16 PM, Sandeep Khanzode wrote:
> Hi All, Erick, Please suggest. [...]
Re: Wildcard searches with space in TextField/StrField
Hi All, Erick,

Please suggest.

Would like to use the ComplexPhraseQueryParser for searching text (with wildcard) that may contain special characters.

For example ...

John* should match John V. Doe
John* should match Johnson Smith
Bruce-Willis* should match Bruce-Willis
V.* should match John V. F. Doe

SRK

On Thursday, November 24, 2016 5:57 PM, Sandeep Khanzode wrote:
> Hi, This is the typical TextField with ...
Re: Wildcard searches with space in TextField/StrField
Hi,

This is the typical TextField with ...

SRK

On Thursday, November 24, 2016 1:38 AM, Reth RM wrote:
> what is the fieldType of those records?
Re: Wildcard searches with space in TextField/StrField
What is the fieldType of those records?

On Tue, Nov 22, 2016 at 4:18 AM, Sandeep Khanzode < sandeep_khanz...@yahoo.com.invalid> wrote:
> Hi Erick,
> I gave this a try. [...]
Re: Wildcard searches with space in TextField/StrField
Hi Erick,

I gave this a try. These are my results.

There is a record with "John D. Smith", and another named "John Doe".

1.] {!complexphrase inOrder=true}name:"John D.*" ... does not fetch any results.
2.] {!complexphrase inOrder=true}name:"John D*" ... fetches both results.

Second observation: There is a record with "John D Smith"

1.] {!complexphrase inOrder=true}name:"John*" ... does not fetch any results.
2.] {!complexphrase inOrder=true}name:"John D*" ... fetches that record.
3.] {!complexphrase inOrder=true}name:"John D S*" ... fetches that record.

SRK

On Sunday, November 13, 2016 7:43 AM, Erick Erickson wrote:
> Right, for that kind of use case you want complexPhraseQueryParser, see:
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>
> Best,
> Erick
>
> On Sat, Nov 12, 2016 at 9:39 AM, Sandeep Khanzode wrote:
>> Thanks, Erick. [...]
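The `{!complexphrase inOrder=true}` queries tried in this message have to be URL-encoded when they are sent over HTTP rather than typed into the admin query screen. The following Python sketch shows one way to build such a request URL. The host, port, and collection name (`names`) are assumptions made up for illustration, and `complexphrase_url` is a hypothetical helper; only the local-params query string and the field name `name` come from the message itself.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical Solr endpoint: host, port and collection are placeholders.
BASE_URL = "http://localhost:8983/solr/names/select"


def complexphrase_url(field, phrase, in_order=True):
    """Build a select URL for a complexphrase query such as name:"John D*"."""
    q = '{!complexphrase inOrder=%s}%s:"%s"' % (
        str(in_order).lower(),
        field,
        phrase,
    )
    # debugQuery=true asks Solr to include the parsed query in the response,
    # which helps when checking how a wildcard phrase was interpreted.
    params = {"q": q, "debugQuery": "true"}
    return BASE_URL + "?" + urlencode(params)


URL = complexphrase_url("name", "John D*")
print(URL)
```

Comparing the parsed query in the debug output for `"John D.*"` and `"John D*"` is a reasonable first step in working out why the variant with the period returns nothing.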
Re: Wildcard searches with space in TextField/StrField
Thanks, Erick.

I am actually not trying to use the String field (prefer a TextField here). But, in my comparisons with TextField, it seems that something like phrase matching with whitespace and wildcard (like, 'my do*' or say, 'my dog*', or say, 'my dog has*') can only be accomplished with a string type field, especially because, with a WhitespaceTokenizer in TextField, the space will be lost, and all tokens will be individually considered. Am I missing something?

SRK

On Friday, November 11, 2016 10:05 PM, Erick Erickson wrote:
> You have to query text and string fields differently, that's just the way it works. [...]
>
> On Thu, Nov 10, 2016 at 10:11 PM, Sandeep Khanzode wrote:
>> Hi Erick, Reth,
>>
>> The 'a\ b*' as well as the q.op=AND approach worked (successfully) only for StrField for me.
>>
>> Any attempt at an 'a\ b*' query on a TextField does not match any documents. The parsedQuery in debug mode does show 'field:a b*'. I am sure there are documents that should match.
>> Another (maybe unrelated) observation is if I have 'field:a\ b', then the parsedQuery is field:a field:b, which does not match as expected (matches individually).
>>
>> Can you please provide an example that I can use in Solr Query dashboard? That will be helpful.
>>
>> I have also seen that wildcard queries work irrespective of field type, i.e. StrField as well as TextField. That makes sense because a WhitespaceTokenizer only creates word boundaries when we do not use an EdgeNGramFilter. If I am not wrong, that is.
>>
>> SRK
>>
>> On Friday, November 11, 2016 5:00 AM, Erick Erickson wrote:
>>> You can escape the space with a backslash as 'a\ b*'
>>>
>>> Best,
>>> Erick
>>>
>>> On Thu, Nov 10, 2016 at 2:37 PM, Reth RM wrote:
>>>> I don't think you can do wildcard on StrField. For text field, if your query is "category:(test m*)" the parsed query will be "category:test OR category:m*"
>>>> You can add q.op=AND to make an AND between those terms.
>>>>
>>>> For phrase type wild card query support, as per docs, it is ComplexPhraseQueryParser that supports it. (I haven't tested it myself)
>>>>
>>>> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>>>>
>>>> On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode < sandeep_khanz...@yahoo.com.invalid> wrote:
>>>>> Hi,
>>>>> How does a search like abc* work in StrField? Since the entire thing is stored as a single token, is it a type of a trie structure that allows such wildcard matching?
>>>>> How can searches with space like 'a b*' be executed for text fields (tokenized on whitespace)? If we specify this type of query, it is broken down into two queries with field:a and field:b*. I would like them to be contiguous, sort of, like a phrase search with wild card.
>>>>> SRK
Re: Wildcard searches with space in TextField/StrField
You have to query text and string fields differently; that's just the way it works. The problem is getting the query string through the parser as a _single_ token or as multiple tokens.

Let's say you have a string field with the "a b" example. You have a single token
  a b
that starts at offset 0.

But with a text field, you have two tokens:
  a at position 0
  b at position 1

When the query parser sees "a b" (without quotes) it splits it into two tokens, and only the text field has both tokens, so the string field won't match.

OTOH, when the query parser sees "a\ b" it passes this through as a single token, which only matches the string field, as there's no _single_ token "a b" in the text field.

But a more interesting question is why you want to search this way. String fields are intended for keywords, machine-generated IDs and the like. They're pretty useless for searching anything except
1> exact tokens
2> prefixes

If you have "my dog has fleas" in a string field, you _can_ search "*dog*" and get a hit, but the performance is poor when you get a large corpus. Performance for "my*" will be pretty good though.

All in all, this sounds like an XY problem. What's the use case you're trying to solve?

Best,
Erick

On Thu, Nov 10, 2016 at 10:11 PM, Sandeep Khanzode wrote:
> Hi Erick, Reth,
>
> The 'a\ b*' as well as the q.op=AND approach worked (successfully) only for
> StrField for me.
> [...]
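The single-token versus multiple-token distinction described in this message can be mimicked in a few lines of Python. This is a toy model under loud assumptions: the function names are invented, the "parser" only knows about whitespace and backslash escapes, and query-time analysis of the text field is ignored entirely, so it should be read as an illustration of the explanation above rather than of what Solr actually executes.

```python
def index_string_field(value):
    """A string field keeps the whole value as one term."""
    return {value}


def index_text_field(value):
    """A whitespace-tokenized text field keeps one term per word."""
    return set(value.split())


def parse_query(q):
    """Toy stand-in for the query parser (an assumption, not Solr code).

    Splits on unescaped whitespace; a backslash keeps the following
    character (for example a space) inside the current term.
    """
    terms, current, i = [], "", 0
    while i < len(q):
        ch = q[i]
        if ch == "\\" and i + 1 < len(q):
            current += q[i + 1]
            i += 2
        elif ch.isspace():
            if current:
                terms.append(current)
            current = ""
            i += 1
        else:
            current += ch
            i += 1
    if current:
        terms.append(current)
    return terms


def matching_terms(indexed_terms, query):
    """Return the query terms that exist in the indexed term set."""
    return [t for t in parse_query(query) if t in indexed_terms]
```

For the value "a b", the unescaped query finds both terms in the text field and nothing in the string field, while the escaped query `a\ b` finds the single term "a b" only in the string field.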
Re: Wildcard searches with space in TextField/StrField
Hi Erick, Reth,

The 'a\ b*' as well as the q.op=AND approach worked (successfully) only for StrField for me.

Any attempt at creating a 'a\ b*' for a TextField does not match any documents. The parsedQuery in debug mode does show 'field:a b*'. I am sure there are documents that should match.

Another (maybe unrelated) observation: if I have 'field:a\ b', then the parsedQuery is field:a field:b, which does not match as expected (it matches the terms individually).

Can you please provide an example that I can use in the Solr Query dashboard? That would be helpful.

I have also seen that wildcard queries work irrespective of field type, i.e. StrField as well as TextField. That makes sense because a WhitespaceTokenizer only creates word boundaries when we do not use an EdgeNGramFilter. If I am not wrong, that is.

SRK

On Friday, November 11, 2016 5:00 AM, Erick Erickson wrote:
> You can escape the space with a backslash as 'a\ b*'
> [...]
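For trying queries like the ones in this message outside the admin UI, a request URL can be assembled as in the sketch below. The `{!complexphrase inOrder=true}` local-params syntax is the one quoted elsewhere in this thread, and `debugQuery=true` is what makes the response include the parsedQuery. The base URL, the core name `mycore`, the field `name`, and the helper names are placeholders assumed for illustration. Whether a given phrase matches depends on the field's analysis chain, so this shows only how to send the query, not what it returns.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url, field, phrase, in_order=True):
    """Build a /select URL using the complexphrase parser.

    base_url, field and phrase are caller-supplied; nothing here
    contacts a server.
    """
    q = '{!complexphrase inOrder=%s}%s:"%s"' % (
        str(in_order).lower(), field, phrase)
    params = {
        "q": q,
        "debugQuery": "true",  # include parsedQuery in the response
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)


def query_params(url):
    """Decode the query string again, to check what would be sent."""
    return parse_qs(urlsplit(url).query)


# Hypothetical host and core name, used only as an example.
example_url = build_select_url(
    "http://localhost:8983/solr/mycore", "name", "John D*")
```

In the Query screen of the admin UI, the decoded `q` value (`{!complexphrase inOrder=true}name:"John D*"`) would go into the q box, with the debugQuery option ticked.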
Re: Wildcard searches with space in TextField/StrField
You can escape the space with a backslash as 'a\ b*'

Best,
Erick

On Thu, Nov 10, 2016 at 2:37 PM, Reth RM wrote:
> I don't think you can do wildcard on StrField. For text field, if your
> query is "category:(test m*)" the parsed query will be "category:test OR
> category:m*"
> You can add q.op=AND to make an AND between those terms.
> [...]
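The escaping suggested above is easy to get wrong by hand, so here is a small Python sketch of a helper that backslash-escapes whitespace and the characters the classic query syntax treats specially, then appends the trailing wildcard. The helper names are made up for this example, and the character list is my reading of the query syntax rather than something taken from this thread. SolrJ users have `ClientUtils.escapeQueryChars` for the same purpose.

```python
import re

# Whitespace plus the characters the classic Lucene/Solr query syntax
# treats specially: + - ! ( ) { } [ ] ^ " ~ * ? : \ / & |
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|\s])')


def escape_term(text):
    """Backslash-escape special characters and whitespace in text."""
    return _SPECIAL.sub(r"\\\1", text)


def prefix_query(field, text):
    """Build field:<escaped text>* so the text reaches the parser as
    one term followed by a trailing wildcard."""
    return "%s:%s*" % (field, escape_term(text))
```

For example, `prefix_query("field", "a b")` produces `field:a\ b*`, which is the form recommended in this message. As discussed elsewhere in the thread, that form is expected to match a string field, not a whitespace-tokenized text field.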
Re: Wildcard searches with space in TextField/StrField
I don't think you can do wildcard on StrField. For text field, if your query is "category:(test m*)" the parsed query will be "category:test OR category:m*". You can add q.op=AND to make an AND between those terms.

For phrase-type wildcard query support, as per the docs, it is ComplexPhraseQueryParser that supports it. (I haven't tested it myself.)

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser

On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode <sandeep_khanz...@yahoo.com.invalid> wrote:
> Hi,
> How does a search like abc* work in StrField? Since the entire thing is
> stored as a single token, is it a type of trie structure that allows such
> wildcard matching?
> How can searches with a space like 'a b*' be executed for text fields
> (tokenized on whitespace)? If we specify this type of query, it is broken
> down into two queries with field:a and field:b*. I would like them to be
> contiguous, sort of like a phrase search with a wildcard.
> SRK
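Both points in this message, the expansion of a trailing wildcard and the OR/AND combination of the resulting clauses, can be illustrated with a toy inverted index in Python. This is only a sketch under stated assumptions: the class and method names are invented, tokens are lowercased for convenience, and the sorted list with binary search merely stands in for the fact that the index keeps terms in sorted order. It is not a description of Lucene's actual on-disk structures.

```python
from bisect import bisect_left


class TinyIndex:
    """Toy inverted index: term -> set of document ids."""

    def __init__(self, docs):
        postings = {}
        for doc_id, text in docs.items():
            for token in text.lower().split():
                postings.setdefault(token, set()).add(doc_id)
        self.postings = postings
        self.terms = sorted(postings)  # sorted term dictionary

    def expand_prefix(self, prefix):
        """All terms starting with prefix, found by seeking into the
        sorted term list and scanning forward until the prefix ends."""
        start = bisect_left(self.terms, prefix)
        matches = []
        for term in self.terms[start:]:
            if not term.startswith(prefix):
                break
            matches.append(term)
        return matches

    def docs_for(self, clause):
        """Documents matching one clause: a plain term or 'prefix*'."""
        if clause.endswith("*"):
            terms = self.expand_prefix(clause[:-1].lower())
        else:
            terms = [clause.lower()]
        result = set()
        for term in terms:
            result |= self.postings.get(term, set())
        return result

    def search(self, clauses, op="OR"):
        """Combine clauses the way q.op=OR or q.op=AND would."""
        sets = [self.docs_for(c) for c in clauses]
        if not sets:
            return set()
        if op == "AND":
            return set.intersection(*sets)
        return set.union(*sets)
```

With documents "test match", "test only" and "just music", the clauses `test` and `m*` return all three documents under OR but only the first under AND, which mirrors the difference q.op makes for "category:(test m*)". Neither operator requires the two terms to be adjacent, which is why the phrase-style behaviour asked about in the quoted question needs something like ComplexPhraseQueryParser.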