Re: Text field case sensitivity problem
I'm not familiar with CharFilters; I'll look into those now. Is it expected that solr.LowerCaseFilterFactory does not handle wildcards, or is this a bug?

On Wed, Jun 15, 2011 at 4:34 PM, Mike Sokolov soko...@ifactory.com wrote:
> I wonder whether CharFilters are applied to wildcard terms? I suspect they might be. If that's the case, you could use the MappingCharFilter to perform lowercasing (and strip diacritics too if you want that).
> -Mike
Re: Text field case sensitivity problem
I think my answer is here:

"On wildcard and fuzzy searches, no text analysis is performed on the search word."

Taken from http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers

On Thu, Jun 30, 2011 at 10:23 AM, Jamie Johnson jej2...@gmail.com wrote:
> I'm not familiar with CharFilters; I'll look into those now. Is it expected that solr.LowerCaseFilterFactory does not handle wildcards, or is this a bug?
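To illustrate the wiki statement that no text analysis is performed on wildcard terms, here is a toy model in Python. It is not Solr code. It assumes the analyzer chain is reduced to lowercasing only (the real field type also splits words and stems them), and the names `analyze`, `term_query` and `prefix_query`, as well as the second indexed name, are made up for the illustration.

```python
# Toy model of the behaviour described in the thread; this is not Solr code.
# Assumption: the analyzer chain is reduced to lowercasing only.

def analyze(term):
    """Stand-in for the analyzer chain: lowercase the term."""
    return term.lower()

# Terms as they would be stored in the index after index-time analysis.
indexed_terms = {analyze("Kristine"), analyze("Kristopher")}

def term_query(term):
    """Plain term queries pass through the query analyzer before lookup."""
    return analyze(term) in indexed_terms

def prefix_query(pattern):
    """Wildcard (prefix) queries skip analysis: the raw prefix is compared
    against the already lowercased indexed terms."""
    prefix = pattern.rstrip("*")
    return sorted(t for t in indexed_terms if t.startswith(prefix))

print(term_query("Kristine"))   # True
print(prefix_query("kris*"))    # ['kristine', 'kristopher']
print(prefix_query("Kris*"))    # []
```

This matches what was observed: Person_Name:Kristine and Person_Name:kris* return results, while Person_Name:Kris* returns nothing.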
Re: Text field case sensitivity problem
Yes, after posting that response I read some more and came to the same conclusion. There seems to be some interest on the dev list in building a capability to specify an analysis chain for use with wildcard and related queries, but it doesn't exist now.

-Mike

On 06/30/2011 10:34 AM, Jamie Johnson wrote:
> I think my answer is here:
> "On wildcard and fuzzy searches, no text analysis is performed on the search word."
> Taken from http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
Re: Text field case sensitivity problem
Jamie - there is a JIRA about this, at least one: https://issues.apache.org/jira/browse/SOLR-218

Erik

On Jun 15, 2011, at 10:12, Jamie Johnson wrote:
> So simply lowercasing the query works, but it can get complex. The query that I'm executing may have things like ranges, which require some words to be upper case (e.g. TO). I think this would be much better solved on Solr's end. Is there a JIRA about this?
Re: Text field case sensitivity problem
Yes, and this too: https://issues.apache.org/jira/browse/SOLR-219

On 06/30/2011 12:46 PM, Erik Hatcher wrote:
> Jamie - there is a JIRA about this, at least one: https://issues.apache.org/jira/browse/SOLR-218
> Erik
Re: Text field case sensitivity problem
So simply lowercasing the query works, but it can get complex. The query that I'm executing may have things like ranges, which require some words to be upper case (e.g. TO). I think this would be much better solved on Solr's end. Is there a JIRA about this?

On Tue, Jun 14, 2011 at 5:33 PM, Mike Sokolov soko...@ifactory.com wrote:
> Oops, please s/Highlight/Wildcard/
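A minimal sketch of the client-side workaround discussed in this thread, written in Python. It lowercases only the tokens that contain a wildcard, so operators such as TO, AND and OR are left alone. The function name `lowercase_wildcard_terms` and the `Age` field in the example are hypothetical, and the sketch assumes tokens are separated by spaces; it does not handle quoted phrases or escaped characters.

```python
import re

# Matches any token that contains a wildcard character.
_WILDCARD = re.compile(r"[*?]")

def lowercase_wildcard_terms(query):
    """Lowercase only tokens containing * or ?, keeping any 'field:' prefix
    and every other token (TO, AND, OR, range bounds) unchanged."""
    out = []
    for token in query.split(" "):
        if _WILDCARD.search(token):
            # Split off the field name so that only the term is lowercased.
            field, sep, term = token.rpartition(":")
            token = field + sep + term.lower()
        out.append(token)
    return " ".join(out)

# 'Age' is a made-up field used only to show that the range stays intact.
print(lowercase_wildcard_terms("Person_Name:Kris* AND Age:[10 TO 20]"))
# Person_Name:kris* AND Age:[10 TO 20]
```

Terms without a wildcard are passed through unchanged, on the assumption that the field's query analyzer already lowercases them.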
Re: Text field case sensitivity problem
I wonder whether CharFilters are applied to wildcard terms? I suspect they might be. If that's the case, you could use the MappingCharFilter to perform lowercasing (and strip diacritics too if you want that).

-Mike

On 06/15/2011 10:12 AM, Jamie Johnson wrote:
> So simply lowercasing the query works, but it can get complex. The query that I'm executing may have things like ranges, which require some words to be upper case (e.g. TO). I think this would be much better solved on Solr's end. Is there a JIRA about this?
Text field case sensitivity problem
I am using the following for my text field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

I have a field defined as

<field name="Person_Name" type="text" stored="true" indexed="true"/>

When I go to the following URL I get results:

http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris*

but if I do

http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris*

I get nothing. I thought the LowerCaseFilterFactory would have handled lowercasing both the query and what is being indexed. Am I missing something?
Re: Text field case sensitivity problem
Also of interest to me: this returns results:

http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine

On Tue, Jun 14, 2011 at 5:08 PM, Jamie Johnson jej2...@gmail.com wrote:
> When I go to the following URL I get results:
> http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris*
> but if I do
> http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris*
> I get nothing. I thought the LowerCaseFilterFactory would have handled lowercasing both the query and what is being indexed. Am I missing something?
Re: Text field case sensitivity problem
Wildcard queries aren't analyzed, I think? I'm not completely sure what the best workaround is here: perhaps simply lowercasing the query terms yourself in the application. Also, I hope someone more knowledgeable will say that the new HighlightQuery in trunk doesn't have this restriction, but I'm not sure about that.

-Mike

On 06/14/2011 05:13 PM, Jamie Johnson wrote:
> Also of interest to me: this returns results:
> http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine
RE: Text field case sensitivity problem
Unfortunately, wildcard search terms don't get processed by the analyzers. One fairly common suggestion is to lowercase your wildcard search terms yourself before issuing the query.

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

-Original Message-
From: Jamie Johnson [mailto:jej2...@gmail.com]
Sent: Tuesday, June 14, 2011 5:13 PM
To: solr-user@lucene.apache.org
Subject: Re: Text field case sensitivity problem

Also of interest to me is this returns results http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine
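The workaround suggested in this thread (lowercasing wildcard terms in the application before the query is sent) might look roughly like the sketch below. It is only an illustration, not part of Solr: the function name `lowercase_wildcard_terms` and the `RESERVED` set are made up for this example, and the whitespace-based tokenizing is a crude stand-in for real query parsing, so quoted phrases and escaped characters are not handled. Upper-case keywords such as TO are left alone, and non-wildcard terms are untouched because the query analyzer already lowercases those.

```python
import re

# Query-syntax keywords that must stay upper case (e.g. the TO in a range query).
# This set is an assumption for the example, not an exhaustive list.
RESERVED = {"AND", "OR", "NOT", "TO"}

# A run of non-space characters; a crude stand-in for real query parsing.
TOKEN = re.compile(r"\S+")


def lowercase_wildcard_terms(query):
    """Lowercase only the term part of tokens that contain * or ?."""

    def fix(match):
        token = match.group(0)
        if token in RESERVED:
            return token
        # Split an optional field prefix ("Person_Name:") from the term,
        # so the field name keeps its original case.
        field, sep, term = token.rpartition(":")
        if "*" in term or "?" in term:
            return field + sep + term.lower()
        return token

    return TOKEN.sub(fix, query)


if __name__ == "__main__":
    print(lowercase_wildcard_terms("Person_Name:Kris*"))  # Person_Name:kris*
```

The result would then be URL-encoded and passed as the q parameter as usual.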
Re: Text field case sensitivity problem
Oops, please s/Highlight/Wildcard/

On 06/14/2011 05:31 PM, Mike Sokolov wrote:
Wildcard queries aren't analyzed, I think? I'm not completely sure what the best workaround is here: perhaps simply lowercasing the query terms yourself in the application. Also - I hope someone more knowledgeable will say that the new HighlightQuery in trunk doesn't have this restriction, but I'm not sure about that.
-Mike