Re: ngramfilter minGramSize problem
It works well. Now, why does the search only find something when the field name is added to a query that contains stopwords?

cug - 9 hits
mit cug - 0 hits
plain_text:mit cug - 9 hits

Why is this so? Could it be a problem that the stopwords aren't applied to the query because not all fields that are searched have the stop filter?

On Mon, 07 Apr 2014 00:37:15 +0200, Furkan KAMACI furkankam...@gmail.com wrote:
Correction: My patch is at SOLR-5152

On 7 Apr 2014 at 01:05, Andreas Owen ao...@swissonline.ch wrote:
I thought I could use <filter class="solr.LengthFilterFactory" min="1" max="2"/> to index and search words that are only 1 or 2 chars long. It seems to work, but I have to test it some more.

On Sun, 06 Apr 2014 22:24:20 +0200, Andreas Owen ao...@swissonline.ch wrote:
I have a fieldtype that uses the ngramfilter while indexing. Is there a setting that can force the ngramfilter to index words smaller than the minGramSize? Mine is set to 3, and the search won't find words that are only 1 or 2 chars long. I would like to not set minGramSize=1 because the results would be too diverse.
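The hypothesis in the message above (stopwords are removed in some searched fields but not in others) can be illustrated with a toy model. This is only a sketch in Python, not Solr's query parser: the second field name `keywords`, the one-word stopword list, and the rule that every query word must match in at least one field that still keeps it are all assumptions made for illustration.

```python
STOPWORDS = {"mit"}  # tiny stand-in for lang/stopwords_de.txt


def analyze(text, use_stop_filter):
    """Lowercase, split on whitespace, optionally drop stopwords."""
    tokens = text.lower().split()
    if use_stop_filter:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def matches(doc, query, fields):
    """fields maps field name -> True if that field's analyzer removes stopwords.

    Assumed rule: each query word must match in at least one field
    whose analyzer still keeps the word.
    """
    for word in query.split():
        clauses = []
        for field, use_stop in fields.items():
            terms = analyze(word, use_stop)
            if terms:
                clauses.append((field, terms[0]))
        if not clauses:
            continue  # word removed in every field, so it imposes no constraint
        if not any(
            term in analyze(doc.get(field, ""), fields[field])
            for field, term in clauses
        ):
            return False
    return True


# Hypothetical document: "keywords" has no stop filter and does not contain "mit".
doc = {"plain_text": "Anmeldung mit CUG", "keywords": "cug"}
mixed = {"plain_text": True, "keywords": False}

print(matches(doc, "cug", mixed))                    # True
print(matches(doc, "mit cug", mixed))                # False
print(matches(doc, "mit cug", {"plain_text": True})) # True
```

In this model, "mit" survives only in the field without a stop filter, so it becomes a required term that the document cannot satisfy. Whether that is what happens in the actual setup depends on the query parser and its settings, which the thread does not state.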
ngramfilter minGramSize problem
I have a fieldtype that uses the ngramfilter while indexing. Is there a setting that can force the ngramfilter to index words smaller than the minGramSize? Mine is set to 3, and the search won't find words that are only 1 or 2 chars long. I would like to not set minGramSize=1 because the results would be too diverse.

fieldtype:

  <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/> -->
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/> <!-- remove common words -->
      <filter class="solr.GermanNormalizationFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="German"/> <!-- remove noun/adjective inflections like plural endings -->
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="50"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhiteSpaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/> <!-- remove common words -->
      <filter class="solr.GermanNormalizationFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="German"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    </analyzer>
  </fieldType>
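To make the symptom concrete, here is a minimal Python sketch of what an n-gram step with minGramSize=3 and maxGramSize=50 emits for a single token. It is a toy model of the behaviour described in the question, not Lucene's implementation, and the function name `ngrams` is made up for this illustration.

```python
def ngrams(token, min_gram=3, max_gram=50):
    """Return every substring of token whose length is in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(token):
                break  # longer sizes cannot fit from this start position
            grams.append(token[start:end])
    return grams


print(ngrams("haus"))  # ['hau', 'haus', 'aus']
print(ngrams("cug"))   # ['cug']
print(ngrams("zu"))    # []
```

A token shorter than the minimum gram size yields no grams at all, so in this model nothing is indexed for a 1 or 2 character word, which matches the search finding no such words.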
Re: ngramfilter minGramSize problem
Hi Andreas;

I've implemented a similar feature in EdgeNgramFilter because some Solr users wanted it. My patch is here: https://issues.apache.org/jira/browse/SOLR-5332 However, if you read the conversation below the issue, you will see that you can do it another way.

Thanks;
Furkan KAMACI

2014-04-06 23:24 GMT+03:00 Andreas Owen ao...@swissonline.ch:
I have a fieldtype that uses the ngramfilter while indexing. Is there a setting that can force the ngramfilter to index words smaller than the minGramSize? Mine is set to 3, and the search won't find words that are only 1 or 2 chars long. I would like to not set minGramSize=1 because the results would be too diverse.
Re: ngramfilter minGramSize problem
I thought I could use <filter class="solr.LengthFilterFactory" min="1" max="2"/> to index and search words that are only 1 or 2 chars long. It seems to work, but I have to test it some more.

On Sun, 06 Apr 2014 22:24:20 +0200, Andreas Owen ao...@swissonline.ch wrote:
I have a fieldtype that uses the ngramfilter while indexing. Is there a setting that can force the ngramfilter to index words smaller than the minGramSize? Mine is set to 3, and the search won't find words that are only 1 or 2 chars long. I would like to not set minGramSize=1 because the results would be too diverse.
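The message does not say where the length filter sits, so the following Python sketch only compares two possible placements in a toy model. The helper names and the idea of a separate analysis chain (for example a second field) for short words are assumptions for illustration, not the poster's confirmed setup.

```python
def length_filter(tokens, min_len=1, max_len=2):
    """Keep only tokens whose length lies in [min_len, max_len]."""
    return [t for t in tokens if min_len <= len(t) <= max_len]


def ngrams(token, min_gram=3, max_gram=50):
    """All substrings of token with length in [min_gram, max_gram]."""
    return [
        token[start:start + size]
        for start in range(len(token))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(token)
    ]


tokens = ["ab", "cug"]

# Same chain: the length filter removes "cug" before the n-gram step,
# and "ab" is too short to produce a gram, so nothing is left.
same_chain = [g for t in length_filter(tokens) for g in ngrams(t)]

# Separate chains (hypothetical): short words and grams both survive.
short_terms = length_filter(tokens)
gram_terms = [g for t in tokens for g in ngrams(t)]

print(same_chain)   # []
print(short_terms)  # ['ab']
print(gram_terms)   # ['cug']
```

In this model the length filter is only useful when it runs in its own chain, because chaining it in front of the n-gram step discards every token long enough to produce a gram.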
Re: ngramfilter minGramSize problem
Correction: My patch is at SOLR-5152

On 7 Apr 2014 at 01:05, Andreas Owen ao...@swissonline.ch wrote:
I thought I could use <filter class="solr.LengthFilterFactory" min="1" max="2"/> to index and search words that are only 1 or 2 chars long. It seems to work, but I have to test it some more.

On Sun, 06 Apr 2014 22:24:20 +0200, Andreas Owen ao...@swissonline.ch wrote:
I have a fieldtype that uses the ngramfilter while indexing. Is there a setting that can force the ngramfilter to index words smaller than the minGramSize? Mine is set to 3, and the search won't find words that are only 1 or 2 chars long. I would like to not set minGramSize=1 because the results would be too diverse.