Re: ngramfilter minGramSize problem

2014-04-07 Thread Andreas Owen
It works well. Now, why does the search only find something when the
field name is added to a query that contains a stopword?


cug - 9 hits
mit cug - 0 hits
plain_text:mit cug - 9 hits

Why is this so? Could the problem be that stopwords aren't removed from the
query because not all of the fields being searched have the stopword filter?
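One possible explanation, assuming the request goes through the (e)dismax parser with several fields in qf and a strict mm (100%, or q.op=AND): every query term must then match in at least one qf field. If some of those fields remove "mit" at query time and others don't, "mit" survives as a required clause for the fields that keep it, and if none of those fields has "mit" as an indexed term, no document can satisfy the query. In "plain_text:mit cug" the fielded term only goes through plain_text's analyzer, which presumably drops "mit", so only "cug" is left. A minimal solrconfig.xml sketch of the relevant defaults; the field names content and title are hypothetical:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Hypothetical field names. Every field listed in qf should remove the
         same stopwords at query time; otherwise a stopword such as "mit"
         remains a clause for the fields that keep it. -->
    <str name="qf">content title</str>
    <!-- 100% means every clause must match in at least one qf field. -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```

The usual remedies are to give every qf field the same stopword list on the query side, or to relax mm so that one unmatched clause does not reject the document. Which one fits depends on how strict the matching should be.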



On Mon, 07 Apr 2014 00:37:15 +0200, Furkan KAMACI furkankam...@gmail.com  
wrote:



Correction: My patch is at SOLR-5152
On 7 Apr 2014 01:05, Andreas Owen ao...@swissonline.ch wrote:


I thought I could use <filter class="solr.LengthFilterFactory" min="1"
max="2"/> to index and search words that are only 1 or 2 chars long. It
seems to work, but I have to test it some more.


On Sun, 06 Apr 2014 22:24:20 +0200, Andreas Owen ao...@swissonline.ch
wrote:

I have a fieldType that uses the NGramFilter while indexing. Is there a
setting that can force the NGramFilter to index words shorter than
minGramSize? Mine is set to 3, and the search won't find words that are
only 1 or 2 chars long. I would rather not set minGramSize=1 because the
results would be too diverse.





--
Using Opera's mail client: http://www.opera.com/mail/






ngramfilter minGramSize problem

2014-04-06 Thread Andreas Owen
I have a fieldType that uses the NGramFilter while indexing. Is there a
setting that can force the NGramFilter to index words shorter than
minGramSize? Mine is set to 3, and the search won't find words that are only
1 or 2 chars long. I would rather not set minGramSize=1 because the
results would be too diverse.


fieldtype:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/> -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"
            format="snowball" enablePositionIncrements="true"/> <!-- remove common words -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
    <!-- remove noun/adjective inflections like plural endings -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhiteSpaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"
            format="snowball" enablePositionIncrements="true"/> <!-- remove common words -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  </analyzer>
</fieldType>
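Because NGramFilterFactory emits no grams for tokens shorter than minGramSize, one common workaround is to copy the text into a second field that is analyzed without the n-gram step and to search both fields. A minimal schema.xml sketch, assuming an existing field named content of type text_de; the names content, content_whole and text_de_whole are hypothetical:

```xml
<!-- Hypothetical companion type: similar German analysis to text_de, but no
     NGramFilterFactory, so whole tokens (including 1- and 2-character words)
     are indexed as they are. A single analyzer applies at index and query time. -->
<fieldType name="text_de_whole" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- same stopword list as text_de, so stopwords are treated alike in both fields -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_de.txt" format="snowball"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>

<!-- Assumes a field "content" of type text_de already exists (hypothetical name). -->
<field name="content_whole" type="text_de_whole" indexed="true" stored="false"/>
<copyField source="content" dest="content_whole"/>
```

Querying both fields, for example with edismax and qf=content content_whole, would let short words match on content_whole while partial-word matches still come from the n-gram field. This is an untested sketch; how well it ranks depends on the data and any boosts.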


Re: ngramfilter minGramSize problem

2014-04-06 Thread Furkan KAMACI
Hi Andreas;

I've implemented a similar feature in EdgeNGramFilter because some Solr
users wanted it. My patch is here:
https://issues.apache.org/jira/browse/SOLR-5332
However, if you read the discussion below the issue, you will see that it
can also be done another way.

Thanks;
Furkan KAMACI
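For readers on a newer Solr: as far as I know, later releases (7.4 onward) added a preserveOriginal option to NGramFilterFactory that keeps the original token when it is shorter than minGramSize or longer than maxGramSize. It is not available in the 4.x version discussed in this thread, so treat the line below as an assumption to check against the reference guide for your version:

```xml
<!-- Assumes Solr 7.4 or later. With preserveOriginal="true", a token outside
     the 3..50 length range (e.g. a 2-character word) is kept unchanged in
     addition to any grams the filter produces. -->
<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="50" preserveOriginal="true"/>
```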


2014-04-06 23:24 GMT+03:00 Andreas Owen ao...@swissonline.ch:

 I have a fieldType that uses the NGramFilter while indexing. Is there a
 setting that can force the NGramFilter to index words shorter than
 minGramSize? Mine is set to 3, and the search won't find words that are only
 1 or 2 chars long. I would rather not set minGramSize=1 because the results
 would be too diverse.




Re: ngramfilter minGramSize problem

2014-04-06 Thread Andreas Owen
I thought I could use <filter class="solr.LengthFilterFactory" min="1"
max="2"/> to index and search words that are only 1 or 2 chars long. It
seems to work, but I have to test it some more.
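A minimal sketch of how the LengthFilterFactory idea could be wired up, assuming it lives in a separate field. Inside the text_de index chain it would not help: after NGramFilterFactory (minGramSize=3) there are no 1- or 2-character tokens left for it to keep, and placed before the n-gram step it would only pass tokens that the n-gram filter then discards. The names text_de_short, content and content_short are hypothetical:

```xml
<!-- Hypothetical field type that indexes only 1- and 2-character tokens. -->
<fieldType name="text_de_short" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_de.txt" format="snowball"/>
    <!-- drops every token whose length is outside [1, 2] -->
    <filter class="solr.LengthFilterFactory" min="1" max="2"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- same stopword list as text_de, so stopwords are removed consistently
         in every searched field -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_de.txt" format="snowball"/>
  </analyzer>
</fieldType>

<!-- Assumes a field "content" of type text_de already exists (hypothetical name). -->
<field name="content_short" type="text_de_short" indexed="true" stored="false"/>
<copyField source="content" dest="content_short"/>
```

The query analyzer deliberately leaves out LengthFilterFactory, so longer query terms are not silently dropped for this field; they simply find nothing in it and have to match in the other searched fields.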



On Sun, 06 Apr 2014 22:24:20 +0200, Andreas Owen ao...@swissonline.ch  
wrote:


I have a fieldType that uses the NGramFilter while indexing. Is there a
setting that can force the NGramFilter to index words shorter than
minGramSize? Mine is set to 3, and the search won't find words that are
only 1 or 2 chars long. I would rather not set minGramSize=1 because
the results would be too diverse.







Re: ngramfilter minGramSize problem

2014-04-06 Thread Furkan KAMACI
Correction: My patch is at SOLR-5152
On 7 Apr 2014 01:05, Andreas Owen ao...@swissonline.ch wrote:

 I thought I could use <filter class="solr.LengthFilterFactory" min="1"
 max="2"/> to index and search words that are only 1 or 2 chars long. It
 seems to work, but I have to test it some more.


 On Sun, 06 Apr 2014 22:24:20 +0200, Andreas Owen ao...@swissonline.ch
 wrote:

 I have a fieldType that uses the NGramFilter while indexing. Is there a
 setting that can force the NGramFilter to index words shorter than
 minGramSize? Mine is set to 3, and the search won't find words that are only
 1 or 2 chars long. I would rather not set minGramSize=1 because the results
 would be too diverse.



