Re: Czech stemmer
Hi,

I would recommend looking at a stemmer or token filter based on Hunspell dictionaries. I am not a Solr user, so I cannot point you to the appropriate documentation, but the Czech dictionary that can be used with Hunspell is of high quality. It can be downloaded from OpenOffice here: http://extensions.services.openoffice.org/en/project/czech-dictionary-pack-ceske-slovniky-cs-cz (distributed under GPL).

Note: the last time I looked at it, the dictionary contained one broken affix rule, which may require a manual fix depending on how strict the rule loader in Solr is. If you are interested in more details and cannot figure it out yourself, feel free to ping me again; I can point you to some resources on how I used it with Elasticsearch, and I assume the basic concepts apply to Solr as well.

Regards,
Lukas

2014-09-09 22:14 GMT+02:00 Shamik Bandopadhyay sham...@gmail.com:

Hi,

I'm facing stemming issues with Czech-language search. Solr/Lucene currently provides CzechStemFilterFactory as the sole option; Snowball Porter doesn't seem to be available for Czech. Here's the issue: a search for "posunout" ("move" in English) returns results, but fails if I use "posunulo" ("moved" in English). I used the following text as the field content for the search:

Pomocí multifunkčních uzlů je možné odkazy mnoha způsoby upravovat. Můžete přidat a odstranit odkazy, přidat a odstranit vrcholy, prodloužit nebo přesunout prodloužení čáry nebo přesunout text odkazu. Přístup k požadované možnosti získáte po přesunutí ukazatele myši na uzel. Z uzlu prodloužení čáry můžete zvolit tyto možnosti: Protáhnout: Umožňuje posunout prodloužení odkazové čáry. Délka prodloužení čáry: Umožňuje prodloužit prodloužení čáry. Přidat odkaz: Umožňuje přidat jednu nebo více odkazových čar. Z uzlu koncového bodu odkazu můžete zvolit tyto možnosti: Protáhnout: Umožňuje posunout koncový bod odkazové čáry. Přidat vrchol: Umožňuje přidat vrchol k odkazové čáře.
Odstranit odkaz: Umožňuje odstranit vybranou odkazovou čáru. Z uzlu vrcholu odkazu můžete zvolit tyto možnosti: Protáhnout: Umožňuje posunout vrchol. Přidat vrchol: Umožňuje přidat vrchol na odkazovou čáru. Odstranit vrchol: Umožňuje odstranit vrchol.

Just wondering if there's a different stemmer available or another way to address this.

Schema:

<fieldType name="text_csy" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_cz.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_csy.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.CzechStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_cz.txt"/>
    <filter class="solr.CzechStemFilterFactory"/>
  </analyzer>
</fieldType>

Any pointers will be appreciated.

- Thanks,
Shamik
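A field type along the lines Lukas suggests might look like this. This is a hedged sketch, not a tested config: solr.HunspellStemFilterFactory is the stock Solr filter, but the dictionary/affix file names below are assumptions based on the OpenOffice dictionary pack he links; check the file names inside the downloaded archive.

```xml
<!-- sketch: Czech analysis via Hunspell instead of CzechStemFilterFactory;
     cs_CZ.dic / cs_CZ.aff names are assumptions from the OpenOffice pack -->
<fieldType name="text_cz_hunspell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_cz.txt"/>
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="cs_CZ.dic" affix="cs_CZ.aff" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

With a dictionary-based stemmer, "posunout" and "posunulo" should reduce to a common lemma where the algorithmic CzechStemFilterFactory misses the inflection, which is the behavior Shamik is after.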
FileNotFoundException, Error closing IndexWriter, Error opening new searcher
Hi,

in the last few days we have had some trouble with one of our clusters (5 machines, each running 4.7.2 inside a Jetty container, no replication, Java 1.7.21). Twice we had trouble restarting one server (the same machine) because of a FileNotFoundException.

1. First time: stopping Solr while indexing resulted in the following log output:

2014-09-04 10:09:45,633 INFO o.a.s.s.SolrIndexSearcher [recoveryExecutor-6-thread-1] Opening Searcher@2b94db[shard2_replica1] realtime
2014-09-04 10:09:45,634 INFO o.a.s.u.DirectUpdateHandler2 [recoveryExecutor-6-thread-1] Reordered DBQs detected. Update=add{...} DBQs=[...]
2014-09-04 10:09:45,646 ERROR o.a.s.c.SolrException [recoveryExecutor-6-thread-1] Error opening realtime searcher for deleteByQuery:org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1521)
    at org.apache.solr.update.UpdateLog.add(UpdateLog.java:422)
    at org.apache.solr.update.DirectUpdateHandler2.addAndDelete(DirectUpdateHandler2.java:449)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:216)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:160)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:704)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:858)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:557)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1326)
    at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1215)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: _7omin_Lucene41_0.tip
    at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:252)
    at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:238)
    at java.util.TimSort.binarySort(TimSort.java:265)
    at java.util.TimSort.sort(TimSort.java:208)
    at java.util.TimSort.sort(TimSort.java:173)
    at java.util.Arrays.sort(Arrays.java:659)
    at java.util.Collections.sort(Collections.java:217)
    at org.apache.lucene.index.TieredMergePolicy.findMerges(TieredMergePolicy.java:286)
    at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:1970)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1940)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:404)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:289)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:274)
    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:250)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1445)
    ... 21 more

2. Second time: we brought some updates to the init.d scripts and had to restart each server in the cluster. No indexing was going on at this time.
The same server now crashed with this output while shutting down:

2014-09-05 15:13:39,204 INFO o.a.s.c.c.ZkStateReader$2 [main-EventThread] A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
2014-09-05 15:13:39,585 INFO o.a.s.c.c.ZkStateReader$2 [main-EventThread] A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
2014-09-05 15:13:39,586 INFO o.a.s.c.c.ZkStateReader$2 [main-EventThread] A cluster state change: WatchedEvent state:SyncConnected
Solr Spellcheck suggestions only return from /select handler when returning search results
Hi,

I'm experimenting with the Spellcheck component and have therefore used the example spell-checking configuration to try things out. My solrconfig.xml looks like this:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spell</str>
  <!-- Multiple Spell Checkers can be declared and used by this component -->
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- uncomment this to require suggestions to occur in 1% of the documents
      <float name="thresholdTokenFrequency">.01</float>
    -->
  </lst>
  <!-- a spellchecker that can break or combine words. See /spell handler below for usage -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

And I've added the spellcheck component to my /select request handler:

<requestHandler name="/select" class="solr.SearchHandler">
  ...
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

I have built up the spellchecker source in the schema.xml from the name field:

<field name="spell" type="spell" indexed="true" stored="true" required="false" multiValued="false"/>
<copyField source="name" dest="spell" maxChars="3"/>
...
<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

As I'm querying the /select request handler, I should get spellcheck suggestions with my results. However, I rarely get a suggestion.
Examples:

query: Sichtscheibe, spellcheck suggestion: Sichtscheiben (works)
query: Sichtscheib, spellcheck suggestion: Sichtscheiben (works)
query: ichtscheiben, no spellcheck suggestions

As far as I can tell, I only get suggestions when I get real search results. I get results for the first two examples because the German StemFilterFactory reduces Sichtscheibe and Sichtscheiben to Sichtscheib, so matches are found. However, the third query should result in a suggestion, as the Levenshtein distance is smaller than in the second example.

Suggestions, improvements, corrections?
Re: Integrate solr with openNLP
Hi,

What is the progress of the integration of NLP with Solr? If you have achieved this integration successfully, please share it with us.

With Regards
Aman Tandon

On Tue, Jun 10, 2014 at 11:04 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Hi Aman,

Yeah, we are also thinking the same: using UIMA is better. And thanks to everyone; you guys really showed us the way (UIMA). We'll work on it.

Thanks,
Vivek

On Fri, Jun 6, 2014 at 5:54 PM, Aman Tandon amantandon...@gmail.com wrote:

Hi Vivek,

As everybody on the mailing list mentioned, you should go for UIMA. The OpenNLP issues are not being tracked properly, which could leave your development stuck if any issue comes up, so it's better to start investigating UIMA.

With Regards
Aman Tandon

On Fri, Jun 6, 2014 at 11:00 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Can anyone please reply?

Thanks,
Vivek

---------- Forwarded message ----------
From: Vivekanand Ittigi vi...@biginfolabs.com
Date: Wed, Jun 4, 2014 at 4:38 PM
Subject: Re: Integrate solr with openNLP
To: Tommaso Teofili tommaso.teof...@gmail.com
Cc: solr-user@lucene.apache.org, Ahmet Arslan iori...@yahoo.com

Hi Tommaso,

Yes, you are right: the 4.4 version works, and I'm able to compile now. I'm trying to apply named-entity recognition (person names) to tokens, but I'm not seeing any change. My schema.xml looks like this:

<field name="text" type="text_opennlp_pos_ner" indexed="true" stored="true" multiValued="true"/>

<fieldType name="text_opennlp_pos_ner" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Please guide?
Thanks,
Vivek

On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote:

Hi all,

Ahmet was suggesting eventually using the UIMA integration because OpenNLP already has an integration with Apache UIMA, so you would just have to use that [1]. And that's one of the main reasons the UIMA integration was done: it's a framework that you can easily hook into in order to plug in your NLP algorithm.

If you want to use just OpenNLP, then it's up to you to either write your own UpdateRequestProcessor plugin [2] to add metadata extracted by OpenNLP to your documents, or write a dedicated analyzer / tokenizer / token filter. For the OpenNLP integration (LUCENE-2899), the patch is not up to date with the latest APIs in trunk; however, you should be able to apply it (if I recall correctly) to the 4.4 version or so, and adapting it to the latest API shouldn't be too hard.

Regards,
Tommaso

[1] : http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
[2] : http://wiki.apache.org/solr/UpdateRequestProcessor

2014-06-03 15:34 GMT+02:00 Ahmet Arslan iori...@yahoo.com.invalid:

Can you extract names, locations, etc. using OpenNLP in a plain/straight Java program? If yes, here are two separate options:

1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an example, integrate your NER code into it, and write your own indexing code. You have the full power here; no Solr plugins are involved.

2) Use 'Implementing a conditional copyField' given here: http://wiki.apache.org/solr/UpdateRequestProcessor as an example and integrate your NER code into it.

Please note that these are separate ways to enrich your incoming documents; choose either (1) or (2).

On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Okay, but I didn't understand what you said. Can you please elaborate?
Thanks,
Vivek

On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vivekanand,

I have never used UIMA+Solr before. Personally, I think it takes more time to learn how to configure/use the UIMA stuff. If you are familiar with Java, write a class that extends UpdateRequestProcessor(Factory), use OpenNLP for NER, and add the new fields (organisation, city, person name, etc.) to your document. This phase is usually called 'enrichment'.

Does that make sense?

On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Hi Ahmet,

I followed what you said
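The "enrichment" phase Ahmet describes can be sketched in plain Java. This is only an illustration of the idea, not Solr or OpenNLP API: `extractPersons` is a naive placeholder (it just treats capitalized tokens as names) standing in for a real OpenNLP `NameFinderME` call, and the field name `person_ss` is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EnrichSketch {
    // Placeholder NER: treat capitalized tokens as person names.
    // A real implementation would call OpenNLP's NameFinderME here.
    static List<String> extractPersons(String text) {
        List<String> names = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                names.add(token);
            }
        }
        return names;
    }

    // Enrichment step: add the extracted names as a new multi-valued field
    // on the document map before it is sent to Solr.
    static Map<String, Object> enrich(Map<String, Object> doc) {
        List<String> persons = extractPersons((String) doc.get("text"));
        if (!persons.isEmpty()) {
            doc.put("person_ss", persons);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("text", "met Alice in paris");
        System.out.println(enrich(doc).get("person_ss")); // prints [Alice]
    }
}
```

In option (1) from Ahmet's list this logic lives in your own SolrJ indexing code; in option (2) the same `enrich` step would sit inside an UpdateRequestProcessor.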
Genre classification/Document classification for apache solr
Hi,

I want to crawl links and identify whether a link is a company website. For example, if I search for 'financial advisory' in the Google search engine, I get a list of URLs in the search results, and some of those links are company websites. I want to identify the links that are company websites and index them into Solr. Does anybody know of an API/tool that can identify whether a link is a company website, or one that can identify a URL's genre/type on the basis of a taxonomy?

Thanks,
Vineet Yadav
Re: Edismax mm and efficiency
I implemented a custom QueryComponent that issues the edismax query with mm=100% and, if no results are found, reissues the query with mm=1. This doubled our query throughput (compared to always using mm=1), as we do some expensive RankQuery processing. For your very long student queries, mm=100% would obviously be too high, so you'd have to experiment.

On Fri, Sep 5, 2014 at 1:34 PM, Walter Underwood wun...@wunderwood.org wrote:

Great! We have some very long queries, where students paste entire homework problems. One of them was 1051 words; many of them are over 100 words. This could help.

In the Jira discussion, I saw some comments about handling the most sparse lists first. We did something like that in the Infoseek Ultra engine about twenty years ago: short termlists (documents matching a term) were processed first, which kept the in-memory lists of matching docs small. It also allowed early short-circuiting for no-hits queries.

What would be a high mm value, 75%?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/

On Sep 4, 2014, at 11:52 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

indeed https://issues.apache.org/jira/browse/LUCENE-4571
my feeling is it gives a significant gain at high mm values.

On Fri, Sep 5, 2014 at 3:01 AM, Walter Underwood wun...@wunderwood.org wrote:

Are there any speed advantages to using "mm"? I can imagine pruning the set of matching documents early, which could help, but is that (or something else) done?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
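The two-pass strategy described at the top of this thread can be sketched as follows. This is a hedged illustration of the control flow only: `search` stands in for the real edismax request (issued via SolrJ or inside a custom QueryComponent, neither of which is shown), and the mm values are just the ones discussed above.

```java
import java.util.List;
import java.util.function.Function;

public class MmFallback {
    // First pass with a strict mm value; if it finds nothing, reissue with mm=1.
    // The strict pass keeps the candidate set small (cheap RankQuery work) when
    // it matches, and the fallback preserves recall when it does not.
    static <T> List<T> searchWithFallback(Function<String, List<T>> search,
                                          String strictMm) {
        List<T> strict = search.apply(strictMm);
        return strict.isEmpty() ? search.apply("1") : strict;
    }

    public static void main(String[] args) {
        // Fake "index": the strict pass finds nothing, the lenient pass finds a doc.
        Function<String, List<String>> search =
                mm -> mm.equals("100%") ? List.of() : List.of("doc1");
        System.out.println(searchWithFallback(search, "100%")); // prints [doc1]
    }
}
```

The trade-off is one extra round trip on zero-hit strict passes, which the thread's throughput numbers suggest is worth paying when most strict passes succeed.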
Re: Integrate solr with openNLP
Actually, we dropped integrating NLP with Solr, but we took two different directions:

* we're using NLP separately, not with Solr
* we're taking the help of UIMA for Solr; it's more advanced

If you have a specific question, you can ask me. I'll tell you if I know.

-Vivek

On Wed, Sep 10, 2014 at 3:46 PM, Aman Tandon amantandon...@gmail.com wrote:

Hi,

What is the progress of the integration of NLP with Solr? If you have achieved this integration successfully, please share it with us.

With Regards
Aman Tandon

On Tue, Jun 10, 2014 at 11:04 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Hi Aman,

Yeah, we are also thinking the same: using UIMA is better. And thanks to everyone; you guys really showed us the way (UIMA). We'll work on it.

Thanks,
Vivek

On Fri, Jun 6, 2014 at 5:54 PM, Aman Tandon amantandon...@gmail.com wrote:

Hi Vivek,

As everybody on the mailing list mentioned, you should go for UIMA. The OpenNLP issues are not being tracked properly, which could leave your development stuck if any issue comes up, so it's better to start investigating UIMA.

With Regards
Aman Tandon

On Fri, Jun 6, 2014 at 11:00 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Can anyone please reply?

Thanks,
Vivek

---------- Forwarded message ----------
From: Vivekanand Ittigi vi...@biginfolabs.com
Date: Wed, Jun 4, 2014 at 4:38 PM
Subject: Re: Integrate solr with openNLP
To: Tommaso Teofili tommaso.teof...@gmail.com
Cc: solr-user@lucene.apache.org, Ahmet Arslan iori...@yahoo.com

Hi Tommaso,

Yes, you are right: the 4.4 version works, and I'm able to compile now. I'm trying to apply named-entity recognition (person names) to tokens, but I'm not seeing any change.
My schema.xml looks like this:

<field name="text" type="text_opennlp_pos_ner" indexed="true" stored="true" multiValued="true"/>

<fieldType name="text_opennlp_pos_ner" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Please guide?

Thanks,
Vivek

On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote:

Hi all,

Ahmet was suggesting eventually using the UIMA integration because OpenNLP already has an integration with Apache UIMA, so you would just have to use that [1]. And that's one of the main reasons the UIMA integration was done: it's a framework that you can easily hook into in order to plug in your NLP algorithm.

If you want to use just OpenNLP, then it's up to you to either write your own UpdateRequestProcessor plugin [2] to add metadata extracted by OpenNLP to your documents, or write a dedicated analyzer / tokenizer / token filter. For the OpenNLP integration (LUCENE-2899), the patch is not up to date with the latest APIs in trunk; however, you should be able to apply it (if I recall correctly) to the 4.4 version or so, and adapting it to the latest API shouldn't be too hard.

Regards,
Tommaso

[1] : http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
[2] : http://wiki.apache.org/solr/UpdateRequestProcessor

2014-06-03 15:34 GMT+02:00 Ahmet Arslan iori...@yahoo.com.invalid:

Can you extract names, locations, etc. using OpenNLP in a plain/straight Java program? If yes, here are two separate options:

1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an example, integrate your NER code into it, and write your own indexing code. You have the full power here; no Solr plugins are involved.
2) Use 'Implementing a conditional copyField' given here: http://wiki.apache.org/solr/UpdateRequestProcessor as an example and integrate your NER code into it.

Please note that these are separate ways to enrich your incoming documents; choose either (1) or (2).

On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Okay, but I didn't understand what you said. Can you please elaborate?

Thanks,
Vivek

On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vivekanand,

I have never used UIMA+Solr before. Personally, I think it takes more time to learn how to configure/use these
Installing solr on tomcat 7.x | Window 8
I am trying to follow the official documentation as well as other resources available on the net, but I am unable to run Solr on my Tomcat. I am trying to install and run `solr-4.10.0` on Tomcat. This is what I have done so far:

1. Copied solr-4.10.0.war to the Tomcat webapps directory and renamed it to solr.war.
2. Created a folder on my `D` drive named `solr-home`.
3. Copied everything from `solr-4.10.0\example\solr` and pasted it into the `solr-home` folder.
4. Through the Environment Variables dialog, under user variables, I set `solr.solr.home=D:\solr-home`.

I started the Tomcat server; it starts without any error/exception, but when I hit `http://localhost:8080/solr/`, I get the following error:

msg=SolrCore 'collection1' is not available due to init failure: Could not load conf for core collection1: Error loading solr config from solr/collection1\solrconfig.xml

trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Could not load conf for core collection1: Error loading solr config from solr/collection1\solrconfig.xml
    at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:745)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:307)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1041)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:603)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Could not load conf for core collection1: Error loading solr config from solr/collection1\solrconfig.xml
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:66)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:489)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:255)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:249)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    ... 3 more
Caused by: org.apache.solr.common.SolrException: Error loading solr config from solr/collection1\solrconfig.xml
    at org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:148)
    at org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:80)
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:61)
    ... 7 more
Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or 'C:\Program Files\Apache Software Foundation\Tomcat 7.0\solr\collection1\conf'
    at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:362)
    at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:308)
    at org.apache.solr.core.Config.<init>(Config.java:116)
    at org.apache.solr.core.Config.<init>(Config.java:86)
    at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:161)
    at org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:144)
    ... 9 more
code=500

I am not sure where I am going wrong, or is it really so tricky to install Solr?

--
With Regards
Umesh Awasthi
http://www.travellingrants.com/
Problem while extending TokenizerFactory in Solr 4.4.0
Hi All,

I'm using the Solr 4.4.0 distro, and I have a strange issue while extending TokenizerFactory with a custom class. This is an excerpt of the pom I use:

<properties>
  <solr.version>4.4.0</solr.version>
</properties>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-core</artifactId>
  <version>${solr.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-analyzers-common</artifactId>
  <version>${solr.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-queryparser</artifactId>
  <version>${solr.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>${solr.version}</version>
</dependency>

I always get the exception below during Solr engine initialization:

com.mytest.tokenizer.RelationChunkTokenizerFactory'
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:164)
    at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
    at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
    at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:619)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:657)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/tokenizer: Error instantiating class: 'com.mytest.tokenizer.RelationChunkTokenizerFactory'
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
    at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:362)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
    ... 14 more
Caused by: org.apache.solr.common.SolrException: Error instantiating class: 'com.mytest.tokenizer.RelationChunkTokenizerFactory'
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:556)
    at org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:342)
    at org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:335)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
    ... 18 more
Caused by: java.lang.NoSuchMethodException: com.mytest.tokenizer.RelationChunkTokenizerFactory.<init>(java.util.Map)
    at java.lang.Class.getConstructor0(Unknown Source)
    at java.lang.Class.getConstructor(Unknown Source)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:552)
    ... 21 more

8604 [coreLoadExecutor-3-thread-1] ERROR org.apache.solr.core.CoreContainer - null:org.apache.solr.common.SolrException: Unable to create core: collection1
    at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1150)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:666)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType rel: Plugin init failure for [schema.xml] analyzer/tokenizer: Error instantiating class: 'com.altilia.platform.tokenizer.RelationChunkTokenizerFactory'
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at
RE: Solr Spellcheck suggestions only return from /select handler when returning search results
Thomas,

It looks like you've set things up correctly, in that while the user is searching against a stemmed field (name), spellcheck is checking against a lightly-analyzed copy of it (spell). This is the right way to do it, as spellcheck against stemmed forms is usually undesirable. But as you've experienced, you will sometimes get results (due to stemming) and also suggestions (because the spellchecker is looking at unstemmed forms). If you do not want spellcheck to return anything when you get results, you can set spellcheck.maxResultsForSuggest=0.

Now, keeping in mind that we're comparing unstemmed forms, can you verify you indeed have something in your index that is within 2 edits of "ichtscheiben"? My guess is you probably don't, which would be why you do not get spelling results in that case. Also, even if you do have something within 2 edits, if "ichtscheiben" itself occurs in your index, by default it won't be corrected at all (even if the query returns nothing, perhaps because of filters or other required terms on the query). In that case you need to set spellcheck.alternativeTermCount to a non-zero value (try maybe 5). See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount and the following sections.

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Thomas Michael Engelke [mailto:thomas.enge...@posteo.de]
Sent: Wednesday, September 10, 2014 5:00 AM
To: Solr user
Subject: Solr Spellcheck suggestions only return from /select handler when returning search results

Hi,

I'm experimenting with the Spellcheck component and have therefore used the example spell-checking configuration to try things out.
My solrconfig.xml looks like this:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spell</str>
  <!-- Multiple Spell Checkers can be declared and used by this component -->
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- uncomment this to require suggestions to occur in 1% of the documents
      <float name="thresholdTokenFrequency">.01</float>
    -->
  </lst>
  <!-- a spellchecker that can break or combine words. See /spell handler below for usage -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

And I've added the spellcheck component to my /select request handler:

<requestHandler name="/select" class="solr.SearchHandler">
  ...
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

I have built up the spellchecker source in the schema.xml from the name field:

<field name="spell" type="spell" indexed="true" stored="true" required="false" multiValued="false"/>
<copyField source="name" dest="spell" maxChars="3"/>
...
<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

As I'm querying the /select request handler, I should get spellcheck suggestions with my results. However, I rarely get a suggestion.
Examples:

- query: Sichtscheibe, spellcheck suggestion: Sichtscheiben (works)
- query: Sichtscheib, spellcheck suggestion: Sichtscheiben (works)
- query: ichtscheiben, no spellcheck suggestions

As far as I can identify, I only get suggestions when I get real search results. I get results for the first 2 examples because the German StemFilterFactory stems both Sichtscheibe and Sichtscheiben to Sichtscheib, so matches are found. However, the third query should result in a suggestion, as the Levenshtein distance is smaller than in the second example. Suggestions, improvements, corrections?
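The edit-distance question raised in this thread can be checked directly. DirectSolrSpellChecker compares unstemmed terms with a default maximum of 2 edits; a plain dynamic-programming Levenshtein implementation (a sketch, not Solr's actual code) shows both failing and working queries are within that limit:

```java
public class EditDistance {
    // Classic dynamic-programming Levenshtein distance (insert/delete/substitute, cost 1 each).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // "ichtscheiben" is one insertion (the leading 'S') away from "Sichtscheiben",
        // so it is well within the default limit of 2 edits.
        System.out.println(levenshtein("ichtscheiben", "Sichtscheiben")); // 1
        System.out.println(levenshtein("Sichtscheib", "Sichtscheiben"));  // 2
    }
}
```

So if no suggestion comes back for ichtscheiben even though Sichtscheiben is in the unstemmed spell field, the cause is more likely the alternativeTermCount/maxResultsForSuggest behavior James describes than the distance itself.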
Re: Problem while extending TokenizerFactory in Solr 4.4.0
On 9/10/2014 7:14 AM, Francesco Valentini wrote: I’m using Solr 4.4.0 distro and now, I have a strange issue while extending TokenizerFactory with a custom class. I think what we have here is a basic Java error, nothing specific to Solr. This jumps out at me:

Caused by: java.lang.NoSuchMethodException: com.mytest.tokenizer.RelationChunkTokenizerFactory.<init>(java.util.Map)
    at java.lang.Class.getConstructor0(Unknown Source)
    at java.lang.Class.getConstructor(Unknown Source)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:552)
    ... 21 more

Java is trying to execute a method that doesn't exist. The getConstructor pieces after the message suggest that perhaps it's a constructor with a Map as an argument, but I'm not familiar enough with this error to know whether it's trying to run a constructor that doesn't exist, or whether it's trying to actually use a method called init. The constructor in TokenizerFactory is protected, and all of the existing descendants that I looked at have a public constructor ... this message would make sense in all of the following situations:

1) You didn't create a constructor for your object with a Map argument.
2) You made your constructor protected.
3) You made your constructor private.

Thanks, Shawn
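Shawn's diagnosis can be reproduced with plain reflection: the stack trace shows the loader calling Class.getConstructor with a Map argument, and getConstructor only finds *public* constructors. The two toy classes below are hypothetical stand-ins, not real Solr factories, but they show exactly when the NoSuchMethodException appears:

```java
import java.util.Map;

public class CtorCheck {
    // A factory with the public Map constructor the loader expects.
    static class GoodFactory {
        public GoodFactory(Map<String, String> args) { }
    }

    // A factory with only a protected constructor -- getConstructor() will not find it.
    static class BadFactory {
        protected BadFactory(Map<String, String> args) { }
    }

    static boolean hasPublicMapCtor(Class<?> clazz) {
        try {
            clazz.getConstructor(Map.class); // only locates public constructors
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasPublicMapCtor(GoodFactory.class)); // true
        System.out.println(hasPublicMapCtor(BadFactory.class));  // false
    }
}
```

So any of Shawn's three situations (missing, protected, or private constructor) produces the same exception.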
Re: Installing solr on tomcat 7.x | Window 8
On 9/10/2014 6:45 AM, Umesh Awasthi wrote: I am trying to follow the official documentation as well as other resources available on the net, but I am unable to run solr on my tomcat. I am trying to install and run `solr-4.10.0` on tomcat. This is what I have done so far:

1. Copied solr-4.10.0.war to tomcat's webapps folder and renamed it to solr.war.
2. Created a folder on my `D` drive with the name `solr-home`.
3. Copied everything from `solr-4.10.0\example\solr` and pasted it into the `solr-home` folder.
4. Through an environment variable (under user variables), I set the following: `solr.solr.home=D:\solr-home`

Started the tomcat server; it starts without any error / exception, but when I try to hit the URL `http://localhost:8080/solr/`, I get the following error: `message {msg=SolrCore 'collection1' is not available due to init failure: Could not load conf for core collection1: Error loading solr config from solr/collection1\solrconfig.xml,trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Could not load conf for core collection1: Error loading solr config from solr/collection1\solrconfig.xml at` The path in the error message is wrong. See SOLR-5814. https://issues.apache.org/jira/browse/SOLR-5814 Is there anything in the log before this message? You will need to find the actual solr logfile ... because you used tomcat and not the jetty included with Solr, I cannot tell you where this logfile is, although if you copied the logging jars and the logging config from the example, it will be logs\solr.log, relative to the current working directory of the process that started tomcat. A side note: Windows 8 is a client operating system. Microsoft has crippled their client operating systems in some way compared to their server operating systems. Heavy multi-threaded server workloads like Solr will not work as well. I don't know exactly what the differences are.
If you really want to run Solr on a Windows system, you really should put it on Server 2012, not Windows 8. Unfortunately, their server operating systems have a rather high price tag. It will be the opinion of most people here that you and your pocketbook would be far happier with the results of running on Linux -- better performance and no cost. Thanks, Shawn
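For completeness, one conventional alternative to a user environment variable is passing the property to the Tomcat JVM through a `bin\setenv.bat` file, which Tomcat reads at startup if present. The path below is just Umesh's example location, not a recommendation:

```bat
REM %CATALINA_HOME%\bin\setenv.bat -- picked up automatically by catalina.bat
set JAVA_OPTS=%JAVA_OPTS% -Dsolr.solr.home=D:\solr-home
```

A system property set this way is visible regardless of which user account starts Tomcat, which avoids a common pitfall with per-user environment variables when Tomcat runs as a service.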
Re: Problem while extending TokenizerFactory in Solr 4.4.0
Hi Shawn, thank you very much for your quick answer, I fixed it. Thanks Francesco 2014-09-10 15:34 GMT+02:00 Shawn Heisey s...@elyograg.org: On 9/10/2014 7:14 AM, Francesco Valentini wrote: I’m using Solr 4.4.0 distro and now, I have a strange issue while extending TokenizerFactory with a custom class. I think what we have here is a basic Java error, nothing specific to Solr. This jumps out at me: Caused by: java.lang.NoSuchMethodException: com.mytest.tokenizer.RelationChunkTokenizerFactory.<init>(java.util.Map) at java.lang.Class.getConstructor0(Unknown Source) at java.lang.Class.getConstructor(Unknown Source) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:552) ... 21 more Java is trying to execute a method that doesn't exist. The getConstructor pieces after the message suggest that perhaps it's a constructor with a Map as an argument, but I'm not familiar enough with this error to know whether it's trying to run a constructor that doesn't exist, or whether it's trying to actually use a method called init. The constructor in TokenizerFactory is protected, and all of the existing descendants that I looked at have a public constructor ... this message would make sense in all of the following situations: 1) You didn't create a constructor for your object with a Map argument. 2) You made your constructor protected. 3) You made your constructor private. Thanks, Shawn
Modify Schema - Schema API
In addition to adding new fields to the schema, is there a way to modify an existing field? What if I created a field called userID as a long, but decided later that it should be a string? Thank you! -Joe
Re: Edismax mm and efficiency
We do that strict/loose query sequence, but on the client side with two requests. Would you consider contributing the QueryComponent? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Sep 10, 2014, at 3:47 AM, Peter Keegan peterlkee...@gmail.com wrote: I implemented a custom QueryComponent that issues the edismax query with mm=100%, and if no results are found, it reissues the query with mm=1. This doubled our query throughput (compared to mm=1 always), as we do some expensive RankQuery processing. For your very long student queries, mm=100% would obviously be too high, so you'd have to experiment. On Fri, Sep 5, 2014 at 1:34 PM, Walter Underwood wun...@wunderwood.org wrote: Great! We have some very long queries, where students paste entire homework problems. One of them was 1051 words. Many of them are over 100 words. This could help. In the Jira discussion, I saw some comments about handling the most sparse lists first. We did something like that in the Infoseek Ultra engine about twenty years ago. Short termlists (documents matching a term) were processed first, which kept the in-memory lists of matching docs small. It also allowed early short-circuiting for no-hits queries. What would be a high mm value, 75%? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Sep 4, 2014, at 11:52 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: indeed https://issues.apache.org/jira/browse/LUCENE-4571 my feeling is it gives a significant gain in mm high values. On Fri, Sep 5, 2014 at 3:01 AM, Walter Underwood wun...@wunderwood.org wrote: Are there any speed advantages to using “mm”? I can imagine pruning the set of matching documents early, which could help, but is that (or something else) done? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Modify Schema - Schema API
Hi Joseph, It isn't supported by an existing REST API (if that was your question) but you can always edit the schema manually (if it isn't managed), upload the new schema and reload the collections (or cores in case of non-SolrCloud mode). Do remember that changing the field type might require you to reindex your data. There's an open JIRA for that one and I think someone will get to it sometime in the reasonably near future. JIRA: https://issues.apache.org/jira/browse/SOLR-5289 On Wed, Sep 10, 2014 at 8:05 AM, Joseph Obernberger joseph.obernber...@gmail.com wrote: In addition to adding new fields to the schema, is there a way to modify an existing field? If I created a field called userID as a long, but decided later that it should be a string? Thank you! -Joe -- Anshum Gupta http://www.anshumgupta.net
Re: Modify Schema - Schema API
Thank you - yes that was my question. I should have stated that it was for SolrCloud and hence a managed schema. Could I bring down the shards, edit the managed schema on zookeeper, fire the shards back up and re-index? -Joe On Wed, Sep 10, 2014 at 11:50 AM, Anshum Gupta ans...@anshumgupta.net wrote: Hi Joseph, It isn't supported by an exiting REST API (if that was your question) but you can always edit the schema manually (if it isn't managed), upload the new schema and reload the collections (or cores in case of non-SolrCloud mode). Do remember that changing the field type might require you to reindex your data. There's an open JIRA for that one and I think someone would get to it sometime in the reasonably near future. JIRA: https://issues.apache.org/jira/browse/SOLR-5289 On Wed, Sep 10, 2014 at 8:05 AM, Joseph Obernberger joseph.obernber...@gmail.com wrote: In addition to adding new fields to the schema, is there a way to modify an existing field? If I created a field called userID as a long, but decided later that it should be a string? Thank you! -Joe -- Anshum Gupta http://www.anshumgupta.net
Problems for indexing large documents on SolrCloud
Hi, I have some problems indexing large documents in a SolrCloud cluster of 3 servers (Solr 4.8.1) with 3 shards and 2 replicas for each shard, on Tomcat 7. For a specific document (with 300K values in a multivalued field), I couldn't index it on SolrCloud, but I could do it in a single instance of Solr on my own PC. The indexing is done with Solarium from a database. The data indexed are e-commerce products with classic fields like name, price, description, instock, etc... The large field (type int) is constituted of other products' ids. The only difference from the documents well-indexed on Solr is the size of that multivalued field. Indeed, the well-indexed documents all have between 100K and 200K values for that field. The index size is 11 Mb for 20 documents. To solve it, I tried to change several parameters, including zkClientTimeout in solr.xml.

In the solrcloud section:

<int name="zkClientTimeout">6</int>
<int name="distribUpdateConnTimeout">10</int>
<int name="distribUpdateSoTimeout">10</int>

In the shardHandlerFactory section:

<int name="socketTimeout">${socketTimeout:10}</int>
<int name="connTimeout">${connTimeout:10}</int>

I also tried to increase these values in solrconfig.xml:

<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="1" formdataUploadLimitInKB="10" addHttpRequestToContext="false"/>

I also tried to increase the quantity of RAM (they are VMs): each server has 4 Gb of RAM, with 3 Gb for the JVM. Are there other settings which could solve the problem that I have forgotten?
The error messages are:

ERROR SolrDispatchFilter null:java.lang.RuntimeException: [was class java.net.SocketException] Connection reset
ERROR SolrDispatchFilter null:ClientAbortException: java.net.SocketException: broken pipe
ERROR SolrDispatchFilter null:ClientAbortException: java.net.SocketException: broken pipe
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected EOF in attribute value
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block in start tag

Thanks, Olivier
Re: Wildcard in FL parameter not working with Solr 4.10.0
This may have been introduced by changes made to solve https://issues.apache.org/jira/browse/SOLR-5968 I created https://issues.apache.org/jira/browse/SOLR-6501 to track the new bug. On Tue, Sep 9, 2014 at 4:53 PM, Mike Hugo m...@piragua.com wrote: Hello, With Solr 4.7 we had some queries that return dynamic fields by passing in a fl=*_exact parameter; this is not working for us after upgrading to Solr 4.10.0. This appears to only be a problem when requesting wildcarded fields via SolrJ. With Solr 4.10.0 - I downloaded the binary and set up the example:

cd example
java -jar start.jar
java -jar post.jar solr.xml monitor.xml

In a browser, if I request http://localhost:8983/solr/collection1/select?q=*:*&wt=json&indent=true&fl=*d all is well with the world:

{responseHeader: {status: 0, QTime: 1, params: {fl: "*d", indent: "true", q: "*:*", wt: "json"}}, response: {numFound: 2, start: 0, docs: [{id: "SOLR1000"}, {id: "3007WFP"}]}}

However if I do the same query with SolrJ (groovy script):

@Grab(group = 'org.apache.solr', module = 'solr-solrj', version = '4.10.0')
import org.apache.solr.client.solrj.SolrQuery
import org.apache.solr.client.solrj.impl.HttpSolrServer

HttpSolrServer solrServer = new HttpSolrServer("http://localhost:8983/solr/collection1")
SolrQuery q = new SolrQuery("*:*")
q.setFields("*d")
println solrServer.query(q)

No fields are returned:

{responseHeader={status=0,QTime=0,params={fl=*d,q=*:*,wt=javabin,version=2}},response={numFound=2,start=0,docs=[SolrDocument{}, SolrDocument{}]}}

Any ideas as to why wildcarded fl fields are not returned when using SolrJ? Thanks, Mike
Re: Modify Schema - Schema API
You don't need to bring down the shards/collections; instead, here's what you can do:

* Retain the filename (managed_schema, if you didn't change the default resource name).
* Edit the file locally.
* Upload it to replace the current zk file.
* Reload the collection(s).
* Reindex.

Here's another thing you can do:

* Upload the updated configs to zk.
* Create a new collection (different name) using the new configs.
* Reindex data to the new collection.
* Use collection aliasing to swap the old/new collections. (http://www.anshumgupta.net/2013/10/collection-aliasing-in-solrcloud.html)

All this while, you wouldn't really need to shut down the Solr cluster/collection etc. On Wed, Sep 10, 2014 at 8:56 AM, Joseph Obernberger joseph.obernber...@gmail.com wrote: Thank you - yes that was my question. I should have stated that it was for SolrCloud and hence a managed schema. Could I bring down the shards, edit the managed schema on zookeeper, fire the shards back up and re-index? -Joe On Wed, Sep 10, 2014 at 11:50 AM, Anshum Gupta ans...@anshumgupta.net wrote: Hi Joseph, It isn't supported by an exiting REST API (if that was your question) but you can always edit the schema manually (if it isn't managed), upload the new schema and reload the collections (or cores in case of non-SolrCloud mode). Do remember that changing the field type might require you to reindex your data. There's an open JIRA for that one and I think someone would get to it sometime in the reasonably near future. JIRA: https://issues.apache.org/jira/browse/SOLR-5289 On Wed, Sep 10, 2014 at 8:05 AM, Joseph Obernberger joseph.obernber...@gmail.com wrote: In addition to adding new fields to the schema, is there a way to modify an existing field? If I created a field called userID as a long, but decided later that it should be a string? Thank you! -Joe -- Anshum Gupta http://www.anshumgupta.net -- Anshum Gupta http://www.anshumgupta.net
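The first list of steps can be sketched as shell commands. The host names, paths, and config name below are placeholders, and zkcli.sh ships under example/scripts/cloud-scripts in Solr 4.x distributions; adjust for your installation:

```shell
# 1. Push the edited config set (including the managed schema file) back to ZooKeeper
./zkcli.sh -zkhost zk1:2181 -cmd upconfig \
    -confdir /path/to/edited/conf -confname myconf

# 2. Reload the collection so every core picks up the new schema
curl "http://solr1:8983/solr/admin/collections?action=RELOAD&name=collection1"

# 3. Reindex your data through your normal indexing pipeline
```

None of these steps require taking nodes offline, though queries keep running against the old schema until the reload completes and the data is reindexed.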
RE: [Announce] Apache Solr 4.10 with RankingAlgorithm 1.5.4 available now with complex-lsa algorithm (simulates human language acquisition and recognition)
Hi Diego: Not sure of solr-sense, but complex-lsa is an enhanced lsa implementation with TERM-DOCUMENT Similarity, etc. (not found in lsa). The relevance/ranking is again different and is more accurate, as it uses the RankingAlgorithm scoring model. The query performance gain with this version is significant over the last release; a TERM-SIMILARITY query that used to take about 8-9 seconds now takes just 30ms to 80ms. Lots of performance improvements ... Warm Regards, Nagendra Nagarajayya http://solr-ra.tgels.org http://elasticsearch-ra.tgels.org http://rankingalgorithm.tgels.org (accurate and relevant, simulates human language acquisition and recognition) -----Original Message----- From: Diego Fernandez [mailto:difer...@redhat.com] Sent: Tuesday, September 9, 2014 10:38 AM To: solr-user@lucene.apache.org Cc: gene...@lucene.apache.org Subject: Re: [Announce] Apache Solr 4.10 with RankingAlgorithm 1.5.4 available now with complex-lsa algorithm (simulates human language acquisition and recognition) Interesting. Does anyone know how that compares to this http://www.searchbox.com/products/searchbox-plugins/solr-sense/? Diego Fernandez - 爱国 Software Engineer US GSS Supportability - Diagnostics - Original Message - Hi! I am very excited to announce the availability of Apache Solr 4.10 with RankingAlgorithm 1.5.4. Solr 4.10.0 with RankingAlgorithm 1.5.4 includes support for complex-lsa. complex-lsa simulates human language acquisition and recognition (see demo http://solr-ra.tgels.org/rankingsearchlsa.jsp ) and can retrieve semantically related/hidden relationships between terms, sentences, paragraphs, chapters, books, images, etc. Three new similarities, TERM_SIMILARITY, DOCUMENT_SIMILARITY, TERM_DOCUMENT_SIMILARITY enable these with improved precision. A query for “holy AND ghost” returns jesus/christ as the top results for the bible corpus with no effort to introduce this relationship (see demo http://solr-ra.tgels.org/rankingsearchlsa.jsp ).
This version adds support for multiple linear algebra libraries. complex-lsa does a large amount of these calculations, so speeding this up should speed up retrieval, etc. EJML is the fastest if you are using complex-lsa for a smaller set of documents, while MTJ is faster as your document collection becomes bigger. MTJ can also use BLAS/LAPACK, etc. installed on your system to further improve performance with native execution. The performance is similar to a C/C++ application. It can also make use of GPUs or Intel's MKL library if you have access to it. RankingAlgorithm 1.5.4 with complex-lsa supports the entire Lucene Query Syntax, ± and/or boolean/dismax/glob/regular expression/wildcard/fuzzy/prefix/suffix queries with boosting, etc. This version increases performance, with increased accuracy and relevance for Document similarity, and fixes problems with phrase queries, Boolean queries, etc. You can get more information about complex-lsa and realtime-search performance from here: http://solr-ra.tgels.org/wiki/en/Complex-lsa-demo You can download Solr 4.10 with RankingAlgorithm 1.5.4 from here: http://solr-ra.tgels.org Please download and give the new version a try. Regards, Nagendra Nagarajayya http://solr-ra.tgels.org http://elasticsearch-ra.tgels.org http://rankingalgorithm.tgels.org Note: 1. Apache Solr 4.10 with RankingAlgorithm 1.5.4 is an external project.
Re: Modify Schema - Schema API
Wow - that's really cool! Thank you! -Joe On Wed, Sep 10, 2014 at 12:29 PM, Anshum Gupta ans...@anshumgupta.net wrote: You don't need to bring down the shards/collections, instead here's what you can do: * Retain the filename (managed_schema, if you didn't change the default resource name). * Edit the file locally * Upload it to replace the current zk file. * Reload the collection(s). * Reindex Here's another thing you can do: * Upload the updated configs to zk * Create a new collection (different name) using the new configs * Reindex data to the new collection. * Use collection aliasing to swap the old/new collections. (http://www.anshumgupta.net/2013/10/collection-aliasing-in-solrcloud.html) All this while, you wouldn't really need to shut down the Solr cluster/collection etc. On Wed, Sep 10, 2014 at 8:56 AM, Joseph Obernberger joseph.obernber...@gmail.com wrote: Thank you - yes that was my question. I should have stated that it was for SolrCloud and hence a managed schema. Could I bring down the shards, edit the managed schema on zookeeper, fire the shards back up and re-index? -Joe On Wed, Sep 10, 2014 at 11:50 AM, Anshum Gupta ans...@anshumgupta.net wrote: Hi Joseph, It isn't supported by an exiting REST API (if that was your question) but you can always edit the schema manually (if it isn't managed), upload the new schema and reload the collections (or cores in case of non-SolrCloud mode). Do remember that changing the field type might require you to reindex your data. There's an open JIRA for that one and I think someone would get to it sometime in the reasonably near future. JIRA: https://issues.apache.org/jira/browse/SOLR-5289 On Wed, Sep 10, 2014 at 8:05 AM, Joseph Obernberger joseph.obernber...@gmail.com wrote: In addition to adding new fields to the schema, is there a way to modify an existing field? If I created a field called userID as a long, but decided later that it should be a string? Thank you! 
-Joe -- Anshum Gupta http://www.anshumgupta.net -- Anshum Gupta http://www.anshumgupta.net
Re: Edismax mm and efficiency
Sure. I created SOLR-6502. The tricky part was handling the behavior in a sharded index. When the index is sharded, the response from each shard will contain a parameter that indicates whether the search results are from the conjunction of all keywords (mm=100%) or from the disjunction (mm=1). If the shards contain both types, then only return the results from the conjunction. This is necessary in order to get the same results independent of the number of shards. Peter On Wed, Sep 10, 2014 at 11:07 AM, Walter Underwood wun...@wunderwood.org wrote: We do that strict/loose query sequence, but on the client side with two requests. Would you consider contributing the QueryComponent? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Sep 10, 2014, at 3:47 AM, Peter Keegan peterlkee...@gmail.com wrote: I implemented a custom QueryComponent that issues the edismax query with mm=100%, and if no results are found, it reissues the query with mm=1. This doubled our query throughput (compared to mm=1 always), as we do some expensive RankQuery processing. For your very long student queries, mm=100% would obviously be too high, so you'd have to experiment. On Fri, Sep 5, 2014 at 1:34 PM, Walter Underwood wun...@wunderwood.org wrote: Great! We have some very long queries, where students paste entire homework problems. One of them was 1051 words. Many of them are over 100 words. This could help. In the Jira discussion, I saw some comments about handling the most sparse lists first. We did something like that in the Infoseek Ultra engine about twenty years ago. Short termlists (documents matching a term) were processed first, which kept the in-memory lists of matching docs small. It also allowed early short-circuiting for no-hits queries. What would be a high mm value, 75%?
wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Sep 4, 2014, at 11:52 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: indeed https://issues.apache.org/jira/browse/LUCENE-4571 my feeling is it gives a significant gain in mm high values. On Fri, Sep 5, 2014 at 3:01 AM, Walter Underwood wun...@wunderwood.org wrote: Are there any speed advantages to using “mm”? I can imagine pruning the set of matching documents early, which could help, but is that (or something else) done? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
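Whether done in a custom QueryComponent or client-side with two requests, the strict-then-loose fallback described in this thread boils down to one piece of control flow. The sketch below uses a plain function stub in place of a real Solr call, so all names are illustrative only:

```java
import java.util.List;
import java.util.function.Function;

public class MmFallback {
    // Issue the query with mm=100% first; only if that returns nothing, retry with mm=1.
    static <T> List<T> searchWithFallback(Function<String, List<T>> searchWithMm) {
        List<T> strict = searchWithMm.apply("100%");
        return strict.isEmpty() ? searchWithMm.apply("1") : strict;
    }

    public static void main(String[] args) {
        // Stub "index": the strict query misses, the loose query matches.
        Function<String, List<String>> stub =
            mm -> mm.equals("100%") ? List.of() : List.of("doc1", "doc2");
        System.out.println(searchWithFallback(stub)); // [doc1, doc2]
    }
}
```

The throughput win Peter reports comes from the common case where the strict pass finds hits and the expensive second pass is skipped entirely; the cost is one extra round trip only for queries whose strict pass comes up empty.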
Re: Reading files in default Conf dir
Thank you for the inputs Jorge. Now I am getting the ResourceLoader using the SolrCore API.

Before:

return new HashSet<String>(new SolrResourceLoader(null).getLines("stopwords.txt"));

After:

return new HashSet<String>(core.getResourceLoader().getLines("stopwords.txt"));

I am able to load the resource successfully. Thanks, Ramana. On Wed, Sep 10, 2014 at 12:34 PM, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: What are you developing? A custom search component? An update processor? A different class for one of the zillion moving parts of Solr? If you have access to a SolrCore instance you could use it to get access; essentially, using the SolrCore instance specific to the current core will cause the lookup of the file to be local to the conf directory of the specified core. In a custom UpdateRequestProcessorFactory which implements the SolrCoreAware interface I have the following code:

@Override
public void inform(SolrCore solrCore) {
    SolrResourceLoader loader = solrCore.getResourceLoader();
    try {
        List<String> lines = loader.getLines(patternFile);
        if (false == lines.isEmpty()) {
            for (String s : lines) {
                this.patterns.add(Pattern.compile(s));
            }
        }
    } catch (IOException e) {
        SolrCore.log.error(String.format("File %s could not be loaded", patternFile));
    }
}

Essentially I ask the actual core (solrCore) to provide a SolrResourceLoader for its conf directory. In your case you are just passing null, which causes (I think, haven't tested) it to instantiate a SolrResourceLoader for the Solr instance (judging by the paths you've placed in your mail) instead of a SolrResourceLoader relative to your core/collection, which is what you want. So, bottom line: implement the SolrCoreAware interface and use the SolrResourceLoader provided by this instance. A little more info would also be helpful, as we can't figure out what Solr “part” you are developing.
Regards, On Sep 9, 2014, at 2:37 PM, Ramana OpenSource ramanaopensou...@gmail.com wrote: Hi, I am trying to load one of the files in the conf directory in SOLR, using the code below.

return new HashSet<String>(new SolrResourceLoader(null).getLines("stopwords.txt"));

The stopwords.txt file is available in the location solr\example\solr\collection1\conf. When I debugged the SolrResourceLoader API, it was looking at the below locations to load the file:

...solr\example\solr\conf\stopwords.txt
...solr\example\stopwords.txt

But as the file was not in either of those locations, it failed. How do I load files in the default conf directory using the SolrResourceLoader API? I am a newbie to SOLR. Any help would be appreciated. Thanks, Ramana.
How to get access to SolrCore in init method of Handler Class
Hi, I need to load a file in the instance's conf directory, and this data is going to be used in the handleRequestBody() implementation. As of now, I am loading the file in the handleRequestBody method like below:

SolrCore solrCore = req.getCore();
solrCore.getResourceLoader().getLines(fileToLoad);

But, to make it better, I would like to load this file only once, in the init() method of the handler class. I am not sure how to get access to the SolrCore in the init method. Any help would be appreciated. Thanks, Ramana.
Re: How to get access to SolrCore in init method of Handler Class
: But, to make it better, I would like to load this file only once and in the : init() method of the handler class. I am not sure how to get access to the : SolrCore in the init method. You can't access the SolrCore during the init() method, because at the time it's called the SolrCore itself is not yet fully initialized. What you can do is implement the SolrCoreAware interface; then you are guaranteed that *after* your init method is called, and before you are ever asked to handle any requests, your inform(SolrCore) method will be called... -Hoss http://www.lucidworks.com/
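Hoss's ordering guarantee (init first, inform later, requests only after both) can be sketched with stand-in types. Nothing below is real Solr API; it's a toy model of the lifecycle that shows where the one-time file load belongs:

```java
import java.util.List;
import java.util.Map;

public class LifecycleSketch {
    // Stand-in for the one SolrCore method this sketch cares about.
    interface Core {
        List<String> getLines(String resource);
    }

    // Stand-in for a SolrCoreAware handler: inform(core) runs after init(args),
    // and before any request is handled.
    static class MyHandler {
        private List<String> fileData; // loaded once, in inform(), not per request

        void init(Map<String, String> args) {
            // No core is available here yet -- only read init args.
        }

        void inform(Core core) {
            // Safe: the core is fully initialized by the time inform() is called.
            fileData = core.getLines("fileToLoad.txt");
        }

        String handleRequest() {
            return "loaded " + fileData.size() + " lines";
        }
    }

    public static void main(String[] args) {
        MyHandler h = new MyHandler();
        h.init(Map.of());
        h.inform(resource -> List.of("a", "b", "c")); // toy core
        System.out.println(h.handleRequest()); // loaded 3 lines
    }
}
```

In real Solr code the same shape applies: keep init() limited to argument parsing, and move any resource loading that needs the core into inform(SolrCore).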
Inconsistent relevancy score between browser refreshes
I am seeing different relevancy scores for the same documents between browser refreshes. Any ideas why? The query is the same, the index is the same - why would the score change? Example:

First request returns:

<doc>
  <str name="title">Stroke Anticoagulation and Prophylaxis</str>
  <float name="score">3.463463</float>
</doc>
<doc>
  <str name="title">Hemorrhagic Stroke</str>
  <float name="score">3.463463</float>
</doc>
<doc>
  <str name="title">Vertebrobasilar Stroke</str>
  <float name="score">3.460521</float>
</doc>

Second request:

<doc>
  <str name="title">Vertebrobasilar Stroke</str>
  <float name="score">3.460521</float>
</doc>
<doc>
  <str name="title">Hemorrhagic Stroke</str>
  <float name="score">3.4484053</float>
</doc>
<doc>
  <str name="title">Stroke Anticoagulation and Prophylaxis</str>
  <float name="score">3.4484053</float>
</doc>

Third request:

<doc>
  <str name="title">Stroke Anticoagulation and Prophylaxis</str>
  <float name="score">3.463463</float>
</doc>
<doc>
  <str name="title">Hemorrhagic Stroke</str>
  <float name="score">3.463463</float>
</doc>
<doc>
  <str name="title">Vertebrobasilar Stroke</str>
  <float name="score">3.402718</float>
</doc>

Jing
Re: Creating Solr servers dynamically in Multicore folder
You should be good to go. Do note that you can set the variables referenced in your schema.xml in the individual core.properties file for the core in question if you need to, although the defaults work for most people's needs. Best, Erick On Tue, Sep 9, 2014 at 9:15 PM, nishwanth nishwanth.vupp...@gmail.com wrote: Hello Erick, Thanks for the response. My cores got created now after removing the core.properties in this location and the existing core folders. Also I commented out the core related information in solr.xml. Are there going to be any further problems with the approach I followed? For the new cores I created, I could see the conf, data and core.properties files getting created. Thanks.. -- View this message in context: http://lucene.472066.n3.nabble.com/Creating-Solr-servers-dynamically-in-Multicore-folder-tp4157550p4157747.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem while extending TokenizerFactory in Solr 4.4.0
Francesco: What was the fix? It'll help others with the same issue. On Wed, Sep 10, 2014 at 6:53 AM, Francesco Valentini valentin...@gmail.com wrote: Hi Shawn, thank you very much for your quick anwser, I fixed it. Thanks Francesco 2014-09-10 15:34 GMT+02:00 Shawn Heisey s...@elyograg.org: On 9/10/2014 7:14 AM, Francesco Valentini wrote: I’m using Solr 4.4.0 distro and now, I have a strange issue while extending TokenizerFactory with a custom class. I think what we have here is a basic Java error, nothing specific to Solr. This jumps out at me: Caused by: java.lang.NoSuchMethodException: com.mytest.tokenizer.RelationChunkTokenizerFactory.init(java.util.Map) at java.lang.Class.getConstructor0(Unknown Source) at java.lang.Class.getConstructor(Unknown Source) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:552) ... 21 more Java is trying to execute a method that doesn't exist. The getConstructor pieces after the message suggest that perhaps it's a constructor with a Map as an argument, but I'm not familiar enough with this error to know whether it's trying to run a constructor that doesn't exist, or whether it's trying to actually use a method called init. The constructor in TokenizerFactory is protected, and all of the existing descendants that I looked at have a public constructor ... this message would make sense in all of the following situations: 1) You didn't create a constructor for your object with a Map argument. 2) You made your constructor protected. 3) You made your constructor private. Thanks, Shawn
Re: Problems for indexing large documents on SolrCloud
bq: org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier This is very often an indication that your packets are being truncated by something in the chain. In your case, make sure that Tomcat is configured to handle inputs of the size that you're sending. This may be happening before things get to Solr, in which case your settings in solrconfig.xml aren't germane; the problem is earlier than that. A semi-smoking-gun here is that there's a size of your multivalued field that seems to break things... That doesn't rule out time problems of course. But I'd look at the Tomcat settings for maximum packet size first. Best, Erick On Wed, Sep 10, 2014 at 9:11 AM, Olivier olivau...@gmail.com wrote: Hi, I have some problems for indexing large documents in a SolrCloud cluster of 3 servers (Solr 4.8.1) with 3 shards and 2 replicas for each shard on Tomcat 7. For a specific document (with 300 K values in a multivalued field), I couldn't index it on SolrCloud but I could do it in a single instance of Solr on my own PC. The indexation is done with Solarium from a database. The data indexed are e-commerce products with classic fields like name, price, description, instock, etc... The large field (type int) is constitued of other products ids. The only difference with other documents well-indexed on Solr is the size of that multivalued field. Indeed, other documents well-indexed have all between 100K values and 200 K values for that field. The index size is 11 Mb for 20 documents.
To solve it, I tried to change several parameters, including the ZK timeouts in solr.xml. In the solrcloud section:

<int name="zkClientTimeout">6</int>
<int name="distribUpdateConnTimeout">10</int>
<int name="distribUpdateSoTimeout">10</int>

In the shardHandlerFactory section:

<int name="socketTimeout">${socketTimeout:10}</int>
<int name="connTimeout">${connTimeout:10}</int>

I also tried to increase these values in solrconfig.xml:

<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="1" formdataUploadLimitInKB="10" addHttpRequestToContext="false"/>

I also tried to increase the amount of RAM (they are VMs): each server has 4 GB of RAM, with 3 GB for the JVM. Are there other settings that could solve the problem that I might have forgotten? The error messages are:

ERROR SolrDispatchFilter null:java.lang.RuntimeException: [was class java.net.SocketException] Connection reset
ERROR SolrDispatchFilter null:ClientAbortException: java.net.SocketException: broken pipe
ERROR SolrDispatchFilter null:ClientAbortException: java.net.SocketException: broken pipe
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected EOF in attribute value
ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block in start tag

Thanks, Olivier
Re: Inconsistent relevancy score between browser refreshes
More info please.
1) Are there replicas involved?
2) Is there any indexing going on?
3) If more than one node, did you optimize?
4) Did you optimize between refreshes?
Best, Erick

On Wed, Sep 10, 2014 at 12:28 PM, Tao, Jing j...@webmd.net wrote: I am seeing different relevancy scores for the same documents between browser refreshes. Any ideas why? The query is the same, the index is the same - why would the score change? Example:

First request returns:
<doc><str name="title">Stroke Anticoagulation and Prophylaxis</str><float name="score">3.463463</float></doc>
<doc><str name="title">Hemorrhagic Stroke</str><float name="score">3.463463</float></doc>
<doc><str name="title">Vertebrobasilar Stroke</str><float name="score">3.460521</float></doc>

Second request:
<doc><str name="title">Vertebrobasilar Stroke</str><float name="score">3.460521</float></doc>
<doc><str name="title">Hemorrhagic Stroke</str><float name="score">3.4484053</float></doc>
<doc><str name="title">Stroke Anticoagulation and Prophylaxis</str><float name="score">3.4484053</float></doc>

Third request:
<doc><str name="title">Stroke Anticoagulation and Prophylaxis</str><float name="score">3.463463</float></doc>
<doc><str name="title">Hemorrhagic Stroke</str><float name="score">3.463463</float></doc>
<doc><str name="title">Vertebrobasilar Stroke</str><float name="score">3.402718</float></doc>

Jing
RE: Inconsistent relevancy score between browser refreshes
1) It is a SolrCloud setup on 4 servers, 4 shards, replication factor of 2.
2) There is no indexing going on.
3) No, I did not optimize.
4) Did not optimize between refreshes.
Thanks, Jing

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, September 10, 2014 4:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent relevancy score between browser refreshes

More info please.
1) Are there replicas involved?
2) Is there any indexing going on?
3) If more than one node, did you optimize?
4) Did you optimize between refreshes?
Best, Erick

On Wed, Sep 10, 2014 at 12:28 PM, Tao, Jing j...@webmd.net wrote: I am seeing different relevancy scores for the same documents between browser refreshes. Any ideas why? The query is the same, the index is the same - why would the score change? Example:

First request returns:
<doc><str name="title">Stroke Anticoagulation and Prophylaxis</str><float name="score">3.463463</float></doc>
<doc><str name="title">Hemorrhagic Stroke</str><float name="score">3.463463</float></doc>
<doc><str name="title">Vertebrobasilar Stroke</str><float name="score">3.460521</float></doc>

Second request:
<doc><str name="title">Vertebrobasilar Stroke</str><float name="score">3.460521</float></doc>
<doc><str name="title">Hemorrhagic Stroke</str><float name="score">3.4484053</float></doc>
<doc><str name="title">Stroke Anticoagulation and Prophylaxis</str><float name="score">3.4484053</float></doc>

Third request:
<doc><str name="title">Stroke Anticoagulation and Prophylaxis</str><float name="score">3.463463</float></doc>
<doc><str name="title">Hemorrhagic Stroke</str><float name="score">3.463463</float></doc>
<doc><str name="title">Vertebrobasilar Stroke</str><float name="score">3.402718</float></doc>

Jing
Re: ExtractingRequestHandler indexing zip files
Thanks for the info Sergio. I updated my 4.8.1 version with that patch and SOLR-4216 (which was really the same thing). It took a day to get it to compile on my network, and it still doesn't work. Did my config file look correct? I'm wondering if I need another param somewhere.

Sergio wrote: The patch has to be applied to the source code and solr.war recompiled. If you do that, then it works for extracting the content of documents.

-- View this message in context: http://lucene.472066.n3.nabble.com/ExtractingRequestHandler-indexing-zip-files-tp4138172p4158024.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr WARN Log
: I'm trying to upgrade Solr from version 4.2 to 4.9, since then I'm ... : haven't configured it. You can ignore this message. To get it to go The fact that a WARN is logged at all was a bug in 4.9 that got fixed in 4.10... https://issues.apache.org/jira/browse/SOLR-6179 -Hoss http://www.lucidworks.com/
Re: Problems for indexing large documents on SolrCloud
On 9/10/2014 2:05 PM, Erick Erickson wrote: bq: org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier This is very often an indication that your packets are being truncated by something in the chain. In your case, make sure that Tomcat is configured to handle inputs of the size that you're sending. This may be happening before things get to Solr, in which case your settings in solrconfig.xml aren't germane; the problem is earlier than that. A semi-smoking-gun here is that there's a size of your multivalued field that seems to break things... That doesn't rule out time problems, of course. But I'd look at the Tomcat settings for maximum packet size first.

The maximum HTTP request size is actually controlled by Solr itself since 4.1, with the changes committed for SOLR-4265. Changing the setting on Tomcat probably will not help. An example from my own config which sets this to 32MB - the default is 2048, i.e. 2MB:

<requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="32768" formdataUploadLimitInKB="32768"/>

Thanks, Shawn
Re: How to get access to SolrCore in init method of Handler Class
Thanks Chris. I have implemented the SolrCoreAware interface and am loading the required file in the inform method. Thanks, Ramana.

On Wed, Sep 10, 2014 at 10:59 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : But, to make it better, I would like to load this file only once, in the : init() method of the handler class. I am not sure how to get access to : SolrCore in the init method. You can't access the SolrCore during the init() method, because at the time it's called the SolrCore itself is not yet fully initialized. What you can do is implement the SolrCoreAware interface, and then you are guaranteed that *after* your init method is called, and before you are ever asked to handle any requests, your inform(SolrCore) method will be called... -Hoss http://www.lucidworks.com/
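The two-phase lifecycle Hoss describes can be sketched without Solr dependencies. The classes below are stand-ins, not real Solr types: the point is only the ordering contract - `init()` runs before the core exists, `inform(core)` runs exactly once after it does and before any request, so one-time core-dependent loading belongs in `inform()`.

```java
// Standalone sketch of the SolrCoreAware-style two-phase lifecycle
// (all names here are hypothetical stand-ins, not real Solr classes).
public class CoreAwareLifecycleDemo {

    interface CoreAware { void inform(Core core); }

    // Stand-in for SolrCore with a resource-loading facility.
    static class Core {
        String resource(String name) { return "contents of " + name; }
    }

    static class MyHandler implements CoreAware {
        String loadedFile; // loaded once in inform(), reused per request

        void init() {
            // Phase 1: the core is NOT available yet - only plain
            // configuration parsing is safe here.
        }

        @Override public void inform(Core core) {
            // Phase 2: core is ready; guaranteed to run before any request.
            loadedFile = core.resource("my-file.txt");
        }

        String handleRequest() { return "handled using: " + loadedFile; }
    }

    public static void main(String[] args) {
        MyHandler handler = new MyHandler();
        handler.init();          // container calls init() first
        Core core = new Core();  // ...then finishes building the core
        handler.inform(core);    // ...then informs core-aware plugins
        System.out.println(handler.handleRequest());
    }
}
```

In real Solr code the equivalent is implementing `org.apache.solr.util.plugin.SolrCoreAware` and doing the file load inside `inform(SolrCore)`, as Ramana ended up doing.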
Re: Creating Solr servers dynamically in Multicore folder
Hello Erick, Thanks for the response. I have attached core.properties and solr.xml for your reference.

solr.xml http://lucene.472066.n3.nabble.com/file/n4158124/solr.xml
core.properties http://lucene.472066.n3.nabble.com/file/n4158124/core.properties

Below is our plan for creating cores. Every tenant (user) is bound to some contacts, sales, orders, and other information. The number of tenants for our application will be approximately 10,000. We are planning to create a core for every tenant and maintain the contacts, sales, orders, and other information as a collection, so every time a tenant logs in this information will be used. Could you please let us know your thoughts on this approach? Regards, Nishwanth

-- View this message in context: http://lucene.472066.n3.nabble.com/Creating-Solr-servers-dynamically-in-Multicore-folder-tp4157550p4158124.html Sent from the Solr - User mailing list archive at Nabble.com.