Re: Problem with porter stemming
Hello. I want to set LMJelinekMercerSimilarity (with lambda set to, say, 0.6) as the similarity that Luke uses for scoring. By default, Luke uses DefaultSimilarity. Can anyone help with this? I am using Lucene 4.10.4 and the Luke build for that version of the Lucene index.

Dwaipayan
Re: Problem with porter stemming
Stemming is an inherently limited process. It doesn't know about the word 'news', it just has a rule about 's'.

Some of us sell commercial products that do more complex linguistic processing that knows about which words are which. There may be open source implementations of similar technology.

On Mon, Mar 14, 2016 at 12:13 PM, Ahmet Arslan wrote:
> Hi Dwaipayan,
>
> Another way is to use KeywordMarkerFilter. Stemmer implementations respect
> this attribute.
> If you want to supply your own mappings, StemmerOverrideTokenFilter could be
> used as well.
>
> ahmet
>
> On Monday, March 14, 2016 4:31 PM, Dwaipayan Roy wrote:
>
> I am using EnglishAnalyzer with my own stopword list. EnglishAnalyzer uses
> the porter stemmer (snowball) to stem the words. But using the
> EnglishAnalyzer, I am getting an erroneous result for 'news'. 'news' is
> getting stemmed into 'new'.
>
> Any help would be appreciated.

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Problem with porter stemming
Hi Dwaipayan,

Another way is to use KeywordMarkerFilter. Stemmer implementations respect this attribute. If you want to supply your own mappings, StemmerOverrideTokenFilter could be used as well.

ahmet

On Monday, March 14, 2016 4:31 PM, Dwaipayan Roy wrote:
> I am using EnglishAnalyzer with my own stopword list. EnglishAnalyzer uses
> the porter stemmer (snowball) to stem the words. But using the
> EnglishAnalyzer, I am getting an erroneous result for 'news'. 'news' is
> getting stemmed into 'new'.
>
> Any help would be appreciated.
RE: Problem with porter stemming
Hi - if you don't want specific words passed through the stemmer, you need to supply a CharArraySet with exclusions as the second argument to the EnglishAnalyzer constructor.

Markus

-----Original message-----
> From: Dwaipayan Roy <dwaipayan@gmail.com>
> Sent: Monday 14th March 2016 15:31
> To: java-user@lucene.apache.org
> Subject: Problem with porter stemming
>
> I am using EnglishAnalyzer with my own stopword list. EnglishAnalyzer uses
> the porter stemmer (snowball) to stem the words. But using the
> EnglishAnalyzer, I am getting an erroneous result for 'news'. 'news' is
> getting stemmed into 'new'.
>
> Any help would be appreciated.
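For readers of the archive, the exclusion idea can be illustrated without Lucene at all. The sketch below is only a toy: it is not the Porter algorithm and not Lucene code, and the names `ToyStemmer` and `protectedWords` are made up for illustration. It shows a blind "strip the trailing s" rule turning 'news' into 'new', and a protected-word set that lets listed words bypass the rule, which is roughly the role a stem exclusion set plays.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy illustration only: NOT the Porter algorithm and not part of Lucene. */
final class ToyStemmer {
    // Hypothetical stand-in for a stem exclusion set: words listed here are never stemmed.
    private final Set<String> protectedWords;

    ToyStemmer(Set<String> protectedWords) {
        this.protectedWords = protectedWords;
    }

    String stem(String word) {
        // Protected words bypass the suffix rule entirely.
        if (protectedWords.contains(word)) {
            return word;
        }
        // Blind plural rule: drop a trailing "s" (but not "ss").
        // The rule has no idea whether the word is actually a plural.
        if (word.length() > 2 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        ToyStemmer plain = new ToyStemmer(new HashSet<String>());
        ToyStemmer guarded = new ToyStemmer(new HashSet<String>(Arrays.asList("news")));

        System.out.println(plain.stem("news"));   // prints "new": the rule misfires
        System.out.println(guarded.stem("news")); // prints "news": protected word is kept
        System.out.println(guarded.stem("cats")); // prints "cat": other words are still stemmed
    }
}
```

In Lucene itself the same effect would come from the exclusion set passed to EnglishAnalyzer, as described above, rather than from any hand-written rule like this one.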
Problem with porter stemming
I am using EnglishAnalyzer with my own stopword list. EnglishAnalyzer uses the porter stemmer (snowball) to stem the words. But using the EnglishAnalyzer, I am getting an erroneous result for 'news'. 'news' is getting stemmed into 'new'.

Any help would be appreciated.