> On Jul 8, 2016, at 08:36, Abhinav Upadhyay <er.abhinav.upadh...@gmail.com> wrote:
>
> On Fri, Jul 8, 2016 at 8:56 PM, Tom Ivar Helbekkmo <t...@hamartun.priv.no> wrote:
>> Abhinav Upadhyay <er.abhinav.upadh...@gmail.com> writes:
>>
>>> We just need to handle the special cases where we don't want to stem :)
>>
>> ...or perhaps do the stemming only when the resulting stem is found in
>> /usr/share/dict/words?
>
> Yes, that's probably a good idea. I first need to write the custom
> tokenizer and I can probably use that dictionary to decide what to
> stem and what not to stem.
>
> -
> Abhinav

In principle, a lot of technical names are marked up in mandoc as “.Tn foo”, which might provide a good list of words to “not stem.”

Erik Fair
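
A rough sketch of how the two suggestions in this thread could fit together, written in Python purely for illustration. The function names (collect_tn_words, choose_term) are hypothetical and are not part of mandoc or apropos. The stemmer itself is assumed to exist elsewhere, so the caller passes in the candidate stem. Only ".Tn" at the start of a line is recognized; a ".Tn" that appears as an argument to another macro is not handled here.

```python
import re

# Matches a line that starts with the ".Tn" macro and captures its arguments.
TN_LINE = re.compile(r"^\.Tn\s+(.+)$", re.MULTILINE)


def collect_tn_words(mdoc_source):
    """Return the lower-cased words given as arguments to .Tn macros."""
    words = set()
    for args in TN_LINE.findall(mdoc_source):
        for token in args.split():
            # Drop punctuation arguments such as "," or "." that follow the name.
            token = token.strip('.,;:()"')
            if token:
                words.add(token.lower())
    return words


def choose_term(word, stem, dictionary, no_stem):
    """Pick the term to index: the stem only if it is safe to use.

    The word is kept unstemmed when it is in the no-stem set (for example,
    names collected from .Tn), or when the stem is not a dictionary word.
    """
    word = word.lower()
    stem = stem.lower()
    if word in no_stem:
        return word
    if stem in dictionary:
        return stem
    return word
```

Here the dictionary would be a set built from the lines of /usr/share/dict/words, and no_stem would be the union of collect_tn_words() over the installed manual page sources.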