> On Jul 8, 2016, at 08:36, Abhinav Upadhyay <er.abhinav.upadh...@gmail.com> 
> wrote:
> 
> On Fri, Jul 8, 2016 at 8:56 PM, Tom Ivar Helbekkmo <t...@hamartun.priv.no> 
> wrote:
>> Abhinav Upadhyay <er.abhinav.upadh...@gmail.com> writes:
>> 
>>> We just need to handle the special cases where we don't want to stem :)
>> 
>> ...or perhaps do the stemming only when the resulting stem is found in
>> /usr/share/dict/words?
> 
> Yes, that's probably a good idea. I first need to write the custom
> tokenizer and I can probably use that dictionary to decide what to
> stem and what not to stem.
> 
> -
> Abhinav

In principle, a lot of technical names are marked up in mandoc as “.Tn foo”,
which might provide a good list of words to “not stem.”

        Erik Fair
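
A rough sketch of how the two suggestions in this thread could combine: stem a
word only when the result is a dictionary word, and never stem a name that was
collected from a ".Tn" line. Everything here is an assumption made for
illustration. The function names are hypothetical, naive_stem() is a toy suffix
stripper standing in for whatever stemmer is really used, and the ".Tn" parsing
only handles the simple one-macro-per-line case.

```python
import re

# Matches a line consisting of the .Tn macro followed by its arguments.
TN_RE = re.compile(r'^\.Tn\s+(.+)$')


def collect_tn_names(mdoc_source):
    """Collect lowercased arguments of simple '.Tn name' lines (assumed form)."""
    names = set()
    for line in mdoc_source.splitlines():
        m = TN_RE.match(line)
        if m:
            for word in m.group(1).split():
                # Skip bare punctuation arguments such as a trailing "."
                if any(c.isalnum() for c in word):
                    names.add(word.lower())
    return names


def load_dictionary(path="/usr/share/dict/words"):
    """Load a word list, one word per line, into a lowercased set."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return {line.strip().lower() for line in f if line.strip()}


def naive_stem(word):
    """Toy stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word


def guarded_stem(word, dictionary, no_stem):
    """Stem only if the word is not protected and the stem is a known word."""
    lower = word.lower()
    if lower in no_stem:
        return lower
    stem = naive_stem(lower)
    return stem if stem in dictionary else lower
```

With a dictionary containing "mount", guarded_stem("mounted", ...) would give
"mount", while a word whose stem is not in the dictionary, or one listed via
".Tn", is left unstemmed.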

