Couldn't you just modify the PorterStemmer class to suit your requirements?
(We did, and gave it a list of ignore words and phrases specific to
our needs.)
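
A rough sketch of the ignore-list idea follows. It is not the actual
modified class: ProtectedStemmer and toyStem are names made up for
illustration, and toyStem is only a toy stand-in for the real Porter
algorithm. The point is just that terms on the list bypass stemming.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical wrapper: consults an ignore list before delegating to a stemmer.
class ProtectedStemmer {
    private final Set<String> protectedTerms;      // lower-cased terms that must not be stemmed
    private final UnaryOperator<String> delegate;  // the underlying stemmer

    ProtectedStemmer(Set<String> protectedTerms, UnaryOperator<String> delegate) {
        this.protectedTerms = protectedTerms;
        this.delegate = delegate;
    }

    String stem(String term) {
        // Terms on the ignore list are returned untouched.
        if (protectedTerms.contains(term.toLowerCase(Locale.ROOT))) {
            return term;
        }
        return delegate.apply(term);
    }

    // Toy stand-in for a real stemmer (assumption): strips a trailing "ing" or "er" only.
    static String toyStem(String term) {
        if (term.length() > 5 && term.endsWith("ing")) {
            return term.substring(0, term.length() - 3);
        }
        if (term.length() > 4 && term.endsWith("er")) {
            return term.substring(0, term.length() - 2);
        }
        return term;
    }
}
```

With an ignore list containing "lowe's", stem("lowe's") comes back
unchanged while stem("indexing") is still reduced by the delegate.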

On Sat, Jan 9, 2010 at 4:00 AM, Jamie <[email protected]> wrote:
> Hi All
>
> Is there another stemmer we can use that is perhaps not as aggressive as the
> Porter Stemmer? That is, the stemming could remove "ing" and "er" suffixes,
> but not do something so significant as to convert "Lowe's" to "Low".
>
> Thanks
>
> Jamie
>
> Will Murnane wrote:
>>
>> On Fri, Jan 8, 2010 at 16:27, Jamie <[email protected]> wrote:
>>
>>>
>>> Hi Ian / Will
>>>
>>> Thanks. Surely the Porter Stemmer should not stem proper nouns, i.e. it
>>> could check the capitalization of the first letter of a word and whether
>>> or not the word is at the start of a sentence. If the word is capitalized
>>> and not at the start of a sentence, it could choose not to apply any
>>> stemming. Or am I completely out of whack?
>>>
>>
>> Look again: you're downcasing the terms before the Porter filter ever
>> sees them (which is, AIUI, necessary).  You might do well to combine
>> the tokenizing and downcasing step with some heuristic to find proper
>> nouns and not downcase or stem them.
>>
>> Will
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
>
>
> --
> Stimulus Software - MailArchiva
> Email Archiving And Compliance
> USA Tel: +1-713-343-8824 ext 100
> UK Tel: +44-20-80991035 ext 100
> Email:  [email protected]
> Web: http://www.mailarchiva.com
> To receive MailArchiva Enterprise Edition product announcements, send a
> message to: <[email protected]>
>
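
The heuristic discussed in the quoted messages (treat a capitalized word
that is not at the start of a sentence as a proper noun, and neither
downcase nor stem it) could be sketched roughly as below. This is an
illustration under stated assumptions, not Lucene code: the class and
method names are hypothetical, tokenizing is a plain whitespace split, and
toyStem is a toy stand-in for the Porter algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical analyzer combining tokenizing, downcasing and a proper-noun check.
class ProperNounAwareAnalyzer {

    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        boolean sentenceStart = true;
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // Remember whether this token closes a sentence, then drop trailing punctuation.
            boolean endsSentence = raw.matches(".*[.!?]$");
            String word = raw.replaceAll("[.!?,;:]+$", "");
            if (!word.isEmpty()) {
                boolean capitalized = Character.isUpperCase(word.charAt(0));
                if (capitalized && !sentenceStart) {
                    out.add(word); // assumed proper noun: keep case, skip stemming
                } else {
                    out.add(toyStem(word.toLowerCase(Locale.ROOT)));
                }
            }
            sentenceStart = endsSentence;
        }
        return out;
    }

    // Toy stand-in for a real stemmer (assumption): strips a trailing "ing" or "er" only.
    static String toyStem(String term) {
        if (term.length() > 5 && term.endsWith("ing")) {
            return term.substring(0, term.length() - 3);
        }
        if (term.length() > 4 && term.endsWith("er")) {
            return term.substring(0, term.length() - 2);
        }
        return term;
    }
}
```

Note the obvious limitation: a proper noun that happens to open a sentence
is still downcased and stemmed, so an ignore list may still be needed
alongside this check.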
