Hi Dawid,

Maybe you could use KeywordMarkerFilter, either directly or as a recipe for a 
StopwordMarkerFilter?  

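Note that most (all?) Lucene stemmers check KeywordAttribute and skip any 
token flagged as a keyword, so I wouldn't use KeywordMarkerFilter directly if 
your analysis chain already includes a stemmer: the marked stop words would 
be left unstemmed and could not be told apart from real keywords.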
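
To make the "mark, don't drop" idea concrete, here is a rough sketch in plain
Java with no Lucene dependency. All names in it (StopwordMarkerSketch,
MarkedToken, markStopwords) are made up for illustration, and the boolean
flag only stands in for a dedicated attribute. In a real Lucene chain I'd
expect the same check to live in a TokenFilter's incrementToken(), setting
the flag on the current token instead of returning a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, library-independent sketch: every token is kept, and stop
// words are flagged rather than removed.
class StopwordMarkerSketch {

    /** A term plus a flag; the flag stands in for a custom token attribute. */
    static final class MarkedToken {
        final String term;
        final boolean stopword;

        MarkedToken(String term, boolean stopword) {
            this.term = term;
            this.stopword = stopword;
        }

        @Override
        public String toString() {
            return stopword ? term + "[stop]" : term;
        }
    }

    /**
     * Returns one MarkedToken per input token, in order. Assumes the
     * stopwords set holds lower-case entries; terms are lower-cased only
     * for the lookup, the original term text is preserved.
     */
    static List<MarkedToken> markStopwords(List<String> tokens, Set<String> stopwords) {
        List<MarkedToken> out = new ArrayList<>(tokens.size());
        for (String term : tokens) {
            boolean isStop = stopwords.contains(term.toLowerCase(Locale.ROOT));
            out.add(new MarkedToken(term, isStop));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopwords = new HashSet<>(Arrays.asList("the", "is", "a"));
        List<String> tokens = Arrays.asList("The", "cat", "is", "on", "a", "mat");
        // Prints all six tokens, with the stop words tagged.
        System.out.println(markStopwords(tokens, stopwords));
    }
}
```

Keeping the flag separate from the keyword flag means a stemmer later in the
chain is unaffected, and whatever consumes the tokens can decide for itself
what to do with the marked ones.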

Steve

-----Original Message-----
From: Dawid Weiss [mailto:[email protected]] 
Sent: Tuesday, August 21, 2012 4:34 PM
To: [email protected]
Subject: Looking for a code pattern to pass stop words as an attribute

Seeking advice.

I have an application where I need to know which tokens are stop
words. Most analyzers construct the token stream so that those tokens
are filtered out. This isn't what I need: I want them kept in the
stream, but marked somehow. The question is how to do this simply and
cleanly, possibly reusing existing token filters. I had a few ideas but
they all seem awkward -- let me know if I'm missing something obvious.

Dawid
