Hi G.Long,

You can chain TrimFilter and LengthFilter (with a minimum length of 1) after your filter to remove empty and whitespace-only tokens.
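
To illustrate the idea, here is a minimal sketch in plain Java. It does not use the Lucene API: the class name EmptyTokenSkipSketch, the clean/filter helpers, and the isFilteredChar rule are all assumptions made up for this example (the original isFilteredChar is not shown in the thread). It only models what the chain does: strip and trim each token, then keep it only if at least one character is left. In Lucene itself you would wrap your filter in TrimFilter and then LengthFilter; the constructor signatures differ between Lucene versions, so check the javadoc for the version you use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class EmptyTokenSkipSketch {

    // Hypothetical rule: anything that is not a letter, digit or whitespace
    // counts as a "special" character. The real rule is not in the thread.
    static boolean isFilteredChar(char c) {
        return !Character.isLetterOrDigit(c) && !Character.isWhitespace(c);
    }

    // Step 1 (custom filter + trim): drop filtered characters, then trim.
    static String clean(String token) {
        StringBuilder sb = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!isFilteredChar(c)) {
                sb.append(c);
            }
        }
        return sb.toString().trim();
    }

    // Step 2 (length check with minimum 1): keep pulling tokens from the
    // input and emit only those that are still non-empty after cleaning.
    static List<String> filter(Iterator<String> tokens) {
        List<String> out = new ArrayList<>();
        while (tokens.hasNext()) {
            String cleaned = clean(tokens.next());
            if (cleaned.length() >= 1) {
                out.add(cleaned);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("foo!", "***", " b@r ", "--");
        System.out.println(filter(in.iterator())); // prints [foo, br]
    }
}
```

The tokens "***" and "--" become empty after cleaning and are skipped, which is the behaviour you get from a length check placed after your filter.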


Ahmet

On Thursday, October 9, 2014 5:54 PM, G.Long <jde...@gmail.com> wrote:
Hi :)

I wrote a custom token filter which removes special characters. 
Sometimes all characters of a token are removed, so the filter 
produces an empty token. I would like to remove this token from the 
token stream, but I'm not sure how to do that.

Is there something missing in my custom token filter or do I need to 
chain another custom token filter to remove empty tokens?

Regards :)

ps:

this is the code of my custom filter :

public class SpecialCharFilter extends TokenFilter {

     private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

     protected SpecialCharFilter(TokenStream input) {
         super(input);
     }

     @Override
     public boolean incrementToken() throws IOException {

         if (!input.incrementToken()) {
             return false;
         }

         final char[] buffer = termAtt.buffer();
         final int length = termAtt.length();
         final char[] newBuffer = new char[length];

         int newIndex = 0;
         for (int i = 0; i < length; i++) {
             if (!isFilteredChar(buffer[i])) {
                 newBuffer[newIndex] = buffer[i];
                 newIndex++;
             }
         }

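         // Use only the newIndex characters that were kept; the rest of
         // newBuffer is unused padding.
         String term = new String(newBuffer, 0, newIndex);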
         term = term.trim();
         char[] characters = term.toCharArray();
         termAtt.setEmpty();
         termAtt.copyBuffer(characters, 0, characters.length);

         return true;
     }
}

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
