Line filtering

Lev Bronshtein Sun, 05 Sep 2010 14:10:09 -0700

Hello group, 

I am new to Lucene and ran into a bit of trouble while writing an app.  I would 
like to selectively index lines from a syslog on a unix system, to this end I 
first wrote tokenizer that returns an entire line as a token extending 
CharTokenizer


  protected boolean isTokenChar(char c) {
    return !((c == '\n') || (c == '\r'));
  }

Perhaps that is my first mistake and I should have done things differently? 

I then pass this to a filter that only selects the lines with text I am 
interested in

 public final boolean incrementToken() throws IOException
 {
  while (input.incrementToken())
  {
   Matcher lineMatcher = linePattern.matcher(termAtt.term());
   if (lineMatcher.find()) //(we like the payload)
     return true;
  }
  //reached EOS -- return false
  return false;
 }

However the issue is that, now that I have the line I want to break up the 
individual line into tokens along white space, but the WhitespaceTokenizer does 
not take a TokenStream as a constructor parameter.  Can anyone offer  
suggestion for a workaround?

Regards,

Lev Bronshtein
                                          
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Line filtering

Reply via email to