Hello group,
I am new to Lucene and ran into a bit of trouble while writing an app. I would
like to selectively index lines from a syslog on a Unix system. To this end, I
first wrote a tokenizer that returns an entire line as a single token by
extending CharTokenizer:
    protected boolean isTokenChar(char c) {
        return !((c == '\n') || (c == '\r'));
    }
Perhaps that is my first mistake and I should have done things differently?
I then pass this to a filter that selects only the lines containing text I am
interested in:
    public final boolean incrementToken() throws IOException {
        while (input.incrementToken()) {
            Matcher lineMatcher = linePattern.matcher(termAtt.term());
            if (lineMatcher.find()) {
                // the line matches, so keep it
                return true;
            }
        }
        // reached end of stream
        return false;
    }
The issue is that, now that I have the line I want, I would like to break it
up into individual tokens on whitespace, but WhitespaceTokenizer does not
accept a TokenStream as a constructor parameter. Can anyone suggest a
workaround?
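To make the goal concrete, here is a minimal plain-Java sketch of the behaviour
I am after. It does not use Lucene at all; the class name SyslogLineSketch, the
method name tokensOfMatchingLines and the sample "sshd" pattern are made up for
illustration only. It keeps the lines that match a pattern and then splits each
kept line on whitespace:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: illustrates the two stages I want.
final class SyslogLineSketch {

    // Stage 1: keep only lines where linePattern is found (same find() test as my filter).
    // Stage 2: split each kept line into tokens on runs of whitespace.
    static List<String> tokensOfMatchingLines(String text, Pattern linePattern) {
        List<String> tokens = new ArrayList<>();
        // Lines end at \n, \r or \r\n, mirroring the isTokenChar test above.
        for (String line : text.split("\\r?\\n|\\r")) {
            if (!linePattern.matcher(line).find()) {
                continue; // not a line of interest, skip it entirely
            }
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Made-up sample input; only the first line contains "sshd".
        String log = "Jan  1 00:00:01 host sshd[42]: Accepted password for lev\n"
                   + "Jan  1 00:00:02 host cron[7]: job started\n";
        System.out.println(tokensOfMatchingLines(log, Pattern.compile("sshd")));
    }
}
```

What I am looking for is the equivalent of the second stage inside a Lucene
analysis chain, where the input is the TokenStream produced by my filter rather
than a String.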
Regards,
Lev Bronshtein