New Token filter for adding payloads "in-stream"
------------------------------------------------
Key: LUCENE-1676
URL: https://issues.apache.org/jira/browse/LUCENE-1676
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 2.9

This TokenFilter splits a token on a delimiter, using one part as the token and the other part as its payload. This lets someone include payloads inline with the tokens (presumably set up ahead of time by an upstream pipeline).

An example helps. Given a | delimiter, a stream might look like:

{quote}The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN{quote}

Assuming whitespace tokenization (plus lowercasing of the terms), this produces the following tokens and payloads:

Token: the Payload: null
Token: quick Payload: JJ
Token: red Payload: JJ

and so on.

This patch will also support pluggable encoders for the payloads, so the payload's character array can be converted to a byte array as appropriate.
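To make the splitting and the pluggable-encoder idea concrete, here is a minimal, Lucene-independent sketch. It is not the code in this patch: `DelimitedPayloadSketch`, `PayloadEncoder`, `Utf8Encoder` and `TokenWithPayload` are hypothetical names chosen for illustration. It assumes the token is split at the *first* delimiter, that a token with no delimiter gets a null payload, and that only the term (not the payload) is lowercased.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of the delimiter split described in the issue.
 * None of these names are the API added by LUCENE-1676.
 */
class DelimitedPayloadSketch {

    /** Pluggable strategy that turns the payload characters into bytes (hypothetical). */
    interface PayloadEncoder {
        byte[] encode(char[] buffer, int offset, int length);
    }

    /** Example encoder: stores the payload text as UTF-8 bytes, e.g. "JJ" becomes {74, 74}. */
    static class Utf8Encoder implements PayloadEncoder {
        @Override
        public byte[] encode(char[] buffer, int offset, int length) {
            return new String(buffer, offset, length).getBytes(StandardCharsets.UTF_8);
        }
    }

    /** A term plus its payload; payload is null when the token had no delimiter. */
    static final class TokenWithPayload {
        public final String term;
        public final byte[] payload;

        TokenWithPayload(String term, byte[] payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /** Splits one token at the first occurrence of the delimiter (assumption). */
    static TokenWithPayload split(String token, char delimiter, PayloadEncoder encoder) {
        char[] chars = token.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == delimiter) {
                // Left of the delimiter is the term; the rest is handed to the encoder.
                byte[] payload = encoder.encode(chars, i + 1, chars.length - i - 1);
                return new TokenWithPayload(new String(chars, 0, i), payload);
            }
        }
        // No delimiter: keep the whole token and attach no payload.
        return new TokenWithPayload(token, null);
    }

    /** Whitespace-tokenizes the text, splits each token, then lowercases only the term. */
    static List<TokenWithPayload> analyze(String text, char delimiter, PayloadEncoder encoder) {
        List<TokenWithPayload> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            TokenWithPayload t = split(raw, delimiter, encoder);
            out.add(new TokenWithPayload(t.term.toLowerCase(Locale.ROOT), t.payload));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN";
        for (TokenWithPayload t : analyze(text, '|', new Utf8Encoder())) {
            String p = t.payload == null ? "null" : new String(t.payload, StandardCharsets.UTF_8);
            System.out.println("Token: " + t.term + " Payload: " + p);
        }
    }
}
```

Running `main` prints one line per token in the form `Token: quick Payload: JJ`. Lowercasing is applied after the split so the tag `JJ` is not turned into `jj`; in a real analyzer chain the same concern would be handled by the order of the filters. Swapping `Utf8Encoder` for another `PayloadEncoder` (for instance one that parses the payload as a number) is the kind of pluggability the issue describes.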