[ https://issues.apache.org/jira/browse/LUCENE-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718817#action_12718817 ]
Grant Ingersoll commented on LUCENE-1676:
-----------------------------------------

BTW, I'm curious whether people have a better way to convert from char[] to byte[] for encoding the payloads (see FloatEncoder), other than going through Strings.

> New Token filter for adding payloads "in-stream"
> ------------------------------------------------
>
>                 Key: LUCENE-1676
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1676
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1676.patch
>
>
> This TokenFilter is able to split a token based on a delimiter and use one
> part as the token and the other part as a payload. This allows someone to
> include payloads inline with tokens (presumably set up by a pipeline ahead of
> time). An example is in order. Given a | delimiter, we could have a stream
> that looks like:
> {quote}The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ
> dogs|NN{quote}
> In this case, this would produce the following tokens and payloads (assuming
> whitespace tokenization):
> Token: the
> Payload: null
> Token: quick
> Payload: JJ
> Token: red
> Payload: JJ
> and so on.
> This patch will also support pluggable encoders for the payloads, so it can
> convert from the character array to byte arrays as appropriate.
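One possible answer to the char[]-to-byte[] question, sketched here under assumptions: java.nio's `CharsetEncoder` can encode a slice of a char[] directly via `CharBuffer.wrap(char[], offset, length)`, so no intermediate String is built. The class and method names (`InlinePayloadSketch`, `indexOfDelimiter`, `encodeUtf8`) are hypothetical and not part of the LUCENE-1676 patch; the sketch ignores the TokenFilter API, assumes UTF-8 is the desired payload encoding, and only shows the delimiter split plus the text-payload case. A FloatEncoder-style encoder would still need to parse the characters as a number, and `Float.parseFloat` only accepts a String.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Arrays;

// Hypothetical helper, not the patch's code: splits "term|payload" in a char[]
// and encodes the payload part to bytes without going through a String.
public class InlinePayloadSketch {

    // Returns the index of the first delimiter in buf[0..length), or -1 if absent
    // (in which case the whole token is the term and the payload is null).
    static int indexOfDelimiter(char[] buf, int length, char delimiter) {
        for (int i = 0; i < length; i++) {
            if (buf[i] == delimiter) {
                return i;
            }
        }
        return -1;
    }

    // Encodes buf[offset..offset+length) as UTF-8 directly from the char array.
    static byte[] encodeUtf8(char[] buf, int offset, int length) {
        // A fresh encoder per call keeps the sketch simple; CharsetEncoder is not
        // thread-safe, so a real encoder would more likely keep one per instance.
        CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(buf, offset, length));
            // The backing array may be larger than the encoded data, so copy
            // only the bytes between position and limit.
            byte[] out = new byte[bytes.remaining()];
            bytes.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Thrown for malformed input such as an unpaired surrogate.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        char[] token = "quick|JJ".toCharArray();
        int split = indexOfDelimiter(token, token.length, '|');
        String term = new String(token, 0, split); // String built only for printing
        byte[] payload = encodeUtf8(token, split + 1, token.length - split - 1);
        System.out.println(term + " -> " + Arrays.toString(payload));
    }
}
```

Running `main` prints `quick -> [74, 74]` (74 is the UTF-8 byte for `J`). Whether this beats `new String(...).getBytes("UTF-8")` in practice would need measuring; it mainly avoids the extra String and its char[] copy.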