[ https://issues.apache.org/jira/browse/PIG-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607209#comment-13607209 ]
Russell Jurney commented on PIG-3190:
-------------------------------------

Thanks for the notes. I'll get it fixed.

As to piggybank: if this were in Piggybank, it would never get used. The reason is that a user would have to locate and REGISTER the Lucene jar(s), which is very difficult to do in practice. Since TOKENIZE is builtin, it makes sense for these to be builtin too.

> Add LuceneTokenizer and SnowballTokenizer to Pig - useful text tokenization
> ---------------------------------------------------------------------------
>
>                 Key: PIG-3190
>                 URL: https://issues.apache.org/jira/browse/PIG-3190
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.11
>            Reporter: Russell Jurney
>            Assignee: Russell Jurney
>             Fix For: 0.12
>
>         Attachments: PIG-3190-2.patch, PIG-3190-3.patch, PIG-3190.patch
>
>
> TOKENIZE is literally useless. The Lucene Standard/Snowball tokenizers, as
> used by varaha, are much more useful for actual tasks:
> https://github.com/Ganglion/varaha/blob/master/src/main/java/varaha/text/TokenizeText.java