[ https://issues.apache.org/jira/browse/PIG-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-3190:
----------------------------
    Fix Version/s:     (was: 0.14.0)
                       0.15.0

> Add LuceneTokenizer and SnowballTokenizer to Pig - useful text tokenization
> ---------------------------------------------------------------------------
>
>                 Key: PIG-3190
>                 URL: https://issues.apache.org/jira/browse/PIG-3190
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.11
>            Reporter: Russell Jurney
>            Assignee: Russell Jurney
>             Fix For: 0.15.0
>
>         Attachments: PIG-3190-2.patch, PIG-3190-3.patch, PIG-3190.patch
>
>
> TOKENIZE is literally useless. The Lucene Standard/Snowball tokenizers,
> as used by varaha, are much more useful for actual tasks:
> https://github.com/Ganglion/varaha/blob/master/src/main/java/varaha/text/TokenizeText.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)