Russell Jurney created PIG-3190:
-----------------------------------

             Summary: Add LuceneTokenizer to Pig - useful text tokenization
                 Key: PIG-3190
                 URL: https://issues.apache.org/jira/browse/PIG-3190
             Project: Pig
          Issue Type: Bug
          Components: internal-udfs
    Affects Versions: 0.11
            Reporter: Russell Jurney
            Assignee: Russell Jurney
             Fix For: 0.12

TOKENIZE is literally useless. The Lucene tokenizer in varaha is much more useful for actual tasks:

https://github.com/Ganglion/varaha/blob/master/src/main/java/varaha/text/TokenizeText.java