Semantics of TOKENIZE are not clear
-----------------------------------
Key: PIG-683
URL: https://issues.apache.org/jira/browse/PIG-683
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: types_branch
Reporter: Santhosh Srinivasan
Fix For: types_branch
The semantics of TOKENIZE are not clear. In its current form, TOKENIZE takes a
string as input and returns a bag. The bag contains one tuple per token, and
each tuple in turn contains a single token. A better approach would be to
return a single tuple (instead of a bag) that contains as many fields as there
are tokens.
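
To make the two shapes concrete, here is a minimal sketch in plain Java. It does
not use Pig's own classes: ordinary lists stand in for bags and tuples, the class
and method names are hypothetical, and splitting on whitespace is an assumption
made for illustration rather than a statement of TOKENIZE's actual delimiters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustrative only: plain Java lists stand in for Pig's bag and tuple types.
public class TokenizeShapes {

    // Current behaviour as described in this issue:
    // a bag holding one single-field tuple per token.
    public static List<List<String>> asBagOfTuples(String input) {
        List<List<String>> bag = new ArrayList<>();
        // Assumption: default StringTokenizer delimiters (whitespace).
        StringTokenizer st = new StringTokenizer(input);
        while (st.hasMoreTokens()) {
            bag.add(List.of(st.nextToken()));
        }
        return bag;
    }

    // Proposed behaviour: a single tuple with one field per token.
    public static List<String> asSingleTuple(String input) {
        List<String> tuple = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input);
        while (st.hasMoreTokens()) {
            tuple.add(st.nextToken());
        }
        return tuple;
    }

    public static void main(String[] args) {
        System.out.println(asBagOfTuples("a b c")); // [[a], [b], [c]]
        System.out.println(asSingleTuple("a b c")); // [a, b, c]
    }
}
```

Note that in the tuple form the number of fields varies with the input, whereas
in the bag form every tuple has exactly one field.
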
On a secondary note, the outputSchema method in TOKENIZE is broken. It reports
just a string, whereas it should report a bag whose tuple contains a string.