[ https://issues.apache.org/jira/browse/PIG-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gianmarco De Francisci Morales updated PIG-2691:
------------------------------------------------

    Resolution: Fixed
    Fix Version/s: 0.11
    Release Note: TOKENIZE: the default name of the field in the schema produced by this UDF now depends on the input field. This change could break your script if you were relying on the field being called "bag_of_tokenTuples" (i.e., if you were not using an AS clause to rename the field).
    Hadoop Flags: Incompatible change
    Status: Resolved  (was: Patch Available)

> Duplicate TOKENIZE schema
> -------------------------
>
>                 Key: PIG-2691
>                 URL: https://issues.apache.org/jira/browse/PIG-2691
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gianmarco De Francisci Morales
>            Assignee: Jie Li
>              Labels: simple
>             Fix For: 0.11
>
>         Attachments: PIG-2691.patch, PIG-2691.patch.2
>
>
> TOKENIZE produces a schema with a fixed field name, which results in duplicate aliases if it is used more than once in the same GENERATE statement.
> We could parameterize the schema on the name of the field being tokenized.
> {code}
> grunt> q = LOAD 'file' AS (source:chararray, target:chararray);
> grunt> e = FOREACH q GENERATE TOKENIZE(source), TOKENIZE(target);
> 2012-05-09 20:18:37,235 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108:
> <line 2, column 14> Duplicate schema alias: bag_of_tokenTuples
> grunt> e = FOREACH q GENERATE TOKENIZE(source) as s_entities, TOKENIZE(target) as t_entities;
> grunt> describe e
> e: {s_entities: {tuple_of_tokens: (token: chararray)},t_entities: {tuple_of_tokens: (token: chararray)}}
> {code}
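A minimal Java sketch, independent of Pig's real UDF API, of why a fixed output alias collides and how deriving the alias from the input field's name avoids the collision. The `_from_` suffix and every class and method name below are hypothetical illustrations; the release note only says the default name now depends on the input field, not what the exact name is.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TokenizeSchemaNames {

    // Fixed alias TOKENIZE produced before the fix (taken from the issue text).
    static final String FIXED_ALIAS = "bag_of_tokenTuples";

    // Hypothetical naming scheme: append the input field's alias so that
    // TOKENIZE calls on different fields get different output aliases.
    // Illustration only; not necessarily the name the patch uses.
    static String derivedAlias(String inputAlias) {
        return FIXED_ALIAS + "_from_" + inputAlias;
    }

    // Returns the first alias that appears more than once, or null if all are
    // unique. Models the check that reports "Duplicate schema alias".
    static String firstDuplicate(List<String> aliases) {
        Set<String> seen = new HashSet<>();
        for (String a : aliases) {
            if (!seen.add(a)) {
                return a;
            }
        }
        return null;
    }

    // Aliases for GENERATE TOKENIZE(f1), TOKENIZE(f2), ... with a fixed name.
    static List<String> fixedAliases(List<String> inputs) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i++) {
            out.add(FIXED_ALIAS);
        }
        return out;
    }

    // Same GENERATE list, but each alias is derived from its input field.
    static List<String> derivedAliases(List<String> inputs) {
        List<String> out = new ArrayList<>();
        for (String in : inputs) {
            out.add(derivedAlias(in));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("source", "target");
        List<String> fixed = fixedAliases(inputs);
        List<String> derived = derivedAliases(inputs);
        System.out.println("fixed:   " + fixed + " duplicate=" + firstDuplicate(fixed));
        System.out.println("derived: " + derived + " duplicate=" + firstDuplicate(derived));
    }
}
```

With the `source`/`target` fields from the report, the fixed scheme yields `bag_of_tokenTuples` twice (a duplicate), while the derived scheme yields `bag_of_tokenTuples_from_source` and `bag_of_tokenTuples_from_target`. Note that tokenizing the same field twice would still collide under a name-derived scheme, so an AS clause remains the way to disambiguate in that case.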