[jira] [Commented] (PIG-2691) Duplicate TOKENIZE schema

Daniel Dai (JIRA) Wed, 23 May 2012 18:19:43 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282099#comment-13282099
 ]


Daniel Dai commented on PIG-2691:
---------------------------------

Patch looks good. 

One potential issue is that it introduces a backward incompatibility: the 
output schema name for TOKENIZE changes. Anyone who relies on the current name 
will have to change their script. Is that fine, or do we need a flag?
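
For illustration, a minimal sketch of the kind of script that could be 
affected, assuming it references the current default alias bag_of_tokenTuples 
(the name reported in the error message quoted below). The relation names e, f, 
e2, f2 and the alias tokens are hypothetical:

{code}
q = LOAD 'file' AS (source:chararray, target:chararray);

-- Depends on the default output alias, so it is exposed to the rename.
e = FOREACH q GENERATE TOKENIZE(source);
f = FOREACH e GENERATE FLATTEN(bag_of_tokenTuples);

-- Names the bag explicitly, so it does not depend on the default alias.
e2 = FOREACH q GENERATE TOKENIZE(source) AS tokens;
f2 = FOREACH e2 GENERATE FLATTEN(tokens);
{code}

Scripts written in the second style should keep working whichever default name 
the patch settles on.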
                
> Duplicate TOKENIZE schema
> -------------------------
>
>                 Key: PIG-2691
>                 URL: https://issues.apache.org/jira/browse/PIG-2691
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gianmarco De Francisci Morales
>              Labels: simple
>         Attachments: PIG-2691.patch, PIG-2691.patch.2
>
>
> TOKENIZE produces a fixed named schema that results in duplicates if used 
> more than once in the same generate statement.
> We could parameterize the schema on the name of the field being tokenized.
> {code}
> grunt> q = LOAD 'file' AS (source:chararray, target:chararray);
> grunt> e = FOREACH q GENERATE TOKENIZE(source), TOKENIZE(target);
> 2012-05-09 20:18:37,235 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108: 
> <line 2, column 14> Duplicate schema alias: bag_of_tokenTuples
> grunt> e = FOREACH q GENERATE TOKENIZE(source) as s_entities, TOKENIZE(target) as t_entities;
> grunt> describe e
> e: {s_entities: {tuple_of_tokens: (token: chararray)},t_entities: {tuple_of_tokens: (token: chararray)}}
> {code}

