[jira] [Commented] (PIG-3082) outputSchema of a UDF allows two usages when describing a Tuple schema

2013-10-04 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786709#comment-13786709
 ] 

Aniket Mokashi commented on PIG-3082:

But shouldn't we document this as an incompatible change, so that there are 
no surprises?

> outputSchema of a UDF allows two usages when describing a Tuple schema
> --
>
> Key: PIG-3082
> URL: https://issues.apache.org/jira/browse/PIG-3082
> Project: Pig
> Issue Type: Bug
> Reporter: Julien Le Dem
> Assignee: Jonathan Coveney
> Fix For: 0.12.0
>
> Attachments: PIG-3082-0.patch, PIG-3082-1.patch
>
>
> When defining an EvalFunc that returns a Tuple, there are two ways to 
> implement outputSchema():
> - The right way: return a schema that contains a single Field, which carries 
> the type and schema of the UDF's return value.
> - The unreliable way: return a schema that contains more than one field. Pig 
> treats it as a tuple schema even though nothing specifies the type (the type 
> lives in the Field class). This is particularly deceptive when the 
> output schema is derived from the input schema and the output Tuple 
> sometimes contains only one field: Pig interprets the output 
> schema as a tuple only if there is more than one field, so sometimes it 
> works and sometimes it does not.
> We should at least issue a warning (for backward compatibility), if not 
> throw an exception outright, when the output schema contains more than one Field.
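
The ambiguity described in the issue can be sketched with a small stand-in model. The classes below (Field, LegacyInterpretation) are hypothetical and are not Pig's Schema API; they only mimic the rule that a returned schema is read as a tuple when it holds more than one top-level field, and they assume the schema is never empty.

```java
import java.util.List;

final class LegacyInterpretation {
    // Mimics the unreliable rule: more than one top-level field is read as a
    // tuple, while exactly one field is taken at face value.
    static String returnType(List<Field> outputSchema) {
        if (outputSchema.size() > 1) {
            return "tuple";
        }
        return outputSchema.get(0).type;
    }

    public static void main(String[] args) {
        // A UDF that derives its output schema from its input schema.
        List<Field> two = List.of(Field.scalar("a", "int"), Field.scalar("b", "int"));
        List<Field> one = List.of(Field.scalar("a", "int"));

        System.out.println(returnType(two)); // tuple
        System.out.println(returnType(one)); // int, although the UDF still returns a Tuple

        // The right way: one field, explicitly typed as a tuple, wrapping the rest.
        List<Field> wrapped = List.of(Field.tuple("t", one));
        System.out.println(returnType(wrapped)); // tuple
    }
}

// Hypothetical stand-in for a schema field; NOT Pig's Schema.FieldSchema.
final class Field {
    final String name;
    final String type;        // e.g. "int", "chararray", "tuple"
    final List<Field> inner;  // nested fields, used only when type is "tuple"

    Field(String name, String type, List<Field> inner) {
        this.name = name;
        this.type = type;
        this.inner = inner;
    }

    static Field scalar(String name, String type) {
        return new Field(name, type, List.of());
    }

    static Field tuple(String name, List<Field> inner) {
        return new Field(name, "tuple", inner);
    }
}
```

With the explicit tuple field, the result no longer depends on how many inner fields the UDF happens to produce.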



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3082) outputSchema of a UDF allows two usages when describing a Tuple schema

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785427#comment-13785427
 ] 

Julien Le Dem commented on PIG-3082:


This is intended.
The second behavior described above is really problematic.
If a UDF breaks because it returns a schema with more than one field, it should 
be changed to return one field of type tuple.
Once fixed, it works in all versions of Pig.
This change only removes an unsafe use of outputSchema in favor of the existing 
correct use.



[jira] [Commented] (PIG-3082) outputSchema of a UDF allows two usages when describing a Tuple schema

2013-10-02 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784720#comment-13784720
 ] 

Dmitriy V. Ryaboy commented on PIG-3082:


So... that's a breaking change; a bunch of UDFs will fail under 0.12.

Intended?



[jira] [Commented] (PIG-3082) outputSchema of a UDF allows two usages when describing a Tuple schema

2013-01-17 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556686#comment-13556686
 ] 

Julien Le Dem commented on PIG-3082:


Thanks for fixing this, Jon!
I find the error message a little confusing:
{noformat}
throw new FrontendException("Given UDF returns an improper Schema. Should only "
        + "return Tuple, Bag, or a single item. Returns: " + udfSchema);
{noformat}
It should contain something along the lines of "... outputSchema should return 
a Schema containing a single Field ...".
Otherwise, it looks good to me.
Thanks
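
A minimal sketch of a check with wording along the lines suggested above. SchemaCheck and requireSingleField are hypothetical names, the top-level fields are modelled as plain strings, and IllegalArgumentException stands in for Pig's FrontendException so the example runs without Pig on the classpath. The message text is only an illustration, not the wording that was committed.

```java
import java.util.List;

final class SchemaCheck {
    // Rejects an output schema that has more than one top-level field.
    static void requireSingleField(List<String> topLevelFields) {
        if (topLevelFields.size() > 1) {
            throw new IllegalArgumentException(
                "Given UDF returns an improper Schema. outputSchema should return "
                + "a Schema containing a single Field (a Tuple, a Bag, or a single item). "
                + "Returns: " + topLevelFields);
        }
    }

    public static void main(String[] args) {
        // One field of type tuple: accepted.
        requireSingleField(List.of("t: tuple(a: int, b: int)"));

        // Two top-level fields: rejected, and the message says what to return instead.
        try {
            requireSingleField(List.of("a: int", "b: int"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```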
