[ 
https://issues.apache.org/jira/browse/PIG-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137270#comment-16137270
 ] 

Daniel Dai edited comment on PIG-5272 at 8/22/17 7:41 PM:
----------------------------------------------------------

Are you saying your data does not match your declared schema? If you are not
sure about the bag's inner schema, you should leave it empty by declaring it
as {()}, which means a bag with an unknown inner schema. I see that
BagToString does have an issue: it does not handle an unknown inner schema. If
that's the issue you are trying to fix, you are welcome to submit a patch.


> BagToString Output Schema
> -------------------------
>
>                 Key: PIG-5272
>                 URL: https://issues.apache.org/jira/browse/PIG-5272
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Joshua Juen
>            Priority: Minor
>
> The output schema from BagToTuple is nonsensical, causing problems when the
> tuple is used later in the same script.
> For example: given a bag { data:chararray }, calling BagToTuple yields the
> schema ( data:chararray ).
> But this makes no sense: if the bag contains the entries {data1, data2,
> data3}, the output tuple from BagToTuple will be
> (data1:chararray, data2:chararray, data3:chararray) != (data:chararray), the
> declared output schema from the UDF.
> Unfortunately, the schema of the tuple cannot be known during the initial
> validation phase. Thus, I believe the output schema from the UDF should be
> changed to a plain tuple type, without the number of fields being fixed to the
> number of columns in the input bag.
> With the current behavior, the elements of the tuple cannot be accessed in the
> script after calling BagToTuple without getting an incompatible type error.
> We have modified the UDF in our internal UDF jars to work around the issue.
> Let me know if this sounds reasonable and I can generate the patch.
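
The mismatch described in the report can be sketched in plain Python. This is only an illustration of the arity problem, not Pig's implementation; `bag_to_tuple` and `declared_fields` are hypothetical names, and the bag is modeled as a list of one-field tuples.

```python
def bag_to_tuple(bag):
    """Flatten a bag (modeled as a list of tuples) into one tuple, field by field."""
    return tuple(field for row in bag for field in row)


# Schema declared for the bag's inner tuple: a single field named "data".
# (Hypothetical stand-in for what the script validator sees up front.)
declared_fields = ["data"]

# The actual bag holds three rows, each with one field.
bag = [("data1",), ("data2",), ("data3",)]
result = bag_to_tuple(bag)

# The output arity depends on how many rows the bag holds at run time,
# so it cannot match a schema that was fixed to the bag's column count.
print(result)
print(len(result), "fields produced vs", len(declared_fields), "declared")
```

Because the number of output fields is only known once the data is read, a declared output of "tuple with unknown fields" is the only schema that holds for every input in this model.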



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
