[ https://issues.apache.org/jira/browse/PIG-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800342#action_12800342 ]

Alan Gates commented on PIG-1188:
---------------------------------

I don't think padding is a good idea.  We don't know which field in the record 
is missing.  We're just guessing that the last field is missing, when in fact 
it might be the first.  Then we've made the situation worse by inserting 
invalid data in all the fields.

I think the loader should either throw the record out, or make all fields in 
the record null.  This guarantees that we are not further propagating the 
error.  Then a warning can be issued that the record was invalid (I'm assuming 
even in the above proposal the loader would issue a warning.) 
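
The two options above can be sketched as follows. This is only an
illustration in Python, not Pig's loader API: the function name parse_line,
the on_mismatch parameter, and the tab delimiter are all assumptions made
for the example.

```python
import logging

log = logging.getLogger("loader_sketch")  # hypothetical logger name


def parse_line(line, num_fields, on_mismatch="null", delimiter="\t"):
    """Split one input line and check it against the declared field count.

    on_mismatch="drop": throw the record out (returns None).
    on_mismatch="null": keep the record but make every field null.
    Either way a warning is logged, and no guess is made about which
    field is missing or extra.
    """
    fields = tuple(line.rstrip("\n").split(delimiter))
    if len(fields) == num_fields:
        return fields

    log.warning("invalid record: expected %d fields, got %d",
                num_fields, len(fields))
    if on_mismatch == "drop":
        return None
    if on_mismatch == "null":
        return (None,) * num_fields
    raise ValueError("on_mismatch must be 'drop' or 'null'")
```

With the schema (a0, a1) from the issue, the line "1 2 3" (tab separated)
would come back as (None, None) under "null" and be skipped under "drop",
rather than being silently truncated to (1,2).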

> Padding nulls to the input tuple according to input schema
> ----------------------------------------------------------
>
>                 Key: PIG-1188
>                 URL: https://issues.apache.org/jira/browse/PIG-1188
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Daniel Dai
>             Fix For: 0.7.0
>
>
> Currently, the number of fields in the input tuple is determined by the data. 
> When we have a schema, we should generate input data according to the schema, 
> padding with nulls if necessary. Here is one example:
> Pig script:
> {code}
> a = load '1.txt' as (a0, a1);
> dump a;
> {code}
> Input file:
> {code}
> 1       2
> 1       2       3
> 1
> {code}
> Current result:
> {code}
> (1,2)
> (1,2,3)
> (1)
> {code}
> Desired result:
> {code}
> (1,2)
> (1,2)
> (1, null)
> {code}
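
For comparison, the behaviour requested in the issue description (truncate
extra fields, pad missing ones with nulls at the end) could be sketched as
below. Again this is a Python illustration only; conform_to_schema is a
hypothetical helper, not a function in Pig.

```python
def conform_to_schema(fields, num_fields):
    """Force a tuple to exactly num_fields entries.

    Extra trailing fields are dropped; missing fields are assumed to be
    the trailing ones and are filled with None (null).
    """
    fields = tuple(fields)[:num_fields]
    return fields + (None,) * (num_fields - len(fields))
```

Note that the padding always lands at the end of the tuple, which is the
assumption about the position of the missing field that the comment above
objects to.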

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.