[ 
https://issues.apache.org/jira/browse/PIG-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972705#action_12972705
 ] 

Daniel Dai commented on PIG-1277:
---------------------------------

Response to Alan's comments:
1. Yes, you are right. It introduces some subtle differences. When we see a 
null value, we set the mNull flag and put null into a tuple. We do read the 
null back; however, in NullableBytesWritable we rely on the mNull flag to do 
the comparison, which may produce a wrong result. I will change it.

2. The LOUnion change is a bug fix. We should get a null schema when we union 
two different schemas. This fixes PIG-1065. Since PIG-1065 is of the same 
nature, I don't want to put the fix in a separate patch.
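
To make point 1 concrete, here is a minimal sketch in plain Java. It does not use Pig's classes: NullableKey, flagOnlyEquals and valueAwareEquals are hypothetical names, and the state where the flag and the stored value disagree is an assumption made only for illustration. It shows how a comparison that trusts only the flag can give a different answer from one that also looks at the value.

```java
import java.util.Objects;

public class NullFlagSketch {
    // Hypothetical stand-in for a nullable key wrapper; NOT Pig's NullableBytesWritable.
    static final class NullableKey {
        final boolean isNull; // plays the role of the mNull flag
        final Object value;   // the wrapped value, which may itself be null

        NullableKey(boolean isNull, Object value) {
            this.isNull = isNull;
            this.value = value;
        }

        // Null handling based on the flag only: never checks whether value is null.
        boolean flagOnlyEquals(NullableKey other) {
            if (isNull || other.isNull) {
                return isNull && other.isNull;
            }
            return Objects.equals(value, other.value);
        }

        // Treats a key as null if the flag is set OR the stored value is null.
        boolean valueAwareEquals(NullableKey other) {
            boolean thisNull = isNull || value == null;
            boolean otherNull = other.isNull || other.value == null;
            if (thisNull || otherNull) {
                return thisNull && otherNull;
            }
            return value.equals(other.value);
        }
    }

    public static void main(String[] args) {
        NullableKey flagged = new NullableKey(true, null);
        // Assumed inconsistent state: a null value whose flag was not set.
        NullableKey unflagged = new NullableKey(false, null);

        System.out.println(flagged.flagOnlyEquals(unflagged));   // false
        System.out.println(flagged.valueAwareEquals(unflagged)); // true
    }
}
```

Whether the flag or the value should be authoritative is a design choice for the patch; the sketch only shows that the two can diverge.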

> Pig should give error message when cogroup on tuple keys of different inner 
> type
> --------------------------------------------------------------------------------
>
>                 Key: PIG-1277
>                 URL: https://issues.apache.org/jira/browse/PIG-1277
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Daniel Dai
>            Assignee: Alan Gates
>             Fix For: 0.9.0
>
>         Attachments: PIG-1277-1.patch
>
>
> When we cogroup on a tuple, if the inner types of the tuples do not match, we 
> treat them as different keys. This is confusing. It is desirable to give an 
> error or warning when this happens.
> Here is one example:
> UDF:
> {code}
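> import java.io.IOException;
> import java.util.HashMap;
> import java.util.Map;
> import org.apache.pig.EvalFunc;
> import org.apache.pig.data.DataType;
> import org.apache.pig.data.Tuple;
> import org.apache.pig.impl.logicalLayer.schema.Schema;
>
> // Example UDF: returns a map with one entry, "key" -> number of input fields.
> public class MapGenerate extends EvalFunc<Map> {
>     @Override
>     public Map exec(Tuple input) throws IOException {
>         Map m = new HashMap();
>         // The value is an Integer, so m#'key' is an int at runtime.
>         m.put("key", Integer.valueOf(input.size()));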
>         return m;
>     }
>     
>     @Override
>     public Schema outputSchema(Schema input) {
>         return new Schema(new Schema.FieldSchema(null, DataType.MAP));
>     }
> }
> {code}
> Pig script: 
> {code}
> a = load '1.txt' as (a0);
> b = foreach a generate a0, MapGenerate(*) as m:map[];
> c = foreach b generate a0, m#'key' as key;
> d = load '2.txt' as (c0, c1);
> e = cogroup c by (a0, key), d by (c0, c1);
> dump e;
> {code}
> 1.txt
> {code}
> 1
> {code}
> 2.txt
> {code}
> 1 1
> {code}
> User expected result (which is not right):
> {code}
> ((1,1),{(1,1)},{(1,1)})
> {code}
> Real result:
> {code}
> ((1,1),{(1,1)},{})
> ((1,1),{},{(1,1)})
> {code}
> We should tell the user that the keys cannot be merged because of the type 
> mismatch.
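
The behavior described in the issue can be reproduced outside Pig with a small sketch in plain Java. Everything here is an assumption for illustration: the keys are modeled as java.util.List, String stands in for an untyped field loaded from a file, and cogroupCounts and firstTypeMismatch are hypothetical helpers, not Pig APIs. The two keys print identically but are not equal, so they land in separate groups; the last helper shows one possible check that could back an error or warning.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyTypeMismatchSketch {
    // Much simplified cogroup: for each distinct key, count rows from each input.
    static Map<List<Object>, int[]> cogroupCounts(List<List<Object>> left,
                                                  List<List<Object>> right) {
        Map<List<Object>, int[]> groups = new LinkedHashMap<>();
        for (List<Object> key : left) {
            groups.computeIfAbsent(key, k -> new int[2])[0]++;
        }
        for (List<Object> key : right) {
            groups.computeIfAbsent(key, k -> new int[2])[1]++;
        }
        return groups;
    }

    // Returns the index of the first field whose runtime classes differ, or -1.
    static int firstTypeMismatch(List<Object> a, List<Object> b) {
        for (int i = 0; i < Math.min(a.size(), b.size()); i++) {
            if (!a.get(i).getClass().equals(b.get(i).getClass())) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Key from relation c: untyped field plus an Integer taken from the map.
        List<Object> fromC = Arrays.<Object>asList("1", Integer.valueOf(1));
        // Key from relation d: both fields untyped.
        List<Object> fromD = Arrays.<Object>asList("1", "1");

        System.out.println(fromC + " vs " + fromD); // [1, 1] vs [1, 1]

        Map<List<Object>, int[]> groups = cogroupCounts(
                Collections.singletonList(fromC), Collections.singletonList(fromD));
        System.out.println(groups.size());                   // 2
        System.out.println(firstTypeMismatch(fromC, fromD)); // 1
    }
}
```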

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
