[ https://issues.apache.org/jira/browse/PIG-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904819#action_12904819 ]
Scott Carey commented on PIG-1506:
----------------------------------

For an outer join, SQL would output five rows for the above, just as a flattened COGROUP would. So that seems fine to me. A self-join should behave the same as a COGROUP of a relation with itself, which is different from a simple GROUP.

However, there is a problem with inner join and nulls. Pig JOIN is not like SQL with respect to nulls on multi-column joins. (I have not tried this on trunk, however.)
In SQL, if ANY of the key columns in a multi-column join is null, the row is not output, because a comparison with null never evaluates to true.

Try:
{code}
A = load 'small' as (name, age, gpa);
B = load 'small' as (name, age, gpa);
C = join A by (name,age), B by (name,age);
dump C;
{code}

The result for SQL would be one row of the form

joe 5 2.5 joe 5 2.5

> Need to clarify the difference between null handling in JOIN and COGROUP
> ------------------------------------------------------------------------
>
>                 Key: PIG-1506
>                 URL: https://issues.apache.org/jira/browse/PIG-1506
>             Project: Pig
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: Olga Natkovich
>            Assignee: Corinne Chandel
>             Fix For: 0.8.0
>
>
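
For reference, the SQL side of the comparison in the comment above can be checked with a small sketch. It uses Python's standard sqlite3 module rather than Pig. The rows are hypothetical stand-ins, since the contents of 'small' are not shown in this message; only the "joe 5 2.5" row is taken from the comment. It illustrates the SQL rule only and says nothing about what Pig does on trunk.

```python
import sqlite3

# Hypothetical rows standing in for the 'small' file (assumption: the real
# file contents are not shown in this thread). None maps to SQL NULL.
rows = [
    ("joe", 5, 2.5),      # both key columns present
    ("sam", None, 3.0),   # null age
    (None, 10, 3.5),      # null name
    (None, None, 4.0),    # both key columns null
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE small (name TEXT, age INTEGER, gpa REAL)")
conn.executemany("INSERT INTO small VALUES (?, ?, ?)", rows)

# Inner self-join on (name, age). In SQL, NULL = NULL is not true, so a row
# with a NULL in either key column never satisfies the join condition.
result = conn.execute(
    """
    SELECT a.name, a.age, a.gpa, b.name, b.age, b.gpa
    FROM small AS a
    JOIN small AS b ON a.name = b.name AND a.age = b.age
    """
).fetchall()

print(result)
conn.close()
```

With these sample rows, only the row whose name and age are both non-null survives the join, which matches the single-row result described in the comment.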