[ https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765192#action_12765192 ]
Pradeep Kamath commented on PIG-1014:
-------------------------------------

The issue I see is with the implementation of COUNT today. It looks only at the first field of each record in the bag and counts only non-null values towards the result. This can lead to mysterious results. Consider a relation (A) with two fields and the following contents:

{noformat}
1       2
3       4
null    6
7       null
null    null
{noformat}

If we have the following snippet:

{code}
B = group A all;
C = foreach B generate COUNT(A);
{code}

The answer is 3, which was arrived at by considering only record 1, record 2 and record 4, since the other records have null in the first position. Ironically, although record 4 has null in the second position, that does not prevent it from being counted. So basing the result on the null-ness of just the first field seems somewhat arbitrary. My concern is that most users would not know that the result was arrived at *after* dropping records which had null in the first field, even though they did not specify COUNT(A.$0). Status quo means we equate COUNT(A) to COUNT(A.$0), which is also not apparent to users.

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1014
>                 URL: https://issues.apache.org/jira/browse/PIG-1014
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Pradeep Kamath
>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
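
The counting behavior described in the comment can be modeled outside Pig. The following is only an illustrative Python sketch, not Pig's implementation: the relation A is represented as a list of two-field tuples with None standing in for null, and the function names count_first_field and count_all are hypothetical names chosen for this example.

```python
# Hypothetical model of relation A from the comment: each record is a
# (field0, field1) tuple, and None stands in for Pig's null.
A = [(1, 2), (3, 4), (None, 6), (7, None), (None, None)]


def count_first_field(bag):
    """Model the COUNT behavior described in the comment:
    a record counts only if its first field is non-null."""
    return sum(1 for record in bag if record[0] is not None)


def count_all(bag):
    """Model the proposed COUNT_STAR-style behavior:
    every record counts, regardless of nulls."""
    return len(bag)


print(count_first_field(A))  # 3 (records 1, 2 and 4)
print(count_all(A))          # 5 (all records)
```

Under this model, record 4 (7, null) is counted by the first function while record 3 (null, 6) is not, which is the asymmetry the comment objects to.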