[ 
https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763218#action_12763218
 ] 

Ashutosh Chauhan commented on PIG-953:
--------------------------------------

Changes look good. One comment I have:

1) In SortInfo.java#equals
We have two lists and we want to check them for equality. I quickly looked at 
the JDK sources and it seems that ArrayList doesn't override equals, so doing 
an equals check on the lists would result in a reference equality test, which 
would be incorrect. The correct way to do this would be to first check the 
sizes of the two lists and, if they are equal, iterate through both lists and 
check the equality of the items at the same index in the two lists.
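
The comparison described above could be sketched roughly as follows. This is 
only an illustration: the class and method names (ListEquality, sameElements) 
are hypothetical and are not taken from the patch. It is also worth checking 
against the JDK in use whether the manual loop is needed at all, since the 
java.util.List contract defines equals as an element-by-element comparison and 
ArrayList inherits such an implementation from AbstractList.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical helper for illustration only; not part of the PIG-953 patch.
class ListEquality {

    // Returns true when both lists have the same size and the items at each
    // index are equal according to their own equals() method.
    static boolean sameElements(List<?> a, List<?> b) {
        if (a == b) {
            return true;               // same reference, or both null
        }
        if (a == null || b == null) {
            return false;              // exactly one of the two is null
        }
        if (a.size() != b.size()) {
            return false;              // different sizes can never be equal
        }
        for (int i = 0; i < a.size(); i++) {
            // Objects.equals tolerates null items in either list
            if (!Objects.equals(a.get(i), b.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

For two distinct ArrayList instances holding the same items in the same order, 
sameElements returns true. The items themselves still need a meaningful equals 
for the check to be useful.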

A few nits:
1) TestMergeJoin contains a System.err.println which we can get rid of.
2) There are a few unused imports in the patch.
3) SortInfo.java#getSortColInfoList may result in a FindBugs warning, for a 
reason similar to the one we discussed earlier in this JIRA. 

> Enable merge join in pig to work with loaders and store functions which can 
> internally index sorted data 
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-953
>                 URL: https://issues.apache.org/jira/browse/PIG-953
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>         Attachments: PIG-953-2.patch, PIG-953-3.patch, PIG-953.patch
>
>
> Currently, the merge join implementation in Pig constructs an index on 
> sorted data and uses that index to seek into the "right input" to 
> efficiently perform the join operation. Some loaders (notably the Zebra 
> loader) internally implement an index on sorted data and can perform this 
> seek efficiently using their own index. So the use of the index needs to be 
> abstracted in such a way that when the loader supports indexing, Pig uses it 
> (indirectly through the loader) and does not construct an index. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
