[ https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Chauhan updated PIG-845:
---------------------------------

    Attachment: merge-join-1.patch

Specification: http://wiki.apache.org/pig/PigMergeJoin

Updated patch with the following enhancements:

Performance:
a) Got rid of POForEach entirely for generating joined output tuples.
b) The output tuple is created at its required size and its fields are set, instead of appended.
c) The key is cached, as suggested by Pradeep in a previous comment.
d) A new ArrayList is created for holding buffered left tuples instead of clearing the old one, thus avoiding resizing of the array.

Functionality:
a) Added typecasting for index keys, so the join now works when schemas are supplied.
b) Refactored visit(LOJoin loj) in LogToPhyTranslationVisitor to avoid duplicate code.

Error Handling:
a) Better error handling at various places.
b) Added validateMergeJoin() in LogToPhyTranslationVisitor to throw an exception where Merge Join can't be used.
c) Added more tests.

Limitations:
Merge Join doesn't work when there are splits, streaming, or order-by in its predecessors, or when streaming is present in its successors. Some of these are related to the issue outlined at https://issues.apache.org/jira/browse/PIG-858 and require work in MRCompiler. Currently we detect these conditions in validateMergeJoin() and fail at compile time.

> PERFORMANCE: Merge Join
> -----------------------
>
>                 Key: PIG-845
>                 URL: https://issues.apache.org/jira/browse/PIG-845
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>         Attachments: merge-join-1.patch, merge-join-for-review.patch
>
>
> This join would work if the data for both tables is sorted on the join key.
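For readers unfamiliar with the technique, here is a minimal sort-merge equi-join sketch that mirrors a few of the performance points listed in the patch notes (buffering left tuples per key in a freshly allocated list, caching the matched key, and building each output tuple at its final size). It is NOT Pig's POMergeJoin: the class and method names are hypothetical, tuples are modeled as plain Object[] rows, the join key is assumed to be a Comparable field at a fixed index, and both inputs are assumed to be already sorted ascending on that key. The real operator works on Pig tuples and, judging by the mention of index keys, reads the right-hand input via an index; that part is omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a sort-merge equi-join over two key-sorted inputs. */
public class MergeJoinSketch {

    @SuppressWarnings({"unchecked", "rawtypes"})
    static int compareKeys(Object a, Object b) {
        // Assumption: join keys are mutually Comparable (e.g. Integer, String).
        return ((Comparable) a).compareTo(b);
    }

    public static List<Object[]> mergeJoin(List<Object[]> left, int leftKey,
                                           List<Object[]> right, int rightKey) {
        List<Object[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            Object lk = left.get(i)[leftKey];
            Object rk = right.get(j)[rightKey];
            int cmp = compareKeys(lk, rk);
            if (cmp < 0) { i++; continue; }   // left key is smaller: advance left
            if (cmp > 0) { j++; continue; }   // right key is smaller: advance right

            // Keys match. Buffer all left tuples with this key in a new list
            // (allocated per key group instead of clearing and reusing one).
            // The matched key lk is kept and reused by both inner loops.
            List<Object[]> buffered = new ArrayList<>();
            while (i < left.size() && compareKeys(left.get(i)[leftKey], lk) == 0) {
                buffered.add(left.get(i++));
            }
            // Stream the right tuples with the same key, pairing each with the buffer.
            while (j < right.size() && compareKeys(right.get(j)[rightKey], lk) == 0) {
                Object[] r = right.get(j++);
                for (Object[] l : buffered) {
                    // Allocate the joined tuple at its final size, then set fields
                    // by position rather than appending one at a time.
                    Object[] joined = new Object[l.length + r.length];
                    System.arraycopy(l, 0, joined, 0, l.length);
                    System.arraycopy(r, 0, joined, l.length, r.length);
                    out.add(joined);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object[]> left = List.of(
            new Object[]{1, "a"}, new Object[]{2, "b"},
            new Object[]{2, "c"}, new Object[]{4, "d"});
        List<Object[]> right = List.of(
            new Object[]{2, "x"}, new Object[]{3, "y"}, new Object[]{4, "z"});
        for (Object[] row : mergeJoin(left, 0, right, 0)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running main (Java 9+ for List.of) prints [2, b, 2, x], [2, c, 2, x] and [4, d, 4, z]. Only the left side of each matching key group is held in memory while the right side streams past, which is why sorted inputs let a merge join avoid the shuffle of a regular hash-partitioned join.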