[ https://issues.apache.org/jira/browse/PIG-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2690:
----------------------------

    Resolution: Not A Problem
        Status: Resolved  (was: Patch Available)

> Pig Documentation regarding Merge Join is confusing
> ---------------------------------------------------
>
>                 Key: PIG-2690
>                 URL: https://issues.apache.org/jira/browse/PIG-2690
>             Project: Pig
>          Issue Type: Improvement
>          Components: documentation, site
>    Affects Versions: 0.7.0, 0.8.1
>            Reporter: Jeff Lord
>              Labels: docuentation
>         Attachments: fixDocs_0.patch
>
>
> The Documentation regarding merge join in pig is a bit off.
> http://pig.apache.org/docs/r0.7.0/piglatin_ref1.html#Merge+Joins
> "For optimal performance, each part file of the left (sorted) input of the
> join should have a size of at least 1 hdfs block size (for example if the
> hdfs block size is 128 MB, each part file should be less than 128 MB). If the
> total input size (including all part files) is greater than blocksize, then
> the part files should be uniform in size (without large skews in sizes)."
> This is confusing and should read something more akin to this:
> http://wiki.apache.org/pig/PigMergeJoin
> For optimal performance, each part file of the left (sorted) input of the
> join should have a size of at least 1 hdfs block size (for example if the
> hdfs block size is 128 MB, each part file should be > 128 MB). If the total
> input size (including all part files) is < a blocksize, then the part files
> should be uniform in size (without large skews in sizes). The main idea is to
> eliminate skew in the amount of input the final map job performing the
> merge-join will process.
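
The sizing guidance quoted above (each part file at least one HDFS block, and no large skew between part files) can be illustrated with a small check. This is only a sketch under stated assumptions: the function name `check_part_files`, the 128 MB block size taken from the example in the description, and the `max_skew` threshold are all hypothetical and are not part of Pig or HDFS.

```python
from statistics import mean

# Assumed HDFS block size (128 MB), matching the example in the issue text.
BLOCK_SIZE = 128 * 1024 * 1024


def check_part_files(sizes, block_size=BLOCK_SIZE, max_skew=2.0):
    """Check part file sizes (in bytes) against the quoted guidance.

    Returns a tuple (undersized, skewed):
      undersized: indices of part files smaller than one block.
      skewed: True if the largest file is more than max_skew times the mean.

    max_skew is an arbitrary illustrative threshold, not a Pig setting.
    """
    if not sizes:
        raise ValueError("sizes must not be empty")
    undersized = [i for i, size in enumerate(sizes) if size < block_size]
    skewed = max(sizes) > max_skew * mean(sizes)
    return undersized, skewed
```

For instance, three part files of 130 MB, 140 MB and 135 MB would pass both checks, while files of 10 MB, 200 MB and 900 MB would be flagged for one undersized file and for skew.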