[ 
https://issues.apache.org/jira/browse/PIG-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2690:
----------------------------

    Resolution: Not A Problem
        Status: Resolved  (was: Patch Available)
    
> Pig Documentation regarding Merge Join is confusing
> ---------------------------------------------------
>
>                 Key: PIG-2690
>                 URL: https://issues.apache.org/jira/browse/PIG-2690
>             Project: Pig
>          Issue Type: Improvement
>          Components: documentation, site
>    Affects Versions: 0.7.0, 0.8.1
>            Reporter: Jeff Lord
>              Labels: docuentation
>         Attachments: fixDocs_0.patch
>
>
> The documentation regarding merge join in Pig is confusing.
> http://pig.apache.org/docs/r0.7.0/piglatin_ref1.html#Merge+Joins
> "For optimal performance, each part file of the left (sorted) input of the 
> join should have a size of at least 1 hdfs block size (for example if the 
> hdfs block size is 128 MB, each part file should be less than 128 MB). If the 
> total input size (including all part files) is greater than blocksize, then 
> the part files should be uniform in size (without large skews in sizes)."
> This is confusing: "at least 1 hdfs block size" contradicts "less than 128 MB".
> It should read more like the wiki version:
> http://wiki.apache.org/pig/PigMergeJoin
> For optimal performance, each part file of the left (sorted) input of the 
> join should have a size of at least 1 hdfs block size (for example if the 
> hdfs block size is 128 MB, each part file should be > 128 MB). If the total 
> input size (including all part files) is < a blocksize, then the part files 
> should be uniform in size (without large skews in sizes). The main idea is to 
> eliminate skew in the amount of input the final map job performing the 
> merge-join will process.
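
The sizing guidance in the proposed wording above can be sketched as a small
check. This is only an illustration and not part of Pig: the function name
check_part_files, the max_skew threshold of 2.0, and the warning strings are
assumptions made for this example, and the rules follow the reporter's
proposed text rather than any confirmed Pig behavior.

```python
MB = 1024 * 1024


def check_part_files(sizes, block_size=128 * MB, max_skew=2.0):
    """Return warnings for the part files of the left (sorted) join input.

    sizes      -- part file sizes in bytes
    block_size -- HDFS block size in bytes (128 MB is the example from the text)
    max_skew   -- hypothetical threshold for largest/smallest file size
    """
    issues = []
    if not sizes:
        return issues

    if sum(sizes) < block_size:
        # Total input is under one block: the proposed text asks only that
        # the part files be roughly uniform in size.
        smallest, largest = min(sizes), max(sizes)
        if smallest == 0 or largest / smallest > max_skew:
            issues.append("part files are skewed in size")
    else:
        # Otherwise each part file should be at least one block in size.
        undersized = sum(1 for s in sizes if s < block_size)
        if undersized:
            issues.append(f"{undersized} part file(s) smaller than one block")
    return issues
```

For example, check_part_files([200 * MB, 10 * MB]) reports one undersized
part file, while check_part_files([200 * MB, 256 * MB]) reports nothing.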

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
