Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The following page has been changed by OlgaN:
http://wiki.apache.org/pig/JoinFramework

------------------------------------------------------------------------------
  === Pre-partitioned Join (PPJ) ===

  This join type takes advantage of the fact that the data of all relations is already partitioned by the join key or its prefix, which means that the join can be done completely independently on separate nodes. It further helps if the data is sorted on the key; otherwise it might have to be sorted before the join.
+ 
+ Also, if some but not all of the tables are partitioned on the join key, the remaining tables can be shuffled before the join.

  In the case of Hadoop, this means that the join can be done in a Map, avoiding the SORT/SHUFFLE/REDUCE stages. The performance would be even better if the partitions for the same key ranges were collocated on the same nodes and if the computation was scheduled to run on those nodes. However, for now this is outside of Pig's control.

@@ -71, +73 @@

  C = JOIN A by name, B by name USING <JOIN TYPE>;
  }}}

- The JOIN TYPE is a string that represents a type of a join like `partitioned`, `indexed`, `replicated`, etc.
+ The JOIN TYPE is a string that represents a type of a join like `partitioned`, `ordered partitioned`, `indexed`, `replicated`, etc.

+ === No Metadata Available ===
+ 
+ If no external metadata is available, the user would need to provide additional information to help the optimizer make a good choice. The user could also explicitly specify the JOIN TYPE to use, as shown above.
+ 
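
As an illustration of the pre-partitioned join idea described in this change, the following is a minimal Python sketch, not Pig's implementation. It assumes both relations are split into the same number of partitions covering the same key ranges, and that each partition is a list of (key, value) pairs already sorted by key. The names `merge_join` and `pre_partitioned_join` are hypothetical and chosen only for this example.

```python
def merge_join(left, right):
    """Join two lists of (key, value) pairs that are already sorted by key.

    Assumption for this sketch: both inputs are sorted ascending on the key.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


def pre_partitioned_join(left_parts, right_parts):
    """Join partition n of the left relation with partition n of the right.

    Each pair of partitions is joined independently of the others, which is
    what would allow the work to run on separate nodes without a shuffle.
    """
    if len(left_parts) != len(right_parts):
        raise ValueError("both relations must have the same number of partitions")
    result = []
    for lp, rp in zip(left_parts, right_parts):
        result.extend(merge_join(lp, rp))
    return result
```

Each call to `merge_join` touches only one pair of partitions, so in a Hadoop setting each pair could in principle be handled by a single map task. If the partitions were not sorted, each one would have to be sorted first, which is the extra cost the page mentions.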