[Pig Wiki] Trivial Update of "JoinFramework" by OlgaN

Apache Wiki Wed, 08 Oct 2008 17:20:33 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.


The following page has been changed by OlgaN:
http://wiki.apache.org/pig/JoinFramework

------------------------------------------------------------------------------
  === Pre-partitioned Join (PPJ) ===
  
  This join type takes advantage of the fact that the data of all relations is 
already partitioned by the join key or its prefix, which means that the join 
can be done completely independently on separate nodes. It further helps if the 
data is sorted on the key; otherwise it might have to be sorted before the 
join.
+ 
+ Also, if some tables but not others are partitioned on the join key, the 
remaining tables can be shuffled before the join.
  
  In the case of Hadoop, this means that the join can be done in a Map, avoiding 
the SORT/SHUFFLE/REDUCE stages. The performance would be even better if the 
partitions for the same key ranges were collocated on the same nodes and if the 
computation was scheduled to run on those nodes. However, for now this is 
outside of Pig's control.
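
As a rough illustration of why pre-partitioned, sorted inputs allow a join without a shuffle, here is a minimal Python sketch. It is not Pig's implementation; the function names (`partition`, `merge_join`, `pre_partitioned_join`) and the hash-based partitioning scheme are assumptions made for the example. Each pair of matching partitions is joined on its own, which is the work a single map task could do.

```python
def partition(rows, num_partitions):
    """Hypothetical helper: split (key, value) rows by key hash, sort each part by key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[hash(key) % num_partitions].append((key, value))
    return [sorted(p, key=lambda kv: kv[0]) for p in parts]


def merge_join(left, right):
    """Join two lists of (key, value) rows that are already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for _, rvalue in right[j:j_end]:
                    out.append((lk, left[i][1], rvalue))
                i += 1
            j = j_end
    return out


def pre_partitioned_join(a_parts, b_parts):
    """Join partition i of A only with partition i of B; no data crosses partitions."""
    result = []
    for a_part, b_part in zip(a_parts, b_parts):
        result.extend(merge_join(a_part, b_part))
    return result


A = [("alice", 1), ("bob", 2), ("carol", 3), ("bob", 4)]
B = [("bob", "x"), ("dave", "y"), ("alice", "z")]
joined = pre_partitioned_join(partition(A, 3), partition(B, 3))
```

Because both relations use the same partitioning function on the join key, rows with equal keys always land in partitions with the same index, so the per-partition joins together produce the full join result.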
  
@@ -71, +73 @@

  C = JOIN A by name, B by name USING <JOIN TYPE>;
  }}}
  
- The JOIN TYPE is a string that represents a type of a join like 
`partitioned`, `indexed`, `replicated`, etc.
+ The JOIN TYPE is a string that represents a type of join, such as 
`partitioned`, `ordered partitioned`, `indexed`, `replicated`, etc.
+ 
  === No Metadata Available ===
  
+ If no external metadata is available, the user would need to provide 
additional information to help the optimizer make a good choice. The user 
could also explicitly specify the JOIN TYPE to use, as shown above.
+
