Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/JoinFramework

------------------------------------------------------------------------------
  
  This document provides a comprehensive view of performing joins in Pig. By 
`JOIN` here we mean traditional inner/outer `SQL` joins which in Pig are 
realized via `COGROUP` followed by flatten of the relations.
  
- Some of the approaches described in this document can also be applied to 
`CROSS` and `GROUP` as well. 
+ Some of the approaches described in this document can also be applied to 
`CROSS` and `GROUP` as well.
  
  == Joins ==
  
@@ -79, +79 @@

  
  If no external metadata is available, the user would need to provide 
additional information to help the optimizer make a good choice. The user 
could also explicitly specify the join TYPE to use as shown above.
  
+ For PPJ, the needed metadata is the partition key and the sort key. This 
information can be provided by extending the `LOAD` statement:
+ 
+ {{{
+ A = LOAD 'data' using PigStorage() as (x, y, z) PARTITIONED BY (x, y) SORTED BY (x, y);
+ }}}
+ 
+ Alternatively, the user might choose not to provide this information and 
simply force the join type:
+ 
+ {{{
+ C = JOIN A by name, B by name USING 'partitioned';
+ -- or
+ C = JOIN A by name, B by name USING 'ordered partitioned';
+ }}}
+ 
+ Question: how far do we want to take this? If one table is partitioned, a 
second is both partitioned and ordered, and a third is neither, do we want to 
come up with a name for that combination, or do we require a metadata 
specification in this case? I think we should require metadata.
+ 
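To make the "require metadata" option in the question above concrete, here is a minimal sketch of how a planner might derive the join type from per-relation metadata, so that no name is needed for mixed cases. This is not Pig code and not the proposed implementation: `TableMeta`, `choose_join_type`, and the fallback name `'default'` are hypothetical, and the matching rule (partition key must equal the join key, sort key must start with it) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical metadata for one relation, mirroring the proposed
    PARTITIONED BY / SORTED BY clauses on LOAD."""
    partitioned_by: Tuple[str, ...] = ()
    sorted_by: Tuple[str, ...] = ()


def choose_join_type(tables: Sequence[TableMeta],
                     join_key: Tuple[str, ...]) -> str:
    """Pick a join type from metadata alone (illustrative rule, an assumption).

    The result is only as strong as the weakest input: every relation must
    be partitioned on the join key, and additionally sorted on it, before
    the more specialized join types are chosen.
    """
    if len(tables) < 2 or not join_key:
        raise ValueError("need at least two relations and a non-empty join key")

    all_partitioned = all(t.partitioned_by == join_key for t in tables)
    # A relation counts as sorted on the key if its sort key starts with it.
    all_sorted = all(t.sorted_by[:len(join_key)] == join_key for t in tables)

    if all_partitioned and all_sorted:
        return "ordered partitioned"
    if all_partitioned:
        return "partitioned"
    return "default"  # hypothetical name for the ordinary join


# The three-table case from the question: partitioned only,
# partitioned and ordered, and neither.
key = ("name",)
a = TableMeta(partitioned_by=key)
b = TableMeta(partitioned_by=key, sorted_by=key)
c = TableMeta()
print(choose_join_type([a, b, c], key))
```

Under this rule the mixed three-table case simply falls back to the ordinary join, while `[a, b]` would be joined as 'partitioned' and `[b, b]` as 'ordered partitioned'.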
