The following page has been changed by OlgaN:
http://wiki.apache.org/pig/JoinFramework

------------------------------------------------------------------------------
  
  == Metadata ==
  
- To choose best join algorithm, additional information about the data is 
required. This data can be stored with the data or in a separate repository in 
which case Pig can consume this data and make choices on user's behalf. 
However, part of Pig philosophy is to it anything which means in this case to 
operate correctly and as efficiently as possible in the absense of the 
metadata. Also, even if metadata is available user should be able to dictate 
how to join the data.
+ To choose the best join algorithm, additional information about the data is 
required. This metadata can be stored with the data or in a separate repository, 
in which case Pig can consume it and make choices on the user's behalf. 
However, part of the Pig philosophy is to eat anything, which in this case means 
operating correctly and as efficiently as possible in the absence of 
metadata. Also, even if metadata is available, the user should be able to disable 
its use. 
+ 
+ Questions:
+ 
+  1. Should the user be able to provide conflicting information? I would think 
not.
+  2. Should the user be able to disable only all optimizations at once, or also 
a class of optimizations or a particular optimization? What does Oracle do?
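
The policy described in this section (use metadata when it is available, fall back to a join that works on any input when it is not, and let the user switch metadata use off) could be sketched roughly as follows. This is only an illustration under stated assumptions: the strategy names, the `InputMetadata` fields, the `choose_join` function, and the memory threshold are all hypothetical and are not taken from Pig or from this page, and the sketch does not settle the open questions above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JoinStrategy(Enum):
    # All three names are hypothetical labels used only in this sketch.
    DEFAULT = "default"                # works on any input, needs no metadata
    FRAGMENT_REPLICATE = "replicate"   # assumes one input is small enough for memory
    MERGE = "merge"                    # assumes both inputs are sorted on the join key


@dataclass
class InputMetadata:
    """Hypothetical facts a metadata repository might expose for one input."""
    size_bytes: Optional[int] = None
    sorted_on_join_key: bool = False


def choose_join(left: Optional[InputMetadata],
                right: Optional[InputMetadata],
                use_metadata: bool = True,
                memory_limit_bytes: int = 64 * 1024 * 1024) -> JoinStrategy:
    """Pick a join strategy; fall back to DEFAULT whenever metadata cannot be used."""
    # No metadata, or the user disabled its use: stay correct with the default join.
    if not use_metadata or left is None or right is None:
        return JoinStrategy.DEFAULT
    # Both inputs sorted on the key: a merge join becomes possible.
    if left.sorted_on_join_key and right.sorted_on_join_key:
        return JoinStrategy.MERGE
    # At least one input with a known size under the (assumed) memory limit.
    known_sizes = [m.size_bytes for m in (left, right) if m.size_bytes is not None]
    if known_sizes and min(known_sizes) <= memory_limit_bytes:
        return JoinStrategy.FRAGMENT_REPLICATE
    return JoinStrategy.DEFAULT
```

The single `use_metadata` flag corresponds to the coarsest answer to question 2 (disable everything); a finer-grained switch per class of optimization or per optimization would need a richer parameter.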
  
  === Metadata Available ===
+ 
+ If metadata is available, Pig will pull this metadata and use it as part of 
optimization. The details of how this would be done are beyond the scope of this 
document. The required data would need to be communicated as part of Pig's 
requirements for the metadata repository whenever one is available.
+ 
  === No Metadata Available ===
  
