Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/JoinFramework

------------------------------------------------------------------------------
  == Metadata ==
- To choose best join algorithm, additional information about the data is required. This data can be stored with the data or in a separate repository in which case Pig can consume this data and make choices on user's behalf. However, part of Pig philosophy is to it anything which means in this case to operate correctly and as efficiently as possible in the absense of the metadata. Also, even if metadata is available user should be able to dictate how to join the data.
+ To choose best join algorithm, additional information about the data is required. This data can be stored with the data or in a separate repository in which case Pig can consume this data and make choices on user's behalf. However, part of Pig philosophy is to it anything which means in this case to operate correctly and as efficiently as possible in the absence of the metadata. Also, even if metadata is available user should be able to disable its use.
+
+ Questions:
+
+ 1. Should user be able to provide a conflicting information? I would think not?
+ 2. Should user only be able to disable all optimizations or class of optimizations or particular optimization? What does Oracle do?
  === Metadata Available ===
+
+ If metadata is available pig will pull this metadata and use it as part of optimization. The details of how this would be done is beyond the scope of this document. The required data would need to be communicated as part of Pig requirements of the metadata repository whenever one is available.
+
  === No Metadata Available ===