Combined input splits need to consider rack-locality for the underlying splits 
of rack info.
--------------------------------------------------------------------------------------------

                 Key: PIG-1535
                 URL: https://issues.apache.org/jira/browse/PIG-1535
             Project: Pig
          Issue Type: Improvement
            Reporter: Yan Zhou


PIG-1518 will add support for combining multiple small splits into fewer, 
bigger splits. In doing so, the node-locality of the underlying generic input 
splits is consulted to maximize data node-locality for the "big" splits. 
Rack-locality info is unavailable because generic input splits do not 
currently carry it. MAPREDUCE-1698 has been filed to address the lack of rack 
info in InputSplit. On the other hand, many other types of input splits do 
have rack info available; FileSplit is an example. Howl's future input splits 
will also contain rack-locality info. 

In summary, until MAPREDUCE-1698 is resolved, if ever, small splits of some 
specific types could still be combined with awareness of rack-locality, 
probably using the same or similar algorithms as CombineFileInputFormat.
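
To make the idea concrete, here is a rough sketch of rack-aware combining for 
splits that expose rack info. It is not the CombineFileInputFormat code and 
not what PIG-1518 implements; it only illustrates the general order of 
preference (node-local first, then rack-local, then anything left over). The 
names SmallSplit, combine and max_size are hypothetical, and each split is 
assumed to have a single node and rack, whereas real splits list several 
replica locations.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SmallSplit:
    """Hypothetical stand-in for an input split that exposes rack info."""
    name: str
    length: int  # size in bytes
    node: str    # host holding the data (one replica, for simplicity)
    rack: str    # rack of that host


def _pack(splits, max_size):
    """Greedily pack splits into groups of at least max_size bytes.

    Returns (full_groups, leftovers); leftovers never reached max_size.
    """
    groups, current, size = [], [], 0
    for s in splits:
        current.append(s)
        size += s.length
        if size >= max_size:
            groups.append(current)
            current, size = [], 0
    return groups, current


def combine(splits, max_size):
    """Combine small splits: node-local first, then rack-local, then the rest."""
    combined = []

    # Pass 1: group splits that live on the same node.
    by_node = defaultdict(list)
    for s in splits:
        by_node[s.node].append(s)
    node_leftovers = []
    for node_splits in by_node.values():
        groups, rest = _pack(node_splits, max_size)
        combined.extend(groups)
        node_leftovers.extend(rest)

    # Pass 2: group the node leftovers that share a rack.
    by_rack = defaultdict(list)
    for s in node_leftovers:
        by_rack[s.rack].append(s)
    rack_leftovers = []
    for rack_splits in by_rack.values():
        groups, rest = _pack(rack_splits, max_size)
        combined.extend(groups)
        rack_leftovers.extend(rest)

    # Pass 3: whatever remains is combined without regard to locality.
    groups, rest = _pack(rack_leftovers, max_size)
    combined.extend(groups)
    if rest:
        combined.append(rest)
    return combined
```

Without the second pass, leftovers from different racks would be merged 
together directly, which is roughly the situation for generic input splits 
until rack info becomes available.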

However, this would mean non-trivial extra work on top of PIG-1518 and may be 
out of reach for 0.8, hence this separate JIRA.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
