Combined input splits need to consider rack-locality for underlying splits
that carry rack info.
--------------------------------------------------------------------------------------------
Key: PIG-1535
URL: https://issues.apache.org/jira/browse/PIG-1535
Project: Pig
Issue Type: Improvement
Reporter: Yan Zhou
PIG-1518 will add support for combining multiple small splits into fewer,
bigger splits. In doing so, the node-locality of the underlying generic input
splits is consulted to maximize data node-locality for the "big" splits.
Rack-locality info is not used because generic input splits do not currently
carry it; MAPREDUCE-1698 has been filed to address the lack of rack info in
InputSplit. For many specific types of input splits, however, the rack info is
available. FileSplit is an example. Howl's future input splits will also
contain rack-locality info.
In summary, until MAPREDUCE-1698 is resolved (if ever), small splits of those
specific types could still be combined with awareness of rack-locality,
probably using the same algorithm as CombineFileInputFormat or a similar one.
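To make the idea concrete, here is a minimal sketch of a node-first, rack-second
combining pass. It is only an illustration under stated assumptions, not the
code of PIG-1518 or of CombineFileInputFormat: the names RackAwareCombiner,
SmallSplit, combine and maxSize are hypothetical, and each split is assumed to
report a single host and a single rack.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Pig or Hadoop.
final class RackAwareCombiner {

    /** Stand-in for a small input split that knows its host and rack. */
    static final class SmallSplit {
        final String name;
        final long length;
        final String host;
        final String rack;

        SmallSplit(String name, long length, String host, String rack) {
            this.name = name;
            this.length = length;
            this.host = host;
            this.rack = rack;
        }
    }

    /** Combines small splits: node-local first, then rack-local, then the rest. */
    static List<List<SmallSplit>> combine(List<SmallSplit> splits, long maxSize) {
        List<List<SmallSplit>> out = new ArrayList<>();

        // Pass 1: pack splits that share a host.
        List<SmallSplit> afterNodes = new ArrayList<>();
        pack(groupBy(splits, true), maxSize, out, afterNodes);

        // Pass 2: pack what is left among splits that share a rack.
        List<SmallSplit> afterRacks = new ArrayList<>();
        pack(groupBy(afterNodes, false), maxSize, out, afterRacks);

        // Pass 3: pack the remainder regardless of location.
        Map<String, List<SmallSplit>> everything = new LinkedHashMap<>();
        everything.put("*", afterRacks);
        List<SmallSplit> rest = new ArrayList<>();
        pack(everything, maxSize, out, rest);
        if (!rest.isEmpty()) {
            out.add(rest); // final undersized combined split
        }
        return out;
    }

    /** Groups splits by host or by rack, preserving input order. */
    private static Map<String, List<SmallSplit>> groupBy(List<SmallSplit> splits,
                                                         boolean byHost) {
        Map<String, List<SmallSplit>> groups = new LinkedHashMap<>();
        for (SmallSplit s : splits) {
            String key = byHost ? s.host : s.rack;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
        }
        return groups;
    }

    /**
     * Within each group, accumulates splits until maxSize is reached and emits
     * them as one combined split. Splits left over at the end of a group are
     * handed to the next, less local, pass.
     */
    private static void pack(Map<String, List<SmallSplit>> groups, long maxSize,
                             List<List<SmallSplit>> out, List<SmallSplit> leftovers) {
        for (List<SmallSplit> group : groups.values()) {
            List<SmallSplit> current = new ArrayList<>();
            long size = 0;
            for (SmallSplit s : group) {
                current.add(s);
                size += s.length;
                if (size >= maxSize) {
                    out.add(current);
                    current = new ArrayList<>();
                    size = 0;
                }
            }
            leftovers.addAll(current);
        }
    }
}
```

For example, with maxSize = 100 and five splits a(60, h1, r1), b(60, h1, r1),
c(40, h2, r1), d(70, h3, r1) and e(30, h4, r2), the sketch yields three combined
splits: [a, b] is node-local, [c, d] is rack-local, and [e] is the remainder.
Without the rack pass, c, d and e could only be combined without regard to
location.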
However, rack-aware combining would mean non-trivial extra work on top of
PIG-1518 and may be out of reach for 0.8, hence a separate JIRA.