[ 
https://issues.apache.org/jira/browse/PIG-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895335#action_12895335
 ] 

Yan Zhou commented on PIG-1518:
-------------------------------

The combination algorithm currently does not consider rack locality because the 
generic underlying input splits do not carry rack information. More specific 
input splits such as FileSplit do make rack information available, so combined 
splits could be generated with rack locality taken into account. But this 
might be out of scope for 0.8, and a separate JIRA, PIG-1535, has been filed for 
that purpose.
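
To illustrate what rack-aware combination could look like, here is a minimal, 
self-contained sketch. It is not Pig's implementation and does not use the 
Hadoop API: the Split class, its rack field, the combine method, and the 
maxBytes limit are all hypothetical names introduced only for this example. 
It assumes each split already knows its rack, groups splits by rack, and then 
packs each rack's splits into combined splits up to a size limit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RackAwareCombiner {

    /** Hypothetical stand-in for an input split that knows its size and rack. */
    static final class Split {
        final String path;
        final long length;
        final String rack;

        Split(String path, long length, String rack) {
            this.path = path;
            this.length = length;
            this.rack = rack;
        }

        @Override
        public String toString() {
            return path;
        }
    }

    /**
     * Groups splits by rack, then packs each rack's splits, in input order,
     * into combined splits of at most maxBytes. A single split larger than
     * maxBytes ends up in a combined split of its own.
     */
    static List<List<Split>> combine(List<Split> splits, long maxBytes) {
        // LinkedHashMap keeps racks in first-seen order for stable output.
        Map<String, List<Split>> byRack = new LinkedHashMap<>();
        for (Split s : splits) {
            byRack.computeIfAbsent(s.rack, r -> new ArrayList<>()).add(s);
        }

        List<List<Split>> combined = new ArrayList<>();
        for (List<Split> rackSplits : byRack.values()) {
            List<Split> current = new ArrayList<>();
            long currentBytes = 0;
            for (Split s : rackSplits) {
                // Close the current combined split if adding s would overflow it.
                if (!current.isEmpty() && currentBytes + s.length > maxBytes) {
                    combined.add(current);
                    current = new ArrayList<>();
                    currentBytes = 0;
                }
                current.add(s);
                currentBytes += s.length;
            }
            if (!current.isEmpty()) {
                combined.add(current);
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(
            new Split("a", 40, "/rack1"),
            new Split("b", 30, "/rack2"),
            new Split("c", 50, "/rack1"),
            new Split("d", 20, "/rack1"),
            new Split("e", 60, "/rack2"));
        // With maxBytes = 100 this prints [a, c], then [d], then [b, e].
        for (List<Split> group : combine(splits, 100)) {
            System.out.println(group);
        }
    }
}
```

A real implementation would also have to handle splits that report several 
locations and decide what to do with leftovers that could be merged across 
racks; the sketch ignores both.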

> multi file input format for loaders
> -----------------------------------
>
>                 Key: PIG-1518
>                 URL: https://issues.apache.org/jira/browse/PIG-1518
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Yan Zhou
>             Fix For: 0.8.0
>
>
> We frequently run into situations where Pig needs to deal with small files 
> in the input. In this case a separate map is created for each file, which 
> can be very inefficient. 
> It would be great to have an umbrella input format that can take multiple 
> files and use them in a single split. We would like to see this working with 
> different data formats if possible.
> There are already a couple of input formats doing a similar thing: 
> MultifileInputFormat as well as CombinedInputFormat; however, neither works 
> with the new Hadoop 20 API. 
> We at least want to do a feasibility study for Pig 0.8.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
