Rohini Palaniswamy created PIG-3366:
---------------------------------------
Summary: Do intelligent combination of input splits for compressed
files.
Key: PIG-3366
URL: https://issues.apache.org/jira/browse/PIG-3366
Project: Pig
Issue Type: Improvement
Reporter: Rohini Palaniswamy
pig.maxCombinedSplitSize defaults to the block size. When there are a lot of
small bz files that uncompress to a large amount of data, they are combined
until the block size is reached, which was 128 MB in our case. The load took
20 mins, but setting pig.noSplitCombination=true cut the time down to 2+ mins.
We need logic that takes into account the factor by which an input split will
expand when uncompressed (the factor differs between compression formats such
as bz and gz, and could be configurable by the user) and uses the expanded
size as the estimate while combining splits.
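
To make the proposal concrete, here is a minimal sketch in Java of combining
splits by estimated uncompressed size rather than by on-disk size. It is not
Pig's actual combination code. The class name, the method names, and the
expansion factors (10x for .bz2, 5x for .gz) are all hypothetical placeholders;
real ratios depend on the data and would come from user configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not Pig's real split-combination implementation.
final class ExpandedSplitCombiner {

    // ASSUMPTION: made-up expansion factors keyed by file extension.
    // In the proposal these would be configurable by the user.
    static double factorFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1);
        switch (ext) {
            case "bz2": return 10.0;
            case "gz":  return 5.0;
            default:    return 1.0; // treat everything else as uncompressed
        }
    }

    // Estimated size of a split once its contents are uncompressed.
    static long estimatedSize(String fileName, long compressedBytes) {
        return (long) Math.ceil(compressedBytes * factorFor(fileName));
    }

    // Greedily packs consecutive splits into groups whose estimated
    // uncompressed size stays within maxCombinedSize. A single split larger
    // than the limit still gets a group of its own.
    // Returns, for each group, the indices of the splits it contains.
    static List<List<Integer>> combine(String[] names, long[] sizes,
                                       long maxCombinedSize) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentSize = 0;
        for (int i = 0; i < names.length; i++) {
            long est = estimatedSize(names[i], sizes[i]);
            if (!current.isEmpty() && currentSize + est > maxCombinedSize) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(i);
            currentSize += est;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

Under these assumed factors, sixteen 8 MB .bz2 files with a 128 MB limit are
each estimated at 80 MB, so no two fit together and they stay in 16 separate
groups. Sixteen uncompressed 8 MB files total exactly 128 MB and are packed
into a single group, which is what combining by on-disk size does today.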