archive partSize should be configurable
---------------------------------------
Key: MAPREDUCE-1465
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1465
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
The archive part size is currently hard-coded to 2GB. Archiving 10^5 small files
took 52 minutes because the job ran with only 1 mapper.
{noformat}
-bash-3.1$ time $H archive ${Q} -archiveName ${DIR}.3.har -p ${PARENT} ${DIR} ${PARENT}
10/02/06 01:55:14 INFO mapred.JobClient: Running job: job_201002042035_5737
...
10/02/06 02:47:18 INFO mapred.JobClient: map 100% reduce 100%
10/02/06 02:47:19 INFO mapred.JobClient: Job complete: job_201002042035_5737
...
10/02/06 02:47:19 INFO mapred.JobClient: Reduce input records=100002
real 52m27.188s
user 0m29.314s
sys 0m1.276s
{noformat}
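To illustrate why the part size matters here, the sketch below estimates the mapper count. It assumes the archive job uses roughly one map task per part-size chunk of input, with a minimum of one. That rule, the function name estimate_num_maps, and the 10 KB average file size are assumptions made for illustration. None of them is taken from this report or confirmed against the harchive code.

```python
# Hypothetical sketch: how the part size could bound the number of map tasks.
GB = 1024 ** 3
MB = 1024 ** 2


def estimate_num_maps(total_bytes: int, part_size_bytes: int) -> int:
    """Assumed rule: one map task per part-size chunk of input, at least one."""
    if part_size_bytes <= 0:
        raise ValueError("part size must be positive")
    return max(1, total_bytes // part_size_bytes)


# Assumed workload: 10^5 files of 10 KB each (the real file sizes are not
# given in the report).
total = 100_000 * 10 * 1024

print(estimate_num_maps(total, 2 * GB))    # 1 map task with a 2GB part size
print(estimate_num_maps(total, 128 * MB))  # 7 map tasks with a 128MB part size
```

Under these assumptions, any input smaller than 2GB is handled by a single mapper regardless of how many files it contains. A configurable, smaller part size would let the same input be split across several mappers.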