Assuming you're using TextInputFormat, it sounds like
https://issues.apache.org/jira/browse/MAPREDUCE-773

That fix is in 0.21. I don't know whether CDH has it.
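For illustration, here is a rough sketch of how a record reader over a non-splittable compressed file can show a jump from 0% to 100%. It is a simplified model in Python, not Hadoop's actual code; the function names, the sentinel end offset, and the file size are all assumptions made for the example. The idea: if progress is measured against an end offset that is unknown for a compressed stream (modelled here as a huge sentinel), the fraction stays near zero until the reader finishes, whereas measuring bytes consumed from the compressed file itself gives a value that moves steadily.

```python
UNKNOWN_END = 2**63 - 1  # assumed sentinel for "end of stream not known"


def progress_with_unknown_end(pos, start=0, end=UNKNOWN_END):
    """Progress measured against a sentinel end offset (stays near 0)."""
    if start == end:
        return 0.0
    return min(1.0, (pos - start) / float(end - start))


def progress_by_compressed_bytes(compressed_pos, start, end):
    """Progress measured by bytes consumed from the compressed file."""
    if start == end:
        return 0.0
    return min(1.0, (compressed_pos - start) / float(end - start))


if __name__ == "__main__":
    file_len = 1_000_000_000  # hypothetical 1 GB gzipped input
    for consumed in (0, file_len // 2, file_len):
        stuck = progress_with_unknown_end(consumed)
        smooth = progress_by_compressed_bytes(consumed, 0, file_len)
        print(f"{consumed:>12} bytes: {stuck:.10f} vs {smooth:.2f}")
```

In the first function the reported value never gets meaningfully above zero while the file is being read, so the task only appears to move when it completes and the framework marks it done.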
Koji

On 12/27/11 2:00 AM, "Niels Basjes" <ni...@basjes.nl> wrote:

> I would not expect this. I would expect behaviour that is independent of
> the way the splits are created.
>
> --
> Kind regards,
> Niels Basjes
> (Sent from mobile)
> On Dec 26, 2011 07:57, "Anthony Urso" <antho...@cs.ucla.edu> wrote:
>
>> Gzip files (unlike uncompressed files) are not splittable, which may be
>> causing the behavior that you described.
>> On Dec 24, 2011 6:24 AM, "Niels Basjes" <ni...@basjes.nl> wrote:
>>
>>> Hi,
>>>
>>> I noticed that the mapper progress indication in the hadoop cdh3
>>> distribution jumps from 0% to 100% for each gzipped input file. So when
>>> running with big gzipped input files the job appears to be stuck.
>>>
>>> I was unable to find a jira issue that describes this effect.
>>> Before I dive into this I have a few questions to you guys:
>>> 1) is this a known effect for the 0.20 version? If so what is the jira
>>> issue?
>>> 2) is this specific to gzip?
>>> 3) is this effect still present in the MRv2/yarn version of Hadoop?
>>>
>>> Thanks.
>>> --
>>> Kind regards,
>>> Niels Basjes
>>> (Sent from mobile)
>>>
>>