I would not expect this. I would expect behaviour that is independent of the way the splits are created.
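To make concrete what split-independent progress could look like, here is a minimal sketch. It is not Hadoop's actual RecordReader code, and the names `GzipProgressSketch`, `CountingInputStream` and `progress` are hypothetical. It assumes progress is reported as the fraction of compressed bytes consumed so far, which only requires knowing the compressed file length:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipProgressSketch {

    /** Counts the raw (compressed) bytes pulled from the underlying stream. */
    static class CountingInputStream extends FilterInputStream {
        private long count = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                count++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                count += n;
            }
            return n;
        }

        long getCount() {
            return count;
        }
    }

    /** Fraction of the compressed input consumed so far, clamped to [0, 1]. */
    static float progress(long compressedBytesRead, long compressedLength) {
        if (compressedLength <= 0) {
            return 0.0f;
        }
        return Math.min(1.0f, compressedBytesRead / (float) compressedLength);
    }

    public static void main(String[] args) throws IOException {
        // Build a gzip payload in memory. Random bytes barely compress, so the
        // compressed stream is large enough to show intermediate values.
        byte[] raw = new byte[64 * 1024];
        new Random(42).nextBytes(raw);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        byte[] compressed = bos.toByteArray();

        // Decompress in chunks and report progress from the compressed side.
        CountingInputStream counter =
                new CountingInputStream(new ByteArrayInputStream(compressed));
        try (GZIPInputStream in = new GZIPInputStream(counter)) {
            byte[] buf = new byte[8 * 1024];
            while (in.read(buf) != -1) {
                System.out.printf("progress: %.2f%n",
                        progress(counter.getCount(), compressed.length));
            }
        }
    }
}
```

Because the decompressor reads ahead in buffered chunks, the value is an approximation, but it moves steadily instead of jumping from 0% to 100%.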
--
Kind regards,
Niels Basjes
(Sent from mobile)

On 26 Dec 2011 07:57, "Anthony Urso" <antho...@cs.ucla.edu> wrote:

> Gzip files (unlike uncompressed files) are not splittable, which may be
> causing the behavior that you described.
> On Dec 24, 2011 6:24 AM, "Niels Basjes" <ni...@basjes.nl> wrote:
>
> > Hi,
> >
> > I noticed that the mapper progress indication in the hadoop cdh3
> > distribution jumps from 0% to 100% for each gzipped input file. So when
> > running with big gzipped input files the job appears to be stuck.
> >
> > I was unable to find a jira issue that describes this effect.
> > Before I dive into this I have a few questions to you guys:
> > 1) is this a known effect for the 0.20 version? If so what is the jira
> > issue?
> > 2) is this specific to gzip?
> > 3) is this effect still present in the MRv2/yarn version of Hadoop?
> >
> > Thanks.
> > --
> > Kind regards,
> > Niels Basjes
> > (Sent from mobile)
> >
>