Re: Gzip progress during map phase.
Yes, this is what I was looking for. Thanks.

--
Kind regards,
Niels Basjes
(Sent from mobile)

On 27 Dec 2011 12:08, Koji Noguchi knogu...@yahoo-inc.com wrote:
> Assuming you're using TextInputFormat, it sounds like https://issues.apache.org/jira/browse/MAPREDUCE-773
> In 0.21. Don't know about CDH.
>
> Koji
Re: Gzip progress during map phase.
I would not expect this. I would expect behaviour that is independent of the way the splits are created.

--
Kind regards,
Niels Basjes
(Sent from mobile)

On 26 Dec 2011 07:57, Anthony Urso antho...@cs.ucla.edu wrote:
> Gzip files (unlike uncompressed files) are not splittable, which may be causing the behavior that you described.
Re: Gzip progress during map phase.
Assuming you're using TextInputFormat, it sounds like https://issues.apache.org/jira/browse/MAPREDUCE-773

That is in 0.21. I don't know about CDH.

Koji

On 12/27/11 2:00 AM, Niels Basjes ni...@basjes.nl wrote:
> I would not expect this. I would expect behaviour that is independent of the way the splits are created.
Re: Gzip progress during map phase.
Gzip files (unlike uncompressed files) are not splittable, which may be causing the behavior that you described.

On Dec 24, 2011 6:24 AM, Niels Basjes ni...@basjes.nl wrote:
> Hi,
>
> I noticed that the mapper progress indication in the Hadoop CDH3 distribution jumps from 0% to 100% for each gzipped input file. So when running with big gzipped input files, the job appears to be stuck.
>
> I was unable to find a JIRA issue that describes this effect. Before I dive into this, I have a few questions for you:
> 1) Is this a known effect in the 0.20 version? If so, what is the JIRA issue?
> 2) Is this specific to gzip?
> 3) Is this effect still present in the MRv2/YARN version of Hadoop?
>
> Thanks.
>
> --
> Kind regards,
> Niels Basjes
> (Sent from mobile)
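
As a rough illustration of the point discussed in this thread: a gzip file cannot be split, so one reader processes the whole file, and its uncompressed size is not known up front. One plausible way to still report gradual progress is to track how many compressed bytes have been consumed. The sketch below shows that idea in plain Python. It is not Hadoop's code and makes no claim about how MAPREDUCE-773 is implemented; `read_with_progress` is a hypothetical name chosen for this example.

```python
import gzip
import os
from typing import Iterator, Tuple


def read_with_progress(path: str) -> Iterator[Tuple[bytes, float]]:
    """Yield (line, progress) pairs for a gzip file.

    Progress is the fraction of *compressed* bytes consumed so far,
    which is known even though the uncompressed size is not.
    """
    compressed_size = os.path.getsize(path)
    with open(path, "rb") as raw, gzip.GzipFile(fileobj=raw) as gz:
        for line in gz:
            # raw.tell() is how far the decompressor has read into the
            # compressed file; it advances in buffer-sized steps.
            consumed = raw.tell()
            yield line, min(1.0, consumed / compressed_size)
```

A caller could loop over `read_with_progress("input.gz")` (a hypothetical file name) and print the second element every few thousand lines. The value moves in steps because the decompressor reads the compressed file in chunks, but it rises gradually instead of jumping from 0% to 100% at the end.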