Re: Gzip progress during map phase.

2011-12-27 Thread Niels Basjes
Yes, this is what I was looking for.
Thanks.

-- 
Kind regards,
Niels Basjes
(Sent from mobile)
On 27 Dec 2011 12:08, Koji Noguchi knogu...@yahoo-inc.com wrote:

 Assuming you're using TextInputFormat, it sounds like
 https://issues.apache.org/jira/browse/MAPREDUCE-773
 which is fixed in 0.21. I don't know about CDH.

 Koji






Re: Gzip progress during map phase.

2011-12-27 Thread Niels Basjes
I would not expect this. I would expect behaviour that is independent of
the way the splits are created.

-- 
Kind regards,
Niels Basjes
(Sent from mobile)
On 26 Dec 2011 07:57, Anthony Urso antho...@cs.ucla.edu wrote:

 Gzip files (unlike uncompressed files) are not splittable, which may be
 causing the behavior that you described.



Re: Gzip progress during map phase.

2011-12-27 Thread Koji Noguchi
Assuming you're using TextInputFormat, it sounds like
https://issues.apache.org/jira/browse/MAPREDUCE-773
which is fixed in 0.21. I don't know about CDH.

Koji
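
For anyone reading this thread later, here is a minimal sketch of the general idea behind progress reporting on a compressed input: measure how many compressed bytes the decompressor has consumed instead of waiting for the end of the file. This is not the Hadoop patch and does not use any Hadoop API. GzipProgressSketch, CountingInputStream, sampleText and sampleProgress are hypothetical names made up for the illustration, and the input is generated in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; not taken from Hadoop.
final class GzipProgressSketch {

    /** Counts the compressed bytes that the decompressor has pulled so far. */
    static final class CountingInputStream extends FilterInputStream {
        private long count;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }

        long getCount() {
            return count;
        }
    }

    /** Builds deterministic pseudo-random lines so the gzip output is not tiny. */
    static String sampleText(int lines) {
        Random rnd = new Random(42);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("key").append(rnd.nextInt(1_000_000))
              .append('\t').append(rnd.nextInt(1_000_000)).append('\n');
        }
        return sb.toString();
    }

    /** Compresses the text into an in-memory gzip stream. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Decompresses in chunks and records compressedBytesRead / compressedLength after each read. */
    static List<Float> sampleProgress(byte[] compressed, int chunkSize) {
        List<Float> samples = new ArrayList<>();
        CountingInputStream counter =
                new CountingInputStream(new ByteArrayInputStream(compressed));
        // A small inflater buffer (512 bytes) keeps read-ahead on the compressed side low.
        try (GZIPInputStream gz = new GZIPInputStream(counter, 512)) {
            byte[] buf = new byte[chunkSize];
            while (gz.read(buf) != -1) {
                samples.add(Math.min(1.0f, counter.getCount() / (float) compressed.length));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return samples;
    }

    public static void main(String[] args) {
        byte[] compressed = gzip(sampleText(5000));
        List<Float> samples = sampleProgress(compressed, 4096);
        int step = Math.max(1, samples.size() / 10);
        for (int i = 0; i < samples.size(); i += step) {
            System.out.printf("read %d: %.2f%n", i, samples.get(i));
        }
    }
}
```

The reported fraction follows the compressed side of the stream, so it moves gradually even though the file is processed as a single unit. It is only as precise as the decompressor's read-ahead buffer allows.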


On 12/27/11 2:00 AM, Niels Basjes ni...@basjes.nl wrote:

 I would not expect this. I would expect behaviour that is independent of
 the way the splits are created.
 
 -- 
 Kind regards,
 Niels Basjes
 (Sent from mobile)
 



Re: Gzip progress during map phase.

2011-12-25 Thread Anthony Urso
Gzip files (unlike uncompressed files) are not splittable, which may be
causing the behavior that you described.
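
To illustrate the "not splittable" part, here is a small sketch using only java.util.zip, not Hadoop. It assumes nothing about how Hadoop builds its splits. It only shows that a gzip reader cannot begin decoding at an arbitrary byte offset, which is why a gzipped file ends up being handled as one unit. GzipSplitSketch and canStartAt are hypothetical names. Offset 10 is chosen because it is the first byte after the fixed 10-byte header that Java's GZIPOutputStream writes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; not taken from Hadoop.
final class GzipSplitSketch {

    /** Compresses the text into an in-memory gzip stream. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Returns true if a gzip reader can open the stream and read data starting at the given offset. */
    static boolean canStartAt(byte[] compressed, int offset) {
        ByteArrayInputStream in =
                new ByteArrayInputStream(compressed, offset, compressed.length - offset);
        try (GZIPInputStream gz = new GZIPInputStream(in)) {
            gz.read(new byte[1024]);
            return true;
        } catch (IOException e) {
            // No gzip header at this offset, or the data could not be decoded.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] data = gzip("line one\nline two\nline three\n");
        System.out.println("start at offset 0:  " + canStartAt(data, 0));
        System.out.println("start at offset 10: " + canStartAt(data, 10));
    }
}
```

Reading from offset 0 works, while starting just past the header fails because the reader finds no gzip header there. An uncompressed text file has no such restriction, since a reader can seek to any offset and resume at the next line break.
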
On Dec 24, 2011 6:24 AM, Niels Basjes ni...@basjes.nl wrote:

 Hi,

 I noticed that the mapper progress indication in the Hadoop CDH3
 distribution jumps from 0% to 100% for each gzipped input file. So when
 running with big gzipped input files, the job appears to be stuck.

 I was unable to find a JIRA issue that describes this effect.
 Before I dive into this, I have a few questions for you:
 1) Is this a known effect in the 0.20 version? If so, what is the JIRA
 issue?
 2) Is this specific to gzip?
 3) Is this effect still present in the MRv2/YARN version of Hadoop?

 Thanks.
 --
 Kind regards,
 Niels Basjes
 (Sent from mobile)