Re: intermediate results not getting compressed

Billy Pearson Tue, 17 Mar 2009 10:16:06 -0700

Watching a second job with more reduce tasks running, it looks like the in-memory merges are working correctly with compression.

The task I was watching had failed and was running again. It shuffled all the map output files, then started the merge only after everything was copied, so none was merged in memory; it was closed before the merging started. If it helps, the output files are named intermediate.x and are stored in the folder mapred/local/job-taskname/intermediate.x, while the in-memory merges are stored in mapred/local/taskTracker/jobcache/job-name/taskname/


The non-compressed ones are the intermediate.x files above.

Billy

"Chris Douglas" <chri...@yahoo-inc.com> wrote in message news:9bb78c3a-efab-45c3-8cc3-25aab60df...@yahoo-inc.com...

My problem is that the output from merging the intermediate map output files is not compressed, so I lose all the benefit of compressing the map file output to save disk space, because the merged map output files are no longer compressed.
It should still be compressed, unless there's some bizarre regression. More segments will be around simultaneously (since the segments not yet merged are still on disk), which clearly puts pressure on intermediate storage, but if the map outputs are compressed, then the merged map outputs at the reduce must also be compressed. There's no place in the intermediate format to store compression metadata, so either all are or none are. Intermediate merges should also follow the compression spec of the initiating merger (o.a.h.mapred.Merger: 447).
How are you concluding that the intermediate output is compressed from the map, but not in the reduce? -C
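One rough way to answer that question from the files themselves is to look at their first two bytes. The sketch below is an illustration only and does not come from this thread. It assumes the job uses a zlib- or gzip-based codec and that the codec stream begins at byte 0 of the file, neither of which is confirmed here. `CompressionSniffer` and its methods are hypothetical names. The zlib test follows the header rules in RFC 1950, and the gzip test uses the 0x1f 0x8b magic number.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CompressionSniffer {

    /** Classifies a stream by its first two bytes: "gzip", "zlib", or "unknown". */
    public static String sniff(int b0, int b1) {
        // gzip streams start with the magic number 0x1f 0x8b.
        if (b0 == 0x1f && b1 == 0x8b) {
            return "gzip";
        }
        // zlib (RFC 1950): compression method 8 (deflate) in the low nibble,
        // window size field at most 7, and the 16-bit header divisible by 31.
        boolean deflate = (b0 & 0x0f) == 8;
        boolean windowOk = (b0 >> 4) <= 7;
        boolean checkOk = ((b0 << 8) | b1) % 31 == 0;
        if (deflate && windowOk && checkOk) {
            return "zlib";
        }
        return "unknown";
    }

    /** Reads the first two bytes of a file and classifies them. */
    public static String sniffFile(String path) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            int b0 = in.read();
            int b1 = in.read();
            if (b0 < 0 || b1 < 0) {
                return "unknown"; // file shorter than two bytes
            }
            return sniff(b0, b1);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the paths of the files to inspect as arguments.
        for (String path : args) {
            System.out.println(path + ": " + sniffFile(path));
        }
    }
}
```

An uncompressed file can begin with bytes that happen to pass the zlib check, so a "zlib" result is a hint rather than proof, and files written with other codecs will be reported as unknown.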
----- Original Message -----
From: "Chris Douglas" <chrisdo-ZXvpkYn067l8UrSeD/g...@public.gmane.org>
Newsgroups: gmane.comp.jakarta.lucene.hadoop.user
To: <core-user-7ArZoLwFLBtd/SJB6HiN2Ni2O/jbr...@public.gmane.org>
Sent: Tuesday, March 17, 2009 12:33 AM
Subject: Re: intermediate results not getting compressed
I am running 0.19.1-dev, r744282. I have searched the issues but found nothing about the compression.
AFAIK, there are no open issues that prevent intermediate compression from working. The following might be useful:
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Data+Compression
Should the intermediate results not be compressed also if the map output files are set to be compressed?
These are controlled by separate options.
FileOutputFormat::setCompressOutput enables/disables compression on the final output
JobConf::setCompressMapOutput enables/disables compression of the intermediate output
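The independence of the two switches can be sketched as follows. To stay runnable without Hadoop on the classpath, the example models the job configuration with `java.util.Properties` instead of a real `JobConf`. The class name `CompressionSwitches` is hypothetical, and the two property names are the ones the 0.19-era setters are believed to write, so treat them as an assumption and check them against your version.

```java
import java.util.Properties;

public class CompressionSwitches {

    // Assumed property names behind the two setters named above.
    public static final String MAP_OUTPUT_KEY = "mapred.compress.map.output";
    public static final String FINAL_OUTPUT_KEY = "mapred.output.compress";

    /**
     * Builds a stand-in job configuration with each switch set on its own.
     * With Hadoop available, the equivalent calls would be:
     *   conf.setCompressMapOutput(compressMapOutput);
     *   FileOutputFormat.setCompressOutput(conf, compressFinalOutput);
     */
    public static Properties configure(boolean compressMapOutput,
                                       boolean compressFinalOutput) {
        Properties conf = new Properties();
        conf.setProperty(MAP_OUTPUT_KEY, Boolean.toString(compressMapOutput));
        conf.setProperty(FINAL_OUTPUT_KEY, Boolean.toString(compressFinalOutput));
        return conf;
    }

    public static void main(String[] args) {
        // Compress the intermediate (map) output but leave the final output plain.
        Properties conf = configure(true, false);
        System.out.println(MAP_OUTPUT_KEY + " = " + conf.getProperty(MAP_OUTPUT_KEY));
        System.out.println(FINAL_OUTPUT_KEY + " = " + conf.getProperty(FINAL_OUTPUT_KEY));
    }
}
```

Setting one key leaves the other untouched, which is why compressed final output does not imply compressed intermediate output, or the reverse.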
If not, then why do we have the map compression option? Just to save network traffic?
That's part of it. Also to save on disk bandwidth and intermediate space. -C
