In addition:
- "mapred.output.compression.type" is now replaced with
"mapred.map.output.compression.type"
- the old implementation of the Java method
setMapOutputCompressorClass() used to turn map output compression on
automatically as a side effect; the 0.15 one doesn't. It looks like one
has to call setCompressMapOutput() separately.
Aargh.
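In case it helps others migrating, here is a minimal sketch of a pre-flight check for these two map-output pitfalls. It models the job configuration as a plain Map<String, String> rather than Hadoop's JobConf, so it runs without Hadoop on the classpath. The class name is hypothetical, and the flag and codec key names (mapred.compress.map.output, mapred.map.output.compression.codec) are my assumption of what setCompressMapOutput() and setMapOutputCompressorClass() write; check them against hadoop-default.xml for your release.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical pre-flight check for the 0.15 map-output compression
 * pitfalls. The job configuration is modelled as a plain Map; this is
 * NOT Hadoop's JobConf API.
 */
public class MapOutputCompressionAudit {

    // Key names taken from the message above.
    static final String OLD_TYPE_KEY = "mapred.output.compression.type";
    static final String NEW_MAP_TYPE_KEY = "mapred.map.output.compression.type";
    // ASSUMED key names for the on/off flag and the codec class.
    static final String COMPRESS_MAP_OUTPUT_KEY = "mapred.compress.map.output";
    static final String MAP_CODEC_KEY = "mapred.map.output.compression.codec";

    /** Returns one human-readable warning per suspicious setting. */
    public static List<String> audit(Map<String, String> conf) {
        List<String> warnings = new ArrayList<>();
        boolean compressFlagOn =
            Boolean.parseBoolean(conf.getOrDefault(COMPRESS_MAP_OUTPUT_KEY, "false"));

        // In 0.15 a codec on its own no longer switches compression on.
        if (conf.containsKey(MAP_CODEC_KEY) && !compressFlagOn) {
            warnings.add("map output codec is set but " + COMPRESS_MAP_OUTPUT_KEY
                + " is not true; map output will not be compressed");
        }
        // If the old key was meant for map output, the new key must be set too.
        if (conf.containsKey(OLD_TYPE_KEY) && !conf.containsKey(NEW_MAP_TYPE_KEY)) {
            warnings.add(OLD_TYPE_KEY + " is set but " + NEW_MAP_TYPE_KEY
                + " is not; map output will not pick up that type");
        }
        return warnings;
    }

    public static void main(String[] args) {
        // Settings as a pre-0.15 job might have left them.
        Map<String, String> conf = new HashMap<>();
        conf.put(MAP_CODEC_KEY, "org.apache.hadoop.io.compress.DefaultCodec");
        conf.put(OLD_TYPE_KEY, "BLOCK");
        audit(conf).forEach(w -> System.out.println("WARNING: " + w));
    }
}
```

Running main() with the pre-0.15 style settings shown prints both warnings.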
________________________________
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Joydeep Sen Sarma
Sent: Wednesday, February 20, 2008 5:06 PM
To: core-user@hadoop.apache.org
Subject: changes to compression interfaces in 0.15?
Hi developers,
In migrating to 0.15, I am noticing that the compression interfaces
have changed:
- the compression type for SequenceFile outputs used to be set by:
SequenceFile.setCompressionType()
- now it seems to be set using:
SequenceFileOutputFormat.setOutputCompressionType()
The change is for the better, but would it be possible to:
- remove old/dead interfaces? That would have been a
straightforward hint for applications to look for the new interfaces.
(hadoop-default.xml also still has a setting for the old conf variable
io.seqfile.compression.type.)
- if possible, document changed interfaces in the release notes?
There's no way we can find this out by looking at the long list of
JIRAs.
As you can imagine, this causes a very subtle and harmful regression in
the behavior of existing apps. It does not cause failures, and in our
case it switched us from BLOCK to RECORD compression, which means
there's pretty much no compression at all. I caught this by *pure*
chance, and now I am living in absolute fear of what else lurks out
there.
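To make the regression concrete, here is a minimal model of it, again using a plain Map<String, String> in place of JobConf. The class name is hypothetical, and two things are assumptions rather than facts from the release notes: that the old SequenceFile.setCompressionType() writes io.seqfile.compression.type while SequenceFileOutputFormat.setOutputCompressionType() writes mapred.output.compression.type, and that the job falls back to RECORD when the new key is unset (which matches the BLOCK to RECORD switch described above).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the silent BLOCK -> RECORD regression, using a
 * plain Map in place of Hadoop's JobConf.
 */
public class SeqFileOutputTypeCheck {

    // ASSUMED: key written by the old SequenceFile.setCompressionType().
    static final String OLD_KEY = "io.seqfile.compression.type";
    // ASSUMED: key written by SequenceFileOutputFormat.setOutputCompressionType().
    static final String NEW_KEY = "mapred.output.compression.type";
    // ASSUMED: default used when the new key is unset.
    static final String ASSUMED_DEFAULT = "RECORD";

    /** Compression type a 0.15-style job would actually use for its output. */
    public static String effectiveOutputType(Map<String, String> conf) {
        // The old key is deliberately ignored: that is the regression.
        return conf.getOrDefault(NEW_KEY, ASSUMED_DEFAULT);
    }

    /** True if the old key asks for a type the job will not actually use. */
    public static boolean silentlyChanged(Map<String, String> conf) {
        String requested = conf.get(OLD_KEY);
        return requested != null && !requested.equals(effectiveOutputType(conf));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(OLD_KEY, "BLOCK"); // what a pre-0.15 job set
        // prints "effective type: RECORD, silently changed: true"
        System.out.println("effective type: " + effectiveOutputType(conf)
            + ", silently changed: " + silentlyChanged(conf));
    }
}
```

In a real job, the fix is simply to call the new setter named above so that the new key carries BLOCK; a check like silentlyChanged() only helps find jobs that still rely on the old one.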
I am not sure how up to date the wiki is on the compression stuff (it's
my responsibility to update it), but please do consider the impact of
changing interfaces on existing applications. (Maybe we should have a
JIRA tag to mark bugs that change interfaces.)
As always, thanks for all the fish (err... working code),
Joydeep