In addition:

- "mapred.output.compression.type" is now replaced with "mapred.map.output.compression.type".
- The old implementation of the Java interface setMapOutputCompressorClass() used to turn map compression on automatically as a side effect; the 0.15 one doesn't. It looks like one has to call setCompressMapOutput() separately. Aargh.

________________________________

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Joydeep Sen Sarma
Sent: Wednesday, February 20, 2008 5:06 PM
To: core-user@hadoop.apache.org
Subject: changes to compression interfaces in 0.15?

Hi developers,

In migrating to 0.15, I am noticing that the compression interfaces have changed:

- the compression type for SequenceFile outputs used to be set by: SequenceFile.setCompressionType()
- now it seems to be set using: SequenceFileOutputFormat.setOutputCompressionType()

The change is for the better, but would it be possible to:

- remove old/dead interfaces. That would have been a straightforward hint for applications to look for the new interfaces. (hadoop-default.xml also still has a setting for the old conf variable: io.seqfile.compression.type)
- if possible, document changed interfaces in the release notes (there's no way we can find this out by looking at the long list of JIRAs).

As you can imagine, this causes a very subtle and harmful regression in the behavior of existing apps. It does not cause failures; in our case, it switched from BLOCK to RECORD compression, meaning there's pretty much no compression at all. I caught this by *pure* chance and now I am living in absolute fear of what else lurks out there.

I am not sure how up to date the wiki is on the compression stuff (my responsibility to update it), but please do consider the impact of changing interfaces on existing applications. (Maybe we should have a JIRA tag to mark out bugs that change interfaces.)

As always, thanks for all the fish (err .. working code),

Joydeep
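
Since the regression described in this thread is silent (no failure, just a different compression type), one cheap safeguard is to scan a job's settings for the keys the messages above report as changed. The following is only a minimal sketch under stated assumptions: the class CompressionConfigAudit and the method findLegacyKeys are hypothetical names, not part of Hadoop, and the key list is taken from this thread rather than from any authoritative deprecation table. It works on a plain Map so it runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: flags configuration keys that this
// thread reports as changed in 0.15, so a silent fallback does not go unnoticed.
final class CompressionConfigAudit {

    // Old key -> hint taken from the messages in this thread (an assumption,
    // not an authoritative list of deprecated keys).
    private static final Map<String, String> LEGACY_KEYS = new LinkedHashMap<>();
    static {
        LEGACY_KEYS.put("mapred.output.compression.type",
                "reported as replaced by mapred.map.output.compression.type");
        LEGACY_KEYS.put("io.seqfile.compression.type",
                "reported as superseded by SequenceFileOutputFormat.setOutputCompressionType()");
    }

    // Returns one warning per legacy key that is present in the given settings.
    static List<String> findLegacyKeys(Map<String, String> settings) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, String> entry : LEGACY_KEYS.entrySet()) {
            String key = entry.getKey();
            if (settings.containsKey(key)) {
                warnings.add(key + "=" + settings.get(key) + ": " + entry.getValue());
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("io.seqfile.compression.type", "BLOCK");
        // Made-up key, included only to show that unrelated settings are not flagged.
        settings.put("example.unrelated.key", "true");
        for (String warning : findLegacyKeys(settings)) {
            System.out.println("WARNING " + warning);
        }
    }
}
```

In a real job the map would be filled from the job configuration before submission; how to do that depends on the Hadoop version in use and is left out here.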