I suppose we could do with a simple identity-map/identity-reduce example tool that could easily be reused for purposes such as this. Could you file a JIRA for it?
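A minimal sketch of what such an identity pass-through job might look like, assuming the 0.20-era org.apache.hadoop.mapred API and line-oriented text input (the class name and paths are illustrative; this is not an existing tool). TextInputFormat decompresses gzipped input transparently, so a map-only job that re-emits each line, with output compression left off, writes decompressed copies back to HDFS:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class IdentityCopy {
  // Drops the byte-offset key so the output lines match the input exactly;
  // TextOutputFormat omits NullWritable keys and their separator.
  public static class PassThroughMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, NullWritable, Text> {
    public void map(LongWritable key, Text value,
                    OutputCollector<NullWritable, Text> out, Reporter reporter)
        throws IOException {
      out.collect(NullWritable.get(), value);
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(IdentityCopy.class);
    conf.setJobName("identity-copy");
    conf.setMapperClass(PassThroughMapper.class);
    conf.setNumReduceTasks(0);                  // map-only: no shuffle needed
    conf.setOutputKeyClass(NullWritable.class);
    conf.setOutputValueClass(Text.class);
    // The default TextInputFormat detects gzipped input and decompresses it;
    // output compression is off by default, so the copies come out plain.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

Note that gzip is not splittable, so each input file goes to a single mapper; the parallelism comes from having many files.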
The -text is like -cat, but with codec and some file format detection. Hopefully it will work for your case.

On Fri, Aug 5, 2011 at 8:44 PM, Keith Wiley <kwi...@keithwiley.com> wrote:
> I can envision an M/R job for the purpose of manipulating HDFS, such as
> (de)compressing files and resaving them back to HDFS. I just didn't think it
> should be necessary to *write a program* to do something so seemingly
> minimal. This (tarring/compressing/etc.) seems like an obvious method for
> moving data back and forth; I would expect the tools to support it.
>
> I'll read up on "-text". Maybe that really is what I wanted, although I'm
> dubious since this has nothing to do with textual data at all. Anyway, I'll
> see what I can find on that.
>
> Thanks.
>
> On Aug 4, 2011, at 9:04 PM, Harsh J wrote:
>
>> Keith,
>>
>> The 'hadoop fs -text' tool does decompress a file given to it if
>> needed/able, but what you could also do is run a distributed mapreduce
>> job that converts from compressed to decompressed; that'd be much
>> faster.
>>
>> On Fri, Aug 5, 2011 at 4:58 AM, Keith Wiley <kwi...@keithwiley.com> wrote:
>>> Instead of running "hd fs -put" on hundreds of files of X megs each, I
>>> want to do it once on a gzipped (or zipped) archive: one file, much
>>> smaller total megs. Then I want to decompress the archive on HDFS. I
>>> can't figure out what "hd fs" type command would do such a thing.
>>>
>>> Thanks.
>
> ________________________________________________________________________________
> Keith Wiley        kwi...@keithwiley.com        keithwiley.com
> music.keithwiley.com
>
> "It's a fine line between meticulous and obsessive-compulsive and a slippery
> rope between obsessive-compulsive and debilitatingly slow."
>                                        --  Keith Wiley
> ________________________________________________________________________________

--
Harsh J
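For the curious, a rough sketch of the codec detection that '-text' layers on top of '-cat', assuming a Hadoop client classpath and a gzip-style compressed file (the class name and argument handling are illustrative, not the actual FsShell source). It looks up a codec from the file's extension and streams the decompressed bytes to stdout, degrading to plain '-cat' behavior when no codec matches:

import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class TextCat {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path(args[0]);           // e.g. an HDFS path to a .gz file
    FileSystem fs = path.getFileSystem(conf);
    // Matches a registered codec (gzip, bzip2, ...) by file extension.
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    CompressionCodec codec = factory.getCodec(path);
    InputStream in = (codec == null)
        ? fs.open(path)                          // no codec: behave like -cat
        : codec.createInputStream(fs.open(path)); // wrap in a decompressor
    IOUtils.copyBytes(in, System.out, conf, true); // true: close streams after
  }
}

Run against a .gz file this behaves like 'hadoop fs -text' does on it; the real shell command also recognizes SequenceFiles by their magic bytes, which this sketch omits.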