Ok.. you could try this - run the hadoop archive tool in your local hadoop setup. For example, if you want to create an archive of the conf directory, you could run "bin/hadoop archive -archiveName tmp.har conf test". Then copy the contents of the test directory to the dfs: "bin/hadoop dfs -put test/tmp.har tmp.har". You should then be able to look at it using the hadoop fs commands (e.g. "bin/hadoop dfs -ls har:///user/ddas/tmp.har") or from a MR job. One thing to note is that the paths inside the har filesystem will have the names of the paths on your local machine...
BTW I myself never tried the above.. The other option is to concatenate (if possible) the files into bigger files and then upload those to the dfs..

On 9/3/08 4:37 PM, "Dmitry Pushkarev" <[EMAIL PROTECTED]> wrote:

> Probably, but the current idea is to bypass writing small files to HDFS by
> creating my own local har archive and uploading it. (Small files lower the
> transfer speed from 40-70MB/s to hundreds of kbps :(
>
> -----Original Message-----
> From: Devaraj Das [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, September 03, 2008 4:00 AM
> To: core-user@hadoop.apache.org
> Subject: Re: har/unhar utility
>
> You could create a har archive of the small files and then pass the
> corresponding har filesystem as input to your mapreduce job. Would that
> work?
>
> On 9/3/08 4:24 PM, "Dmitry Pushkarev" <[EMAIL PROTECTED]> wrote:
>
>> Not quite, I want to be able to create har archives on the local system
>> and then send them to HDFS, and back, since I work with many small files
>> (10kb) and Hadoop seems to behave poorly with them.
>>
>> Perhaps HBase is another option. Is anyone using it in "production" mode?
>> And do I really need to downgrade to 17.x to install it?
>>
>> -----Original Message-----
>> From: Devaraj Das [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, September 03, 2008 3:35 AM
>> To: core-user@hadoop.apache.org
>> Subject: Re: har/unhar utility
>>
>> Are you looking for user documentation on har? If so, here it is:
>> http://hadoop.apache.org/core/docs/r0.18.0/hadoop_archives.html
>>
>> On 9/3/08 3:21 PM, "Dmitry Pushkarev" <[EMAIL PROTECTED]> wrote:
>>
>>> Does anyone have a har/unhar utility?
>>>
>>> Or at least a format description? It looks pretty obvious, but just in
>>> case.
>>>
>>> Thanks
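P.S. The "concatenate into bigger files" option above can be sketched entirely locally. This is just an illustration of the idea (one big file plus an offset/length index per member) - the pack/read_member helper names and the index scheme are my own, not anything Hadoop or har ships:

```python
def pack(file_paths, archive_path):
    """Concatenate many small files into one big file.

    Returns an index mapping each input path to its (offset, length)
    within the archive, so members can be read back individually.
    """
    index = {}
    with open(archive_path, "wb") as out:
        for path in file_paths:
            with open(path, "rb") as f:
                data = f.read()
            index[path] = (out.tell(), len(data))  # record where this member starts
            out.write(data)
    return index


def read_member(archive_path, index, name):
    """Read a single member back out of the concatenated archive."""
    offset, length = index[name]
    with open(archive_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

You would then upload only the one big file (and the index, serialized however you like) to the dfs, which avoids the per-small-file overhead that kills the transfer rate.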