I should have mentioned that in the step where you create a har archive locally, you should use the local job runner.
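For reference, one way to select the local job runner is in the conf that the local hadoop setup reads. Below is a sketch of the relevant hadoop-site.xml overrides; the property names are the 0.18-era ones, so adjust them to your install:

```xml
<!-- hadoop-site.xml overrides for a purely local setup -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- "local" makes MR jobs (including the archive tool's job) use the local job runner -->
    <value>local</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <!-- read and write the local filesystem rather than a cluster -->
    <value>file:///</value>
  </property>
</configuration>
```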
On 9/3/08 5:41 PM, "Devaraj Das" <[EMAIL PROTECTED]> wrote:
> Ok.. You could try this - run the hadoop archive tool in your local hadoop
> setup. E.g., if you want to create an archive of the conf directory, you
> could run: "bin/hadoop archive -archiveName tmp.har conf test".
> Now copy the contents of the test directory to the dfs:
> "bin/hadoop dfs -put test/tmp.har tmp.har". It should be possible to look at
> this using the hadoop fs commands (like "bin/hadoop dfs -ls
> har:///user/ddas/tmp.har") or from an MR job.
> The one thing you should note is that the paths in the har fs have the names
> of the paths on your local machine...
>
> BTW, I myself never tried the above..
>
> The other option is to concatenate (if possible) the files into bigger files
> and then upload those to the dfs..
>
> On 9/3/08 4:37 PM, "Dmitry Pushkarev" <[EMAIL PROTECTED]> wrote:
>
>> Probably, but the current idea is to bypass writing small files to HDFS by
>> creating my own local har archive and uploading it. (Small files lower the
>> transfer speed from 40-70 MB/s to hundreds of kbps.) :(
>>
>> -----Original Message-----
>> From: Devaraj Das [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, September 03, 2008 4:00 AM
>> To: core-user@hadoop.apache.org
>> Subject: Re: har/unhar utility
>>
>> You could create a har archive of the small files and then pass the
>> corresponding har filesystem as input to your mapreduce job. Would that
>> work?
>>
>> On 9/3/08 4:24 PM, "Dmitry Pushkarev" <[EMAIL PROTECTED]> wrote:
>>
>>> Not quite. I want to be able to create har archives on the local system
>>> and then send them to HDFS, and back, since I work with many small files
>>> (10 kb) and Hadoop seems to behave poorly with them.
>>>
>>> Perhaps HBase is another option. Is anyone using it in "production" mode?
>>> And do I really need to downgrade to 17.x to install it?
>>> -----Original Message-----
>>> From: Devaraj Das [mailto:[EMAIL PROTECTED]
>>> Sent: Wednesday, September 03, 2008 3:35 AM
>>> To: core-user@hadoop.apache.org
>>> Subject: Re: har/unhar utility
>>>
>>> Are you looking for user documentation on har? If so, here it is:
>>> http://hadoop.apache.org/core/docs/r0.18.0/hadoop_archives.html
>>>
>>> On 9/3/08 3:21 PM, "Dmitry Pushkarev" <[EMAIL PROTECTED]> wrote:
>>>
>>>> Does anyone have a har/unhar utility?
>>>>
>>>> Or at least a format description? It looks pretty obvious, though, but
>>>> just in case.
>>>>
>>>> Thanks
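The concatenation alternative Devaraj mentions can be tried with plain tar before involving hadoop at all: bundle the small files into one archive locally, upload the single big file, and unpack on the other side. A minimal sketch; the dfs -put line is commented out because it needs a running cluster, and its target path is hypothetical:

```shell
set -e
# make a few small files standing in for the real 10 kb inputs
mkdir -p smallfiles
printf 'alpha' > smallfiles/a.txt
printf 'beta'  > smallfiles/b.txt

# bundle them into one big file so the upload is a single fast stream
tar -cf bundle.tar smallfiles

# upload step (hypothetical path; requires a cluster):
# bin/hadoop dfs -put bundle.tar /user/ddas/bundle.tar

# unpack on the receiving side to get the originals back
mkdir -p unpacked
tar -xf bundle.tar -C unpacked
cat unpacked/smallfiles/a.txt   # prints "alpha"
```

This sidesteps the per-file overhead during the transfer, at the cost of the archive not being directly addressable as a filesystem the way a har is.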