Re: Hadoop file uploads

visioner sadak Thu, 13 Oct 2011 01:54:21 -0700

Hello Brock,

Thanks a lot for your help, man. Should I run this code after doing the
small file uploads? I mean, I have a Java API that does the small file
uploads and the reads as well. How will I be able to read the files back?
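The following minimal sketch shows the reading side of Brock's suggestion in plain Java. It is an assumption-laden stand-in, not the Hadoop API and not the format Brock's pastebin code writes. It packs small files into one container as length-prefixed (filename, bytes) records, then either splits the whole container back into files or scans it for one name. In real Hadoop code the same roles would typically be played by SequenceFile.Writer and SequenceFile.Reader, with the filename as a Text key and the contents as a BytesWritable value. The class and method names below (SmallFilePack, pack, unpack, find) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the SequenceFile idea: NOT Hadoop's file format or API.
// Each record is: filename (writeUTF), content length (int), content bytes.
public class SmallFilePack {

    // Packs many small files into one container; every file is held in memory.
    public static byte[] pack(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> entry : files.entrySet()) {
                out.writeUTF(entry.getKey());          // key: original filename (max 65535 UTF-8 bytes)
                out.writeInt(entry.getValue().length); // length prefix lets a reader split records apart
                out.write(entry.getValue());           // value: the whole file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Splits the container back into filename -> contents, preserving record order.
    public static Map<String, byte[]> unpack(byte[] container) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(container))) {
            while (in.available() > 0) {
                String name = in.readUTF();
                byte[] data = new byte[in.readInt()];
                in.readFully(data);
                files.put(name, data);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    // Retrieves one file by name with a sequential scan; returns null if absent.
    public static byte[] find(byte[] container, String wanted) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(container))) {
            while (in.available() > 0) {
                String name = in.readUTF();
                int length = in.readInt();
                if (name.equals(wanted)) {
                    byte[] data = new byte[length];
                    in.readFully(data);
                    return data;
                }
                in.skipBytes(length); // skip other files' contents without copying them
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("a.txt", "hello".getBytes(StandardCharsets.UTF_8));
        files.put("b.txt", "world!".getBytes(StandardCharsets.UTF_8));
        byte[] container = pack(files);
        System.out.println(unpack(container).keySet()); // [a.txt, b.txt]
        System.out.println(new String(find(container, "b.txt"), StandardCharsets.UTF_8)); // world!
    }
}
```

Because every lookup here is a sequential scan, this layout suits reading many files in one pass. Fetching a single file by name means reading past every record before it. For keyed lookups, Hadoop's MapFile (a sorted SequenceFile with an index) may be worth a look.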



On Thu, Oct 13, 2011 at 2:26 AM, Brock Noland <br...@cloudera.com> wrote:

> Hi,
>
> This:  http://pastebin.com/YFzAh0Nj
>
> will convert a directory of small files to a sequence file. The key is the
> filename, the value the file itself. This works if each individual file is
> small enough to fit in memory. If you have some files which are larger and
> those files can be split up, they can be split over multiple key value
> pairs.
>
> Brock
>
> On Wed, Oct 12, 2011 at 4:50 PM, visioner sadak <visioner.sa...@gmail.com> wrote:
>
>> Hello guys,
>>
>> Thanks a lot again for your previous guidance, guys. I tried out the Java
>> API to do file uploads and it's working fine. Now I need to modify the code
>> to use sequence files so that I can handle a large number of small files in
>> Hadoop. For that I came across two links:
>>
>> 1. http://stuartsierra.com/2008/04/24/a-million-little-files (tar to
>> sequence)
>> 2. http://www.jointhegrid.com/hadoop_filecrush/index.jsp (file crush)
>>
>> Could you please tell me which approach is better to follow, or should I
>> follow the HAR (Hadoop archive) approach? I learned that with a sequence
>> file we can combine smaller files into one big one, but I don't know how to
>> split it and retrieve the small files again when reading them.
>>
>> Thanks and gratitude
>> On Wed, Oct 5, 2011 at 1:27 AM, visioner sadak <visioner.sa...@gmail.com> wrote:
>>
>>> Thanks a lot, Wellington and Bejoy, for your inputs. I will try out this
>>> API and the sequence file.
>>>
>>>
>>> On Wed, Oct 5, 2011 at 1:17 AM, Wellington Chevreuil <
>>> wellington.chevre...@gmail.com> wrote:
>>>
>>>> Yes, Sadak,
>>>>
>>>> With this API, you copy your files into HDFS just as you would when
>>>> writing to an OutputStream. They will then be replicated in your
>>>> cluster's HDFS.
>>>>
>>>> Cheers.
>>>>
>>>> 2011/10/4 visioner sadak <visioner.sa...@gmail.com>:
>>>> > Hey, thanks Wellington. Just a thought: will my data be replicated as
>>>> > well? I thought that the mapper does the job of breaking data into
>>>> > pieces and distributing it, and the reducer does the joining and
>>>> > combining while fetching data back; that's why I was confused about
>>>> > using MR. Can I use this API for uploading a large number of small files
>>>> > through my application as well, or should I use the SequenceFile class
>>>> > for that? I ask because I saw the small files problem in Hadoop
>>>> > mentioned in the link below:
>>>> >
>>>> > http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>> >
>>>> > On Wed, Oct 5, 2011 at 12:54 AM, Wellington Chevreuil
>>>> > <wellington.chevre...@gmail.com> wrote:
>>>> >>
>>>> >> Hey Sadak,
>>>> >>
>>>> >> You don't need to write an MR job for that. You can make your Java
>>>> >> program use the Hadoop Java API instead. You would need the FileSystem
>>>> >> (http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html)
>>>> >> and Path
>>>> >> (http://hadoop.apache.org/common/docs/current/api/index.html?org/apache/hadoop/fs/Path.html)
>>>> >> classes for that.
>>>> >>
>>>> >> Cheers,
>>>> >> Wellington.
>>>> >>
>>>> >> 2011/10/4 visioner sadak <visioner.sa...@gmail.com>:
>>>> >> > Hello guys,
>>>> >> >
>>>> >> > I would like to know how to do file uploads to HDFS using Java.
>>>> >> > Does it have to be done using MapReduce? What if I have a large
>>>> >> > number of small files; should I use sequence files along with
>>>> >> > MapReduce? It would be great if you could provide some information.
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>
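
The small files problem raised in the thread (see the Cloudera post linked above) is mostly about namenode memory. That post gives a rule of thumb of roughly 150 bytes of namenode heap for every file, directory, and block object. The sketch below only does that arithmetic, assuming the 150-byte figure and ignoring directories; the class name and the repacked file and block counts are hypothetical, chosen for illustration.

```java
// Back-of-the-envelope namenode heap estimate, assuming the ~150 bytes per
// namespace object rule of thumb from Cloudera's "small files problem" post.
// Directories are ignored; only file objects and block objects are counted.
public class NamenodeEstimate {

    static final long BYTES_PER_OBJECT = 150; // rule of thumb, not a measurement

    // Estimated heap bytes for `files` files that each occupy `blocksPerFile` blocks.
    public static long estimateHeapBytes(long files, long blocksPerFile) {
        long objects = files + files * blocksPerFile; // one file object plus its block objects
        return objects * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // 10 million small files, each fitting in one block: about 3 GB.
        System.out.println(estimateHeapBytes(10_000_000L, 1)); // 3000000000
        // Hypothetical repacking into 1,000 larger files of 20 blocks each: about 3 MB.
        System.out.println(estimateHeapBytes(1_000L, 20));     // 3150000
    }
}
```

The exact numbers matter less than the shape: under this rule of thumb, heap grows with the number of files and blocks rather than with bytes stored, which is why sequence files, HAR, and filecrush-style merging all aim to cut the file count.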
