Hey Brock, do you have proper working code? This one is giving a lot of errors!

On Thu, Oct 13, 2011 at 4:29 PM, Brock Noland <br...@cloudera.com> wrote:
> Hi,
>
> The code is very similar, just create a SequenceFile reader.
>
> Brock
>
> On Thu, Oct 13, 2011 at 4:53 AM, visioner sadak
> <visioner.sa...@gmail.com> wrote:
>
>> Hello Brock,
>>
>> Thanks a lot for your help, man. Should I run this code after doing the
>> small file uploads? I mean, I have a Java API which does the small file
>> uploads and reads as well. How will I be able to read the files back?
>>
>> On Thu, Oct 13, 2011 at 2:26 AM, Brock Noland <br...@cloudera.com> wrote:
>>
>>> Hi,
>>>
>>> This: http://pastebin.com/YFzAh0Nj
>>>
>>> will convert a directory of small files to a sequence file. The key is
>>> the filename, the value the file itself. This works if each individual
>>> file is small enough to fit in memory. If you have some files which are
>>> larger and those files can be split up, they can be split over multiple
>>> key value pairs.
>>>
>>> Brock
>>>
>>> On Wed, Oct 12, 2011 at 4:50 PM, visioner sadak <
>>> visioner.sa...@gmail.com> wrote:
>>>
>>>> Hello guys,
>>>>
>>>> Thanks a lot again for your previous guidance. I tried out the Java API
>>>> to do file uploads and it is working fine. Now I need to modify the code
>>>> to use sequence files so that I can handle a large number of small files
>>>> in Hadoop. For that I came across 2 links:
>>>>
>>>> 1. http://stuartsierra.com/2008/04/24/a-million-little-files (tar to
>>>> sequence)
>>>> 2. http://www.jointhegrid.com/hadoop_filecrush/index.jsp (file crush)
>>>>
>>>> Could you please tell me which approach is better to follow, or should I
>>>> follow the HAR (Hadoop archive) approach? I came to know that in a
>>>> sequence file we can combine smaller files into one big one, but I don't
>>>> know how to split and retrieve the small files again while reading.
>>>>
>>>> Thanks and Gratitude
>>>>
>>>> On Wed, Oct 5, 2011 at 1:27 AM, visioner sadak <
>>>> visioner.sa...@gmail.com> wrote:
>>>>
>>>>> Thanks a lot, Wellington and Bejoy, for your inputs. I will try out
>>>>> this API and the sequence file.
>>>>>
>>>>> On Wed, Oct 5, 2011 at 1:17 AM, Wellington Chevreuil <
>>>>> wellington.chevre...@gmail.com> wrote:
>>>>>
>>>>>> Yes, Sadak,
>>>>>>
>>>>>> Within this API, you'll copy your files into Hadoop HDFS as you do
>>>>>> when writing to an OutputStream. It will be replicated in your
>>>>>> cluster's HDFS then.
>>>>>>
>>>>>> Cheers.
>>>>>>
>>>>>> 2011/10/4 visioner sadak <visioner.sa...@gmail.com>:
>>>>>> > Hey, thanks Wellington. Just a thought: will my data be replicated
>>>>>> > as well? I thought that the mapper does the job of breaking data
>>>>>> > into pieces and distributing it, and the reducer does the joining
>>>>>> > and combining while fetching data back; that's why I was confused
>>>>>> > about whether to use MR. Can I use this API for uploading a large
>>>>>> > number of small files as well through my application, or should I
>>>>>> > use the sequence file class for that? I saw the small files problem
>>>>>> > in Hadoop mentioned in the link below:
>>>>>> >
>>>>>> > http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>>> >
>>>>>> > On Wed, Oct 5, 2011 at 12:54 AM, Wellington Chevreuil
>>>>>> > <wellington.chevre...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> Hey Sadak,
>>>>>> >>
>>>>>> >> you don't need to write a MR job for that. You can make your java
>>>>>> >> program use Hadoop Java API for that. You would need to use
>>>>>> >> FileSystem
>>>>>> >> (http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html)
>>>>>> >> and Path
>>>>>> >> (http://hadoop.apache.org/common/docs/current/api/index.html?org/apache/hadoop/fs/Path.html)
>>>>>> >> classes for that.
>>>>>> >>
>>>>>> >> Cheers,
>>>>>> >> Wellington.
>>>>>> >>
>>>>>> >> 2011/10/4 visioner sadak <visioner.sa...@gmail.com>:
>>>>>> >> > Hello guys,
>>>>>> >> >
>>>>>> >> > I would like to know how to do file uploads in HDFS using Java.
>>>>>> >> > Is it to be done using MapReduce? What if I have a large number
>>>>>> >> > of small files, should I use a sequence file along with
>>>>>> >> > MapReduce? It would be great if you could provide some
>>>>>> >> > information.
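
For reference, here is a minimal sketch of the approach described in the thread (key = file name, value = file bytes), together with the reading side. It is not the pastebin code, which is not reproduced in this thread. The class name SmallFilesSequenceFile and the methods pack and unpack are made up for illustration. It assumes hadoop-common is on the classpath, that every file fits in memory, and that the older SequenceFile.createWriter(fs, conf, ...) and SequenceFile.Reader(fs, path, conf) calls are available; they are deprecated in later Hadoop releases.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical helper class, for illustration only.
public class SmallFilesSequenceFile {

    // Pack every regular file in localDir into one SequenceFile.
    // Key = file name, value = raw file bytes. Returns the number of files packed.
    public static int pack(FileSystem fs, Configuration conf, File localDir, Path seqFile)
            throws IOException {
        File[] files = localDir.listFiles();
        if (files == null) {
            throw new IOException("Not a readable directory: " + localDir);
        }
        int count = 0;
        SequenceFile.Writer writer =
                SequenceFile.createWriter(fs, conf, seqFile, Text.class, BytesWritable.class);
        try {
            for (File f : files) {
                if (!f.isFile()) {
                    continue; // skip subdirectories
                }
                // Assumption: each file is small enough to be held in memory.
                byte[] bytes = Files.readAllBytes(f.toPath());
                writer.append(new Text(f.getName()), new BytesWritable(bytes));
                count++;
            }
        } finally {
            writer.close();
        }
        return count;
    }

    // Read the SequenceFile back into a map of file name -> file bytes.
    public static Map<String, byte[]> unpack(FileSystem fs, Configuration conf, Path seqFile)
            throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<String, byte[]>();
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, seqFile, conf);
        try {
            Text key = new Text();
            BytesWritable value = new BytesWritable();
            while (reader.next(key, value)) {
                // The backing array can be longer than the valid data,
                // so copy only the first getLength() bytes.
                byte[] data = Arrays.copyOf(value.getBytes(), value.getLength());
                result.put(key.toString(), data);
            }
        } finally {
            reader.close();
        }
        return result;
    }

    // Usage: SmallFilesSequenceFile <local input dir> <sequence file path>
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf); // HDFS if the cluster config is on the classpath
        Path seqFile = new Path(args[1]);
        int packed = pack(fs, conf, new File(args[0]), seqFile);
        System.out.println("Packed " + packed + " files into " + seqFile);
        for (Map.Entry<String, byte[]> e : unpack(fs, conf, seqFile).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().length + " bytes");
        }
    }
}
```

To get the small files back, iterate over the records as unpack does and write each value to wherever it is needed. A SequenceFile is read sequentially, so fetching a single file by name means scanning until its key turns up.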