Hello Brock,

Thanks a lot for your help, man. Should I run this code after doing the small file uploads? I mean, I have a Java API which does the small file uploads and the reads as well. How will I be able to read the files back?
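Brock's reply quoted below describes the layout: one record per small file, with the file name as the key and the file contents as the value. Reading a file back therefore means scanning the records until the key matches. Here is a minimal sketch of that write-then-read round trip using only the JDK. It assumes each file fits in memory, as Brock notes. It is not Hadoop's SequenceFile format, and the names `SmallFilePack`, `pack`, `unpackAll` and `read` are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical plain-JDK illustration of "key = file name, value = file bytes".
 * NOT Hadoop's SequenceFile format; it only mirrors the record idea.
 * Record layout: writeUTF(name), writeInt(length), then the raw bytes.
 */
public class SmallFilePack {

    /** Packs every regular file directly inside dir into one container file. */
    public static void pack(Path dir, Path out) throws IOException {
        List<Path> files;
        try (Stream<Path> s = Files.list(dir)) {
            files = s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(out)))) {
            for (Path f : files) {
                byte[] value = Files.readAllBytes(f);     // assumes each file fits in memory
                dos.writeUTF(f.getFileName().toString()); // key: the file name
                dos.writeInt(value.length);               // length prefix for the value
                dos.write(value);                         // value: the file contents
            }
        }
    }

    /** Reads every (name, contents) record back, in the order they were written. */
    public static Map<String, byte[]> unpackAll(Path packed) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try (DataInputStream dis = open(packed)) {
            String key;
            while ((key = nextKey(dis)) != null) {
                byte[] value = new byte[dis.readInt()];
                dis.readFully(value);
                result.put(key, value);
            }
        }
        return result;
    }

    /** Scans records in order and returns one file's bytes, or null if absent. */
    public static byte[] read(Path packed, String name) throws IOException {
        try (DataInputStream dis = open(packed)) {
            String key;
            while ((key = nextKey(dis)) != null) {
                int length = dis.readInt();
                if (key.equals(name)) {
                    byte[] value = new byte[length];
                    dis.readFully(value);
                    return value;
                }
                skipFully(dis, length); // not the requested file: skip over its bytes
            }
        }
        return null;
    }

    private static DataInputStream open(Path packed) throws IOException {
        return new DataInputStream(new BufferedInputStream(Files.newInputStream(packed)));
    }

    /** Returns the next key, or null once the end of the container is reached. */
    private static String nextKey(DataInputStream dis) throws IOException {
        try {
            return dis.readUTF();
        } catch (EOFException end) {
            return null;
        }
    }

    private static void skipFully(DataInputStream dis, int n) throws IOException {
        while (n > 0) {
            int skipped = dis.skipBytes(n);
            if (skipped <= 0) throw new EOFException("container is truncated");
            n -= skipped;
        }
    }
}
```

In Hadoop itself, the counterparts are `SequenceFile.Writer` and `SequenceFile.Reader` in `org.apache.hadoop.io`. They are typically used with `Text` keys and `BytesWritable` values. Looking up one file by name is likewise a sequential scan. The exception is when the records are appended in sorted key order to a `MapFile`, which keeps an index for `get` lookups.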
On Thu, Oct 13, 2011 at 2:26 AM, Brock Noland <br...@cloudera.com> wrote:
> Hi,
>
> This: http://pastebin.com/YFzAh0Nj
>
> will convert a directory of small files to a sequence file. The key is the
> filename, the value the file itself. This works if each individual file is
> small enough to fit in memory. If you have some files which are larger and
> those files can be split up, they can be split over multiple key/value
> pairs.
>
> Brock
>
> On Wed, Oct 12, 2011 at 4:50 PM, visioner sadak
> <visioner.sa...@gmail.com> wrote:
>
>> Hello guys,
>>
>> Thanks a lot again for your previous guidance. I tried out the Java API
>> to do file uploads and it's working fine. Now I need to modify the code
>> to use sequence files so that I can handle a large number of small files
>> in Hadoop. For that I came across 2 links:
>>
>> 1. http://stuartsierra.com/2008/04/24/a-million-little-files (tar to sequence)
>> 2. http://www.jointhegrid.com/hadoop_filecrush/index.jsp (file crush)
>>
>> Could you please tell me which approach is better to follow, or should I
>> follow the HAR (Hadoop Archive) approach? I came to know that with a
>> sequence file we can combine smaller files into one big one, but I don't
>> know how to split and retrieve the small files again while reading them.
>>
>> Thanks and gratitude
>>
>> On Wed, Oct 5, 2011 at 1:27 AM, visioner sadak
>> <visioner.sa...@gmail.com> wrote:
>>
>>> Thanks a lot, Wellington and Bejoy, for your inputs. I will try out this
>>> API and the sequence file.
>>>
>>> On Wed, Oct 5, 2011 at 1:17 AM, Wellington Chevreuil <
>>> wellington.chevre...@gmail.com> wrote:
>>>
>>>> Yes, Sadak,
>>>>
>>>> With this API, you'll copy your files into HDFS just as you would
>>>> when writing to an OutputStream. They will then be replicated in your
>>>> cluster's HDFS.
>>>>
>>>> Cheers.
>>>>
>>>> 2011/10/4 visioner sadak <visioner.sa...@gmail.com>:
>>>> > Hey, thanks Wellington. Just a thought: will my data be replicated as
>>>> > well? I thought the mapper does the job of breaking data into pieces
>>>> > and distributing it, and the reducer does the joining and combining
>>>> > while fetching data back; that's why I was confused about using MR.
>>>> > Can I use this API for uploading a large number of small files through
>>>> > my application as well, or should I use the SequenceFile class for
>>>> > that? I ask because I saw the small files problem in Hadoop, as
>>>> > described in the link below:
>>>> >
>>>> > http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>> >
>>>> > On Wed, Oct 5, 2011 at 12:54 AM, Wellington Chevreuil
>>>> > <wellington.chevre...@gmail.com> wrote:
>>>> >>
>>>> >> Hey Sadak,
>>>> >>
>>>> >> You don't need to write an MR job for that. Your Java program can
>>>> >> use the Hadoop Java API directly. You would need the FileSystem
>>>> >> (http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html)
>>>> >> and Path
>>>> >> (http://hadoop.apache.org/common/docs/current/api/index.html?org/apache/hadoop/fs/Path.html)
>>>> >> classes for that.
>>>> >>
>>>> >> Cheers,
>>>> >> Wellington.
>>>> >>
>>>> >> 2011/10/4 visioner sadak <visioner.sa...@gmail.com>:
>>>> >> > Hello guys,
>>>> >> >
>>>> >> > I would like to know how to do file uploads in HDFS using Java.
>>>> >> > Does it have to be done using MapReduce? And what if I have a large
>>>> >> > number of small files: should I use a sequence file along with
>>>> >> > MapReduce? It would be great if you could provide some information.
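Brock's note above says that larger files "can be split over multiple key/value pairs." That idea can be sketched the same way. Cut each file into fixed-size chunks, keep the file name and chunk index with every record, and reassemble the chunks by index on read. This is a hypothetical plain-Java helper, not code from the thread or from Hadoop. `ChunkedRecords`, `Chunk`, `split` and `join` are illustrative names, and the chunk size is an arbitrary parameter.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical helper: splits one large file's bytes into fixed-size chunk
 * records (file name + chunk index + bytes) and reassembles them on read.
 */
public class ChunkedRecords {

    /** One key/value record: the key is (fileName, index), the value is data. */
    public static final class Chunk {
        public final String fileName;
        public final int index;
        public final byte[] data;

        public Chunk(String fileName, int index, byte[] data) {
            this.fileName = fileName;
            this.index = index;
            this.data = data;
        }
    }

    /** Cuts contents into chunks of at most chunkSize bytes, numbered from 0. */
    public static List<Chunk> split(String fileName, byte[] contents, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        List<Chunk> chunks = new ArrayList<>();
        int index = 0;
        // long offset avoids int overflow when contents is close to 2 GB
        for (long off = 0; off < contents.length; off += chunkSize) {
            int end = (int) Math.min(off + chunkSize, contents.length);
            chunks.add(new Chunk(fileName, index++, Arrays.copyOfRange(contents, (int) off, end)));
        }
        return chunks; // note: an empty file produces no chunks at all
    }

    /** Rebuilds one file from its chunks, whatever order the records arrive in. */
    public static byte[] join(String fileName, List<Chunk> records) {
        List<Chunk> mine = new ArrayList<>();
        for (Chunk r : records) {
            if (r.fileName.equals(fileName)) mine.add(r);
        }
        mine.sort(Comparator.comparingInt(r -> r.index));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < mine.size(); i++) {
            Chunk c = mine.get(i);
            if (c.index != i) {
                throw new IllegalStateException("missing or duplicate chunk " + i + " of " + fileName);
            }
            out.write(c.data, 0, c.data.length);
        }
        return out.toByteArray();
    }
}
```

Each record carries its own index, so `join` does not depend on the order in which the records are read back. A gap in the sequence raises an exception instead of silently producing a truncated file.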