Thanks. How does MapReduce work on a sequence file? Is there an example I can look at?
On Tue, Feb 21, 2012 at 11:34 AM, Arko Provo Mukherjee <arkoprovomukher...@gmail.com> wrote:

> Hi,
>
> Let's say all the smaller files are in the same directory.
>
> Then you can do:
>
>     // Output path
>     BufferedWriter output = new BufferedWriter(
>         new OutputStreamWriter(fs.create(output_path, true)));
>     // Input directory
>     FileStatus[] output_files = fs.listStatus(new Path(input_path));
>     for (int i = 0; i < output_files.length; i++)
>     {
>         BufferedReader reader = new BufferedReader(
>             new InputStreamReader(fs.open(output_files[i].getPath())));
>         String data;
>         // Read the next line on every pass; readLine() strips the line
>         // terminator, so write it back explicitly
>         while ((data = reader.readLine()) != null)
>         {
>             output.write(data);
>             output.newLine();
>         }
>         reader.close();
>     }
>     output.close();
>
> In case you have the files in multiple directories, call the code for each
> of them with different input paths.
>
> Hope this helps!
>
> Cheers
> Arko
>
> On Tue, Feb 21, 2012 at 1:27 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>
> > I am trying to look for examples that demonstrate using sequence files,
> > including writing to one and then running MapReduce on it, but I am unable
> > to find one. Could you please point me to some examples of sequence files?
> >
> > On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks <bejoy.had...@gmail.com> wrote:
> >
> > > Hi Mohit
> > > AFAIK XMLLoader in Pig is not suited to sequence files. Please
> > > post this to the Pig user group for a workaround.
> > > SequenceFile is the preferred option when we want to store small
> > > files in HDFS that need to be processed by MapReduce, as it stores data
> > > in key/value format. Since SequenceFileInputFormat is available at your
> > > disposal, you don't need any custom input formats to process it with
> > > MapReduce. It is a cleaner and better approach than just
> > > appending small XML file contents into a big file.
> > >
> > > On Tue, Feb 21, 2012 at 11:00 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> > >
> > > > On Tue, Feb 21, 2012 at 9:25 AM, Bejoy Ks <bejoy.had...@gmail.com> wrote:
> > > >
> > > > > Mohit
> > > > > Rather than just appending the content into a normal text file or
> > > > > so, you can create a sequence file with the individual smaller file
> > > > > content as values.
> > > >
> > > > Thanks. I was planning to use Pig's
> > > > org.apache.pig.piggybank.storage.XMLLoader
> > > > for processing. Would it work with a sequence file?
> > > >
> > > > This text file that I was referring to would be in HDFS itself. Is it
> > > > still different from using a sequence file?
> > > >
> > > > > Regards
> > > > > Bejoy.K.S
> > > > >
> > > > > On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> > > > >
> > > > > > We have small XML files. Currently I am planning to append these small
> > > > > > files to one file in HDFS so that I can take advantage of splits, larger
> > > > > > blocks and sequential IO. What I am unsure of is whether it's OK to
> > > > > > append one file at a time to this HDFS file.
> > > > > >
> > > > > > Could someone suggest if this is OK? Would like to know how others do
> > > > > > it.
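
The difference Bejoy describes, between appending raw file contents to one big file and storing each small file as a value in a key/value container, can be sketched in plain Java. This is only an illustration under stated assumptions: it is not Hadoop's SequenceFile format or API, and `SmallFilePacker`, `pack` and `unpack` are hypothetical names invented here. It shows that when every file is written as a (name, content) record, each document can be recovered whole, which plain concatenation does not guarantee.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT Hadoop's SequenceFile format: it only illustrates
// storing each small file as one (name, content) record in a single container.
class SmallFilePacker {

    // Writes every entry as: key length, key bytes, value length, value bytes.
    static byte[] pack(Map<String, String> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                writeField(out, file.getKey());
                writeField(out, file.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads records back until the container is exhausted.
    static Map<String, String> unpack(byte[] container) {
        Map<String, String> files = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(container))) {
            while (in.available() > 0) {
                String name = readField(in);
                String content = readField(in);
                files.put(name, content);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    // Length-prefixed UTF-8 field, so record boundaries never depend on content.
    private static void writeField(DataOutputStream out, String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    private static String readField(DataInputStream in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.xml", "<doc>first</doc>");
        files.put("b.xml", "<doc>second</doc>");
        // Each document comes back whole, under its own file name.
        unpack(pack(files)).forEach((name, content) -> System.out.println(name + " -> " + content));
    }
}
```

In Hadoop itself the same idea would go through `SequenceFile.Writer.append(key, value)` on the write side, with the file name as key and the file content as value, and a job would read the records back through the `SequenceFileInputFormat` mentioned in the thread, so each map call receives one complete small file.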