Hi,

Check the links below.
Read from HDFS: https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-read-a-file-from-hdfs
Write to HDFS: https://sites.google.com/site/hadoopandhive/home/how-to-write-a-file-in-hdfs-using-hadoop

Hope they help!

Thanks & regards,
Arko

On Tue, Apr 3, 2012 at 7:40 AM, Christoph Schmitz <christoph.schm...@1und1.de> wrote:
> Hi Xin,
>
> When you run your MapReduce job, at some point you have to wire it
> together, i.e., say what the mapper class is, what the reducer class is, etc.
> There you can also configure the job to use your new OutputFormat class,
> something like this:
>
> --------------
> Job job = new Job(conf);
> job.setMapperClass(MyMapper.class);
> job.setReducerClass(MyReducer.class);
> job.setOutputFormatClass(OverwritingTextOutputFormat.class);
> ... // more setters
> job.waitForCompletion(true);
> --------------
>
> This assumes, of course, that your data is text. Your job will then use that
> OutputFormat and overwrite the output directory.
>
> If possible, though, I'd agree and go with Bejoy's solution - it is much more
> straightforward.
>
> Regards,
> Christoph
>
> -----Original Message-----
> From: Fang Xin [mailto:nusfang...@gmail.com]
> Sent: Tuesday, April 3, 2012 14:31
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: how to overwrite output in HDFS?
>
> I created such a class in the project, built an instance of it in main,
> and tried to use the method it contains, but it didn't work.
> Can you explain a bit more about how to make this approach work?
> On Tue, Apr 3, 2012 at 6:39 PM, Christoph Schmitz
> <christoph.schm...@1und1.de> wrote:
>> Hi Xin,
>>
>> You can derive your own output format class from one of the Hadoop
>> OutputFormats and make sure the "checkOutputSpecs" method, which usually
>> does the checking, is empty:
>>
>> -----------
>> public final class OverwritingTextOutputFormat<K, V>
>>     extends TextOutputFormat<K, V> {
>>   @Override
>>   public void checkOutputSpecs(JobContext job) throws IOException {
>>     // Deliberately empty: skip the check that fails when the
>>     // output directory already exists.
>>   }
>> }
>> -----------
>>
>> Regards,
>> Christoph
>>
>> -----Original Message-----
>> From: Fang Xin [mailto:nusfang...@gmail.com]
>> Sent: Tuesday, April 3, 2012 11:35
>> To: mapreduce-user
>> Subject: how to overwrite output in HDFS?
>>
>> Hi all,
>>
>> I'm writing my own map-reduce code using Eclipse with the Hadoop plug-in.
>> I've specified input and output directories in the project properties
>> (two folders, namely input and output).
>>
>> My problem is that each time I make a modification and try to run the
>> job again, I have to manually delete the previous output in HDFS,
>> otherwise there will be an error.
>> Can anyone kindly suggest how to simply overwrite the result?
>>
>> Best regards,
>> Xin
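Bejoy's suggestion isn't quoted in this thread, but the straightforward approach Christoph refers to is usually to delete the output path from the driver before submitting the job, rather than disabling the output check. A minimal sketch of that pattern, assuming the new (org.apache.hadoop.mapreduce) API; the class name `OverwriteDriver` and the mapper/reducer placeholders are illustrative, not from the thread:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class, for illustration only.
public class OverwriteDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);
    Path output = new Path(args[1]);

    // Remove a previous run's output so the job's output-spec check passes.
    FileSystem fs = output.getFileSystem(conf);
    if (fs.exists(output)) {
      fs.delete(output, true); // true = delete recursively
    }

    Job job = new Job(conf, "overwrite example");
    job.setJarByClass(OverwriteDriver.class);
    // job.setMapperClass(MyMapper.class);   // your mapper here
    // job.setReducerClass(MyReducer.class); // your reducer here
    FileInputFormat.addInputPath(job, input);
    FileOutputFormat.setOutputPath(job, output);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

This keeps the default safety check intact for every other job, whereas an empty `checkOutputSpecs` silently allows any job using that OutputFormat to clobber existing data.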