Hi,

Check the links below.
Read from HDFS: https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-read-a-file-from-hdfs
Write to HDFS: https://sites.google.com/site/hadoopandhive/home/how-to-write-a-file-in-hdfs-using-hadoop

Hope they help!

Thanks & regards,
Arko

On Tue, Apr 3, 2012 at 7:40 AM, Christoph Schmitz <christoph.schm...@1und1.de> wrote:
> Hi Xin,
>
> When you run your MapReduce job, at some point you have to wire it
> together, i.e., say what the mapper class is, what the reducer class is, etc.
> There you can also configure the job to use your new OutputFormat class,
> something like this:
>
> --------------
> Job job = new Job(conf);
> job.setMapperClass(MyMapper.class);
> job.setReducerClass(MyReducer.class);
> job.setOutputFormatClass(OverwritingTextOutputFormat.class);
> ... // more setters
> job.waitForCompletion(true);
> --------------
>
> This assumes, of course, that your data is text. Your job will then use that
> OutputFormat and overwrite the output directory.
>
> If possible, though, I'd agree and go with Bejoy's solution - it is much more
> straightforward.
>
> Regards,
> Christoph
>
> -----Original Message-----
> From: Fang Xin [mailto:nusfang...@gmail.com]
> Sent: Tuesday, April 3, 2012 14:31
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: how to overwrite output in HDFS?
>
> I created such a class in the project, built an instance of it in main,
> and tried to use the method it contains, but it didn't work.
> Can you explain a bit more about how to make this approach work?
> On Tue, Apr 3, 2012 at 6:39 PM, Christoph Schmitz
> <christoph.schm...@1und1.de> wrote:
>> Hi Xin,
>>
>> You can derive your own output format class from one of the Hadoop
>> OutputFormats and make sure the "checkOutputSpecs" method, which usually
>> does the checking, is empty:
>>
>> -----------
>> public final class OverwritingTextOutputFormat<K, V>
>>     extends TextOutputFormat<K, V> {
>>   @Override
>>   public void checkOutputSpecs(JobContext job) throws IOException {
>>     // Deliberately empty: skip the check that fails when the
>>     // output directory already exists.
>>   }
>> }
>> -----------
>>
>> Regards,
>> Christoph
>>
>> -----Original Message-----
>> From: Fang Xin [mailto:nusfang...@gmail.com]
>> Sent: Tuesday, April 3, 2012 11:35
>> To: mapreduce-user
>> Subject: how to overwrite output in HDFS?
>>
>> Hi all,
>>
>> I'm writing my own map-reduce code using Eclipse with the Hadoop plug-in.
>> I've specified input and output directories in the project properties
>> (two folders, namely input and output).
>>
>> My problem is that each time I make a modification and try to run the
>> job again, I have to manually delete the previous output in HDFS,
>> otherwise there will be an error.
>> Can anyone kindly suggest how to simply overwrite the result?
>>
>> Best regards,
>> Xin
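Bejoy's suggestion isn't quoted in this thread, but the straightforward approach Christoph refers to is usually to delete the output path from the driver before submitting the job, rather than disabling the output check. A minimal sketch of that pattern, assuming the new (org.apache.hadoop.mapreduce) API; the class name `OverwriteDriver` and the mapper/reducer placeholders are illustrative, not from the thread:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class, for illustration only.
public class OverwriteDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);
    Path output = new Path(args[1]);

    // Remove a previous run's output so the job's output-spec check passes.
    FileSystem fs = output.getFileSystem(conf);
    if (fs.exists(output)) {
      fs.delete(output, true); // true = delete recursively
    }

    Job job = new Job(conf, "overwrite example");
    job.setJarByClass(OverwriteDriver.class);
    // job.setMapperClass(MyMapper.class);   // your mapper here
    // job.setReducerClass(MyReducer.class); // your reducer here
    FileInputFormat.addInputPath(job, input);
    FileOutputFormat.setOutputPath(job, output);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

This keeps the default safety check intact for every other job, whereas an empty `checkOutputSpecs` silently allows any job using that OutputFormat to clobber existing data.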