Hi Mohit
     To add on, duplicates won't be there if your output is written to an HDFS
file. Once one attempt of a task completes, only that attempt's output file is
copied to the final output destination; the files generated by the other task
attempts that are killed are simply ignored.
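
As a rough illustration (assuming the old mapred API from the Hadoop 1.x era;
the class and methods below are real, but the path layout shown in the comment
is indicative only), each attempt writes under its own work directory and only
the committed attempt's files are promoted:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class WorkPathDemo {
        // Each task attempt gets its own scratch directory, something like
        // <outdir>/_temporary/_attempt_<id>. Files written here are moved to
        // the final output only if this attempt commits; the directories of
        // killed attempts are just deleted, so no duplicates reach the output.
        public static Path attemptWorkPath(JobConf conf) {
            return FileOutputFormat.getWorkOutputPath(conf);
        }
    }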

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: "Bejoy KS" <bejoy.had...@gmail.com>
Date: Thu, 22 Mar 2012 19:55:55 
To: <common-user@hadoop.apache.org>
Reply-To: bejoy.had...@gmail.com
Subject: Re: Number of retries

Mohit
      If you are writing to a DB from a job in an atomic way, this issue can
pop up. You can avoid it only by disabling speculative execution (a config
sketch follows below).
Drilling down from the web UI to the task level will show you the tasks that
had multiple attempts.
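
For reference, a minimal sketch of turning it off via the old mapred API
(these JobConf setters correspond to the mapred.map.tasks.speculative.execution
and mapred.reduce.tasks.speculative.execution properties):

    import org.apache.hadoop.mapred.JobConf;

    public class DisableSpeculation {
        // With speculative execution off, only one attempt of each task runs,
        // so the DB is written at most once per task (barring failure retries).
        public static void configure(JobConf conf) {
            conf.setMapSpeculativeExecution(false);
            conf.setReduceSpeculativeExecution(false);
        }
    }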

------Original Message------
From: Mohit Anchlia
To: common-user@hadoop.apache.org
ReplyTo: common-user@hadoop.apache.org
Subject: Number of retries
Sent: Mar 23, 2012 01:21

I am seeing a weird problem where I am seeing duplicate rows in the database.
I am wondering if this is because of some internal retries that might be
causing this. Is there a way to look at which tasks were retried? I am not
sure what else might cause it, because when I look at the output data I don't
see any duplicates in the file.



Regards
Bejoy KS

Sent from handheld, please excuse typos.
