Hi Mohit

To add on: you won't see duplicates if your output is written to an HDFS file. Once one attempt of a task completes, only that attempt's output file is copied to the final output destination; the files generated by the other task attempts, which get killed, are simply discarded.
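Roughly, that is the output committer at work: each attempt writes under its own temporary directory, and only the committed attempt's files are promoted. A sketch of the paths involved (the job and attempt IDs here are made up for illustration):

    # each attempt writes to its own temp dir under the output dir
    ${output.dir}/_temporary/_attempt_201203221021_0042_m_000000_0/part-00000
    ${output.dir}/_temporary/_attempt_201203221021_0042_m_000000_1/part-00000   <- speculative attempt

    # on task commit, only the successful attempt's file is moved to
    ${output.dir}/part-00000

The killed attempt's directory is cleaned up, so its file never reaches the final output.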
Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: "Bejoy KS" <bejoy.had...@gmail.com>
Date: Thu, 22 Mar 2012 19:55:55
To: <common-user@hadoop.apache.org>
Reply-To: bejoy.had...@gmail.com
Subject: Re: Number of retries

Mohit

If your job writes to a database directly as it runs (rather than atomically on task commit), this would pop up. You can avoid it only by disabling speculative execution. Drilling down from the web UI to the task level will show you which tasks had multiple attempts.

------Original Message------
From: Mohit Anchlia
To: common-user@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: Number of retries
Sent: Mar 23, 2012 01:21

I am seeing a weird problem: there are duplicate rows in the database. I am wondering if this is caused by some internal retries. Is there a way to see which tasks were retried? I am not sure what else could cause it, because when I look at the output data I don't see any duplicates in the file.

Regards
Bejoy KS

Sent from handheld, please excuse typos.
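For reference, speculative execution can be switched off per job. A minimal sketch against the old mapred API in use at the time; the class name DisableSpeculation and the jobClass parameter are just illustrative:

    import org.apache.hadoop.mapred.JobConf;

    public class DisableSpeculation {
        public static JobConf configure(Class<?> jobClass) {
            JobConf conf = new JobConf(jobClass);
            // A speculative attempt re-runs the same input split in parallel,
            // so a non-idempotent sink such as a database can receive each
            // row more than once. Turn speculation off for maps and reduces.
            conf.setMapSpeculativeExecution(false);
            conf.setReduceSpeculativeExecution(false);
            return conf;
        }
    }

The same effect comes from setting mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution to false in the job configuration.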