@Ashutosh: I am currently running the task with speculative execution turned off, but I was wondering if there is a way to avoid the performance penalty.
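For reference, here is roughly the invocation I am using to disable it (a sketch only: the property names are the pre-0.21 mapred ones, and myscript.pig is a placeholder for the actual script):

    pig -Dmapred.map.tasks.speculative.execution=false \
        -Dmapred.reduce.tasks.speculative.execution=false \
        myscript.pig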
@Dmitriy: I would like to try out option 1 - any pointers on how to infer this "killed" status in the UDF? (A rough sketch of what I have in mind is below, after the quoted thread.)

On Tuesday, April 13, 2010, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> Option 1: write everything in a given mapper in one big transaction, roll
> back if killed (this is obviously a performance killer)
>
> Option 2: on spin-up, the task creates a temporary table by copying the
> definition from the main table; the allFinished() method, or whatever we
> are calling it now, moves data from the temp tables of successful attempts
> into the main table. Also not awesome.
>
> Option 3: Write to fs, bulk import into a database at the end of your job.
> Safest, sanest, most parallelizable. See dependency tools like the
> recently open-sourced Azkaban for making life easier in that regard.
>
> -Dmitriy
>
> On Tue, Apr 13, 2010 at 4:35 PM, Ashutosh Chauhan
> <ashutosh.chau...@gmail.com> wrote:
>
>> Sandesh,
>>
>> As a workaround you can set the property
>> mapred.[map|reduce].max.attempts to 1, which I believe will turn off
>> speculative execution. You can pass this as a -D switch on the pig
>> command line or through mapred-site.xml. The proper way to do it would
>> be the way you suggested (though that will be less performant as well
>> as more complex to implement). You may also want to comment on that
>> jira with your issue.
>>
>> Ashutosh
>>
>> On Tue, Apr 13, 2010 at 16:16, Sandesh Devaraju
>> <sandesh.devar...@gmail.com> wrote:
>>> Hi All,
>>>
>>> I am using PIG-1229 to write pig query output to a database. However,
>>> I noticed that because of speculative execution, spurious records end
>>> up being written.
>>>
>>> I was wondering if there is a way to infer whether the current reduce
>>> task is running in a speculative slot that was cancelled (and hence a
>>> rollback needs to be issued).
>>>
>>> Thanks in advance!
>>>
>>> - Sandesh
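Concretely, here is a minimal sketch of how I read option 1 (everything here is hypothetical - TransactionalDbWriter, the table layout, and the JDBC URL are placeholders; the one guarantee I am leaning on is that a database rolls back an uncommitted transaction when the client connection dies, which is what happens when a speculative attempt is killed):

    // Hypothetical sketch of option 1: one JDBC transaction per task
    // attempt. Nothing becomes visible until commit(); if the attempt is
    // killed mid-flight, the JVM exits, the connection drops, and the
    // database rolls the open transaction back on its own, so there is no
    // "killed" status to infer in the UDF at all.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class TransactionalDbWriter {
        private final Connection conn;
        private final PreparedStatement insert;

        public TransactionalDbWriter(String jdbcUrl, String table)
                throws SQLException {
            conn = DriverManager.getConnection(jdbcUrl);
            conn.setAutoCommit(false); // one big transaction per attempt
            insert = conn.prepareStatement(
                    "INSERT INTO " + table + " (k, v) VALUES (?, ?)");
        }

        public void write(String key, String value) throws SQLException {
            insert.setString(1, key);
            insert.setString(2, value);
            insert.addBatch(); // buffer client-side, flush once at the end
        }

        // Call once, only when the attempt finishes normally.
        public void commit() throws SQLException {
            insert.executeBatch();
            conn.commit();
            conn.close();
        }
    }

The race I still see is two attempts both reaching commit(); as far as I can tell, the only hook the framework runs for at most one attempt is the task-commit step (OutputCommitter.needsTaskCommit/commitTask), so that seems like the right place to drive the commit from.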