@Ashutosh: I am currently running the task with speculative execution
turned off, but was wondering if there is a way to avoid the
performance penalty.

@Dmitriy: I would like to try out option 1 - any pointers on how to
infer this "killed" status in the UDF?
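(A note on the "killed" status question: a killed speculative attempt generally gets no chance to observe its own kill, so a common workaround - not something suggested in this thread - is to tag every written row with the task attempt ID, which 0.20-era Hadoop exposes to tasks as the mapred.task.id job property, and then discard all but one attempt per task afterwards. A minimal Python sketch of parsing such an ID; the helper name is made up:)

```python
# Sketch (names hypothetical): parse a Hadoop 0.20-era task attempt ID of the
# form attempt_<clusterTs>_<jobNum>_<m|r>_<taskNum>_<attemptNum>, as exposed
# to a running task via the mapred.task.id property. Tagging rows with this
# ID lets a cleanup step keep only one attempt per task.

def parse_attempt_id(attempt_id):
    """Split e.g. 'attempt_201004131234_0001_r_000003_1' into its parts."""
    parts = attempt_id.split("_")
    if len(parts) != 6 or parts[0] != "attempt":
        raise ValueError("not a task attempt id: %r" % attempt_id)
    _, cluster_ts, job_num, task_type, task_num, attempt_num = parts
    return {
        "job_id": "job_%s_%s" % (cluster_ts, job_num),
        "task_id": "task_%s_%s_%s_%s" % (cluster_ts, job_num,
                                         task_type, task_num),
        "is_reduce": task_type == "r",
        "attempt": int(attempt_num),
    }
```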

On Tuesday, April 13, 2010, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> Option 1: write everything in a given mapper in one big transaction, roll
> back if killed (this is obviously a performance killer)
>
> Option 2: on spin-up, the task creates a temporary table by copying the
> definition from the main table; the allFinished() method, or whatever we are
> calling it now, moves data from the temp tables of successful attempts into
> the main table. Also not awesome.
>
> Option 3: Write to fs, bulk import into a database at the end of your job.
> Safest, sanest, most parallelizable. See dependency tools like the recently
> open-sourced Azkaban for making life easier in that regard.
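(Option 2 can be sketched roughly as follows - a minimal illustration, not from the thread: the table and function names are made up, and sqlite3 stands in for whatever database the store function actually targets. Each attempt writes into its own staging table copied from the main table's definition, and an allFinished()-style step promotes only the attempts that succeeded, so rows from killed speculative attempts never reach the main table:)

```python
# Sketch of option 2 (all names hypothetical). Each task attempt gets a
# staging table cloned from main_table's definition; at job end, rows from
# successful attempts are promoted and every staging table is dropped.
import sqlite3

def staging_table(attempt_id):
    # Attempt IDs contain only [A-Za-z0-9_], so they are safe identifiers.
    return "staging_" + attempt_id

def open_attempt(conn, attempt_id):
    # Copy the main table's column definition into an empty staging table.
    conn.execute("CREATE TABLE %s AS SELECT * FROM main_table WHERE 0"
                 % staging_table(attempt_id))

def all_finished(conn, successful_attempts, all_attempts):
    # Promote rows only from attempts that succeeded; drop everything else.
    for attempt_id in all_attempts:
        if attempt_id in successful_attempts:
            conn.execute("INSERT INTO main_table SELECT * FROM %s"
                         % staging_table(attempt_id))
        conn.execute("DROP TABLE %s" % staging_table(attempt_id))
    conn.commit()
```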
>
> -Dmitriy
>
> On Tue, Apr 13, 2010 at 4:35 PM, Ashutosh Chauhan <
> ashutosh.chau...@gmail.com> wrote:
>
>> Sandesh,
>>
>> As a workaround, you can set the properties
>> mapred.[map|reduce].max.attempts to 1, which I believe will turn off
>> speculative execution. You can pass these as -D switches on the pig
>> command line or through mapred-site.xml. The proper way to do it would
>> be the way you suggested (though that would be both less performant
>> and more complex to implement). You may also want to comment on that
>> jira with your issue.
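(Spelled out, the workaround above looks like this on the command line. The max-attempts properties are the ones named in this thread; the last two lines show, for reference, the standard Hadoop 0.20-era properties that switch speculative execution off directly, since max.attempts limits failure retries rather than speculation itself:)

```shell
# Limit each map/reduce task to a single attempt, passed on the pig command line:
pig -Dmapred.map.max.attempts=1 -Dmapred.reduce.max.attempts=1 myscript.pig

# The direct switches for speculative execution (0.20-era property names),
# settable the same way or in mapred-site.xml:
#   mapred.map.tasks.speculative.execution=false
#   mapred.reduce.tasks.speculative.execution=false
```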
>>
>> Ashutosh
>>
>> On Tue, Apr 13, 2010 at 16:16, Sandesh Devaraju
>> <sandesh.devar...@gmail.com> wrote:
>> > Hi All,
>> >
>> > I am using PIG-1229 to write pig query output to a database. However,
>> > I noticed that because of speculative execution, spurious records end
>> > up being written.
>> >
>> > I was wondering if there is a way to infer whether the current
>> > reduce task is running in a speculative attempt that was killed (and
>> > hence a rollback needs to be issued).
>> >
>> > Thanks in advance!
>> >
>> > - Sandesh
>> >
>>
>
