Max,

Yes, the stack trace points to a race condition. What can be done
to fix this? Can someone from the dev team look into it? Should I raise a
JIRA issue for it?



On 2 November 2018 at 11:14:36 AM, Maxime Beauchemin (
maximebeauche...@gmail.com) wrote:

Wait, the title of this thread is "Duplicate key unique constraint error";
to me that screams that something is not OK. If the check+insert were atomic
(isolated), this error wouldn't happen. Also, I'm pretty sure that when I looked,
the stack trace looked like a scheduler-specific stack trace. It may be a
rare race condition, but doesn't the stack trace prove the existence of a
race condition?

Max
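
For illustration, here is a minimal sketch of an insert that cannot fail hard
on a duplicate key. It is not Airflow's scheduler code: the table name
`task_instance_demo`, the helper `insert_if_absent`, and the use of SQLite are
all assumptions made for the example. Instead of a separate check followed by
an insert, it lets the unique key on (task_id, dag_id, execution_date)
arbitrate and treats a constraint violation as "row already exists".

```python
import sqlite3


def create_schema(conn):
    # Hypothetical table; the composite primary key enforces uniqueness of
    # (task_id, dag_id, execution_date) at the database level.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS task_instance_demo (
            task_id TEXT NOT NULL,
            dag_id TEXT NOT NULL,
            execution_date TEXT NOT NULL,
            state TEXT,
            PRIMARY KEY (task_id, dag_id, execution_date)
        )
        """
    )


def insert_if_absent(conn, task_id, dag_id, execution_date):
    """Return True if this call created the row, False if it already existed."""
    try:
        # The connection context manager commits on success and rolls back
        # if the block raises.
        with conn:
            conn.execute(
                "INSERT INTO task_instance_demo "
                "(task_id, dag_id, execution_date, state) VALUES (?, ?, ?, ?)",
                (task_id, dag_id, execution_date, "scheduled"),
            )
        return True
    except sqlite3.IntegrityError:
        # Another writer inserted the same key first; report it instead of crashing.
        return False


conn = sqlite3.connect(":memory:")
create_schema(conn)
first = insert_if_absent(conn, "t1", "d1", "2018-11-02T00:00:00")
second = insert_if_absent(conn, "t1", "d1", "2018-11-02T00:00:00")
row_count = conn.execute("SELECT COUNT(*) FROM task_instance_demo").fetchone()[0]
print(first, second, row_count)
```

The second call returns False rather than raising, and only one row exists
afterwards. A check-then-insert inside a transaction can still race unless the
isolation level or a lock prevents two writers from both passing the check,
which is why this sketch leans on the constraint itself.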

On Fri, Nov 2, 2018 at 10:19 AM Abhishek Sinha <abhis...@infoworks.io>
wrote:

> Max,
>
> If check+insert works correctly, then even multiple instances of the scheduler
> running in parallel should not throw this error. In that case, I am not sure
> when this error can happen.
>
>
>
> On 2 November 2018 at 8:37:20 AM, Maxime Beauchemin (
> maximebeauche...@gmail.com) wrote:
>
> The scheduler should never fail hard. The schedule logic that tries to
> insert the new task instance should only insert a new one if it doesn't
> exist already and isolate that check+insert inside a database transaction.
>
> Max
>
> On Fri, Nov 2, 2018 at 5:38 AM Abhishek Sinha <abhis...@infoworks.io>
> wrote:
>
> > Brian,
> >
> > We use the trigger dag CLI command to trigger it manually.
> >
> > Even when you have custom operators, the duplicate key error should not
> > happen right? Shouldn't the combination of task id, dag id and execution
> > date be unique?
> >
> >
> > On 30 October 2018 at 10:23:27 PM, Abhishek Sinha (abhis...@infoworks.io)
> > wrote:
> >
> > Max,
> >
> > The schedule interval is 1 day.
> >
> >
> >
> >
> > > On 30-Oct-2018, at 9:29 PM, Maxime Beauchemin <maximebeauche...@gmail.com>
> > > wrote:
> > > >
> > > > Also what's your schedule interval? I'm just trying to confirm that this
> > > > isn't a "run every minute, or anytime someone blinks" kind of DAG.
> > >
> > > Max
> > >
> > > On Tue, Oct 30, 2018 at 5:48 AM Brian Greene <
> > > br...@heisenbergwoodworking.com> wrote:
> > >
> > >> How do you trigger it externally?
> > >>
> > > >> We have several custom operators that trigger other jobs, and we had to be
> > > >> really careful or we’d get duplicate keys for the DAG run and it would fail
> > > >> to kick off.
> > > >>
> > > >> One scheduler, but we saw it repeatedly and have it noted as a thing to
> > > >> watch out for.
> > >>
> > >> Brian
> > >>
> > >> Sent from a device with less than stellar autocorrect
> > >>
> > >>> On Oct 29, 2018, at 2:03 PM, Abhishek Sinha <abhis...@infoworks.io>
> > > >>> wrote:
> > >>>
> > >>> Attaching the scheduler crash logs as well.
> > >>>
> > >>> https://pastebin.com/B2WEJKRB
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> Regards,
> > >>>
> > >>> Abhishek Sinha | m: +919035191078 | e: abhis...@infoworks.io
> > >>>
> > >>>
> > > >>> On Tue, Oct 30, 2018 at 12:19 AM Abhishek Sinha <abhis...@infoworks.io>
> > > >>> wrote:
> > >>>
> > >>>> Max,
> > >>>>
> > > >>>> We always trigger the DAG externally. I am not sure if there is still any
> > > >>>> backfill involved.
> > > >>>>
> > > >>>> Is there a way I can find out from the logs whether more than one instance
> > > >>>> of the scheduler is running?
> > >>>>
> > >>>>
> > >>>> On 29 October 2018 at 10:43:19 PM, Maxime Beauchemin (
> > >>>> maximebeauche...@gmail.com) wrote:
> > >>>>
> > > >>>> The stack trace seems to be pointing in that direction. I'd check that
> > > >>>> first. It seems like it **could** be a race condition with a backfill as
> > > >>>> well; unclear.
> > > >>>>
> > > >>>> It's still a bug though, and the scheduler should make sure to handle this
> > > >>>> and not raise/crash.
> > >>>>
> > > >>>> On Mon, Oct 29, 2018, 10:05 AM Abhishek Sinha <abhis...@infoworks.io>
> > > >>>> wrote:
> > >>>>
> > >>>>> Max,
> > >>>>>
> > > >>>>> I do not think there was more than one instance of the scheduler running.
> > > >>>>> Since the scheduler crashed and has been restarted, I cannot confirm it
> > > >>>>> now. Is there any log that can provide this information?
> > > >>>>>
> > > >>>>> Could there be a different cause apart from multiple scheduler instances
> > > >>>>> running?
> > >>>>>
> > >>>>>
> > >>>>> On 29 October 2018 at 9:30:56 PM, Maxime Beauchemin (
> > >>>>> maximebeauche...@gmail.com) wrote:
> > >>>>>
> > >>>>> Abhishek, are you running more than one scheduler instance at once?
> > >>>>>
> > >>>>> Max
> > >>>>>
> > > >>>>> On Mon, Oct 29, 2018 at 8:17 AM Abhishek Sinha <abhis...@infoworks.io>
> > > >>>>> wrote:
> > >>>>>
> > > >>>>>> The issue is happening more frequently now. Can someone please look into
> > > >>>>>> this?
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > > >>>>>> On 24 September 2018 at 12:42:49 PM, Abhishek Sinha (abhis...@infoworks.io)
> > > >>>>>> wrote:
> > >>>>>>
> > > >>>>>> Can someone please help look into this issue? It is critical since
> > > >>>>>> it has come up in one of our production environments. Also, this issue has
> > > >>>>>> appeared only once so far.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>>
> > >>>>>> Abhishek
> > >>>>>>
> > > >>>>>> On 20-Sep-2018, at 10:18 PM, Abhishek Sinha <abhis...@infoworks.io>
> > > >>>>>> wrote:
> > >>>>>>
> > >>>>>> Any update on this?
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>>
> > >>>>>> Abhishek
> > >>>>>>
> > > >>>>>> On 18-Sep-2018, at 12:48 AM, Abhishek Sinha <abhis...@infoworks.io>
> > > >>>>>> wrote:
> > >>>>>>
> > >>>>>> Pastebin: https://pastebin.com/K6BMTb5K
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>>
> > >>>>>> Abhishek
> > >>>>>>
> > > >>>>>> On 18-Sep-2018, at 12:31 AM, Stefan Seelmann <m...@stefan-seelmann.de>
> > > >>>>>> wrote:
> > >>>>>>
> > >>>>>> On 9/17/18 8:19 PM, Abhishek Sinha wrote:
> > >>>>>>
> > >>>>>> Any update on this?
> > >>>>>>
> > >>>>>> Please find the scheduler error log attached.
> > >>>>>>
> > >>>>>> Can you share the full python stack trace?
> > >>>>>>
> > >>>>>>
> > >>>>>> Seems the mailing list doesn't allow attachments. Either post the
> > >>>>>> stacktrace inline, or post it somewhere at pastebin or so.
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>
> >
>
>
