Wait, the title of this thread is "Duplicate key unique constraint error"; to me, that screams that something is not OK. If the check+insert were atomic (isolated), this error wouldn't happen. Also, I'm pretty sure that when I looked, the stack trace looked scheduler-specific. It may be a rare race condition, but doesn't the stack trace prove that a race condition exists?
Max

On Fri, Nov 2, 2018 at 10:19 AM Abhishek Sinha <abhis...@infoworks.io> wrote:
> Max,
>
> If check+insert works correctly, then even multiple instances of the scheduler
> running in parallel should not throw this error. I am not sure, then, when
> this error can happen.
>
> On 2 November 2018 at 8:37:20 AM, Maxime Beauchemin (maximebeauche...@gmail.com) wrote:
>> The scheduler should never fail hard. The scheduling logic that tries to
>> insert the new task instance should only insert a new one if it doesn't
>> exist already, and should isolate that check+insert inside a database
>> transaction.
>>
>> Max
>>
>> On Fri, Nov 2, 2018 at 5:38 AM Abhishek Sinha <abhis...@infoworks.io> wrote:
>>> Brian,
>>>
>>> We use the trigger dag CLI command to trigger it manually.
>>>
>>> Even when you have custom operators, the duplicate key error should not
>>> happen, right? Shouldn't the combination of task id, dag id and execution
>>> date be unique?
>>>
>>> On 30 October 2018 at 10:23:27 PM, Abhishek Sinha (abhis...@infoworks.io) wrote:
>>>> Max,
>>>>
>>>> The schedule interval is 1 day.
>>>>
>>>> On 30-Oct-2018, at 9:29 PM, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
>>>>> Also, what's your schedule interval? I'm just trying to confirm that this
>>>>> isn't a "run every minute, or anytime someone blinks" kind of DAG.
>>>>>
>>>>> Max
>>>>>
>>>>> On Tue, Oct 30, 2018 at 5:48 AM Brian Greene <br...@heisenbergwoodworking.com> wrote:
>>>>>> How do you trigger it externally?
>>>>>>
>>>>>> We have several custom operators that trigger other jobs, and we had to be
>>>>>> really careful or we'd get duplicate keys for the DAG run and it would fail
>>>>>> to kick off.
>>>>>>
>>>>>> One scheduler, but we saw it repeatedly and have it noted as a thing to
>>>>>> watch out for.
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> On Oct 29, 2018, at 2:03 PM, Abhishek Sinha <abhis...@infoworks.io> wrote:
>>>>>>> Attaching the scheduler crash logs as well.
>>>>>>>
>>>>>>> https://pastebin.com/B2WEJKRB
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Abhishek Sinha
>>>>>>>
>>>>>>> On Tue, Oct 30, 2018 at 12:19 AM Abhishek Sinha <abhis...@infoworks.io> wrote:
>>>>>>>> Max,
>>>>>>>>
>>>>>>>> We always trigger the DAG externally. I am not sure if there is still any
>>>>>>>> backfill involved.
>>>>>>>>
>>>>>>>> Is there a way I can find out from the logs whether more than one instance
>>>>>>>> of the scheduler is running?
>>>>>>>>
>>>>>>>> On 29 October 2018 at 10:43:19 PM, Maxime Beauchemin (maximebeauche...@gmail.com) wrote:
>>>>>>>>> The stacktrace seems to be pointing in that direction. I'd check that
>>>>>>>>> first. It seems like it **could** be a race condition with a backfill as
>>>>>>>>> well, unclear.
>>>>>>>>>
>>>>>>>>> It's still a bug though, and the scheduler should make sure to handle this
>>>>>>>>> and not raise/crash.
>>>>>>>>>
>>>>>>>>> On Mon, Oct 29, 2018, 10:05 AM Abhishek Sinha <abhis...@infoworks.io> wrote:
>>>>>>>>>> Max,
>>>>>>>>>>
>>>>>>>>>> I do not think there was more than one instance of the scheduler running.
>>>>>>>>>> Since the scheduler crashed and has been restarted, I cannot confirm it
>>>>>>>>>> now. Is there any log that can provide this information?
>>>>>>>>>>
>>>>>>>>>> Could there be a different cause apart from multiple scheduler instances
>>>>>>>>>> running?
>>>>>>>>>>
>>>>>>>>>> On 29 October 2018 at 9:30:56 PM, Maxime Beauchemin (maximebeauche...@gmail.com) wrote:
>>>>>>>>>>> Abhishek, are you running more than one scheduler instance at once?
>>>>>>>>>>>
>>>>>>>>>>> Max
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 29, 2018 at 8:17 AM Abhishek Sinha <abhis...@infoworks.io> wrote:
>>>>>>>>>>>> The issue is happening more frequently now. Can someone please look into
>>>>>>>>>>>> this?
>>>>>>>>>>>>
>>>>>>>>>>>> On 24 September 2018 at 12:42:49 PM, Abhishek Sinha (abhis...@infoworks.io) wrote:
>>>>>>>>>>>>> Can someone please help look into this issue? It is critical since
>>>>>>>>>>>>> it has come up in one of our production environments. Also, this issue
>>>>>>>>>>>>> has appeared only once so far.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Abhishek
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 20-Sep-2018, at 10:18 PM, Abhishek Sinha <abhis...@infoworks.io> wrote:
>>>>>>>>>>>>>> Any update on this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Abhishek
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 18-Sep-2018, at 12:48 AM, Abhishek Sinha <abhis...@infoworks.io> wrote:
>>>>>>>>>>>>>>> Pastebin: https://pastebin.com/K6BMTb5K
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Abhishek
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 18-Sep-2018, at 12:31 AM, Stefan Seelmann <m...@stefan-seelmann.de> wrote:
>>>>>>>>>>>>>>>> On 9/17/18 8:19 PM, Abhishek Sinha wrote:
>>>>>>>>>>>>>>>>> Any update on this?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please find the scheduler error log attached.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can you share the full python stack trace?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Seems the mailing list doesn't allow attachments. Either post the
>>>>>>>>>>>>>>>> stacktrace inline, or post it somewhere like pastebin.
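For reference, a minimal sketch of the check+insert pattern debated in this thread, using Python's built-in sqlite3 module as a stand-in for the scheduler's metadata database. The task_instance table layout and the ensure_task_instance helper are hypothetical and simplified; this is not Airflow's code. It assumes the (task_id, dag_id, execution_date) combination is enforced by a unique constraint. In PostgreSQL and MySQL at their default isolation levels, a plain SELECT takes no lock on a row that does not exist yet, so putting the check and the insert in one transaction does not by itself stop two writers from both passing the check. The unique constraint is what rejects the second insert, and the sketch treats that error as "already created" instead of crashing.

```python
import sqlite3

# Hypothetical, simplified table. The composite primary key mirrors the
# (task_id, dag_id, execution_date) uniqueness discussed in this thread;
# it is not Airflow's actual schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS task_instance (
    task_id        TEXT NOT NULL,
    dag_id         TEXT NOT NULL,
    execution_date TEXT NOT NULL,
    state          TEXT,
    PRIMARY KEY (task_id, dag_id, execution_date)
)
"""


def ensure_task_instance(conn, task_id, dag_id, execution_date):
    """Insert the task instance if it is missing (hypothetical helper).

    Returns True if this call created the row, False if it already existed.
    A duplicate-key error means another writer won the race; the row now
    exists, which is the outcome we wanted, so it is not treated as fatal.
    """
    key = (task_id, dag_id, execution_date)

    # Fast path: skip the INSERT when the row is clearly already there.
    # This check alone is not enough, since another session can insert
    # the same key between this SELECT and the INSERT below.
    row = conn.execute(
        "SELECT 1 FROM task_instance"
        " WHERE task_id = ? AND dag_id = ? AND execution_date = ?",
        key,
    ).fetchone()
    if row is not None:
        return False

    try:
        # The connection context manager commits on success and rolls
        # back (then re-raises) on an exception.
        with conn:
            conn.execute(
                "INSERT INTO task_instance"
                " (task_id, dag_id, execution_date, state)"
                " VALUES (?, ?, ?, 'scheduled')",
                key,
            )
        return True
    except sqlite3.IntegrityError:
        # The unique constraint rejected a concurrent duplicate.
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    print(ensure_task_instance(conn, "load", "daily_etl", "2018-11-02"))
    print(ensure_task_instance(conn, "load", "daily_etl", "2018-11-02"))
```

Running it prints True, then False. The SELECT is only a fast path; the except branch alone keeps the insert idempotent. In SQLAlchemy-based code, the corresponding exception is sqlalchemy.exc.IntegrityError.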