Deploy procedure for new/modify dags

2017-07-17 Thread Germain TANGUY
Hello everybody,

I would like to know what your procedures are for deploying new versions of
your DAGs, especially DAGs that have external dependencies (bash scripts,
etc.). I use the CeleryExecutor with multiple workers, so there is an issue of
consistency between the workers, scheduler and webserver.

Today I pause the DAGs, wait until all running tasks complete, restart all
Airflow services and unpause the DAGs. Is there a better way?
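
For reference, the pause / drain / restart / unpause sequence described above can be written down as an ordered plan. This is only a sketch: `build_deploy_plan` is a hypothetical helper, the `systemctl` unit name is an assumption about how the services are managed, and the "wait" step is left as a manual check because how you detect running tasks depends on your setup. The `airflow pause` and `airflow unpause` commands are the ones from the Airflow 1.x CLI.

```python
from typing import List, Optional, Sequence, Tuple

# A step is a human-readable description plus an optional shell command.
Step = Tuple[str, Optional[List[str]]]


def build_deploy_plan(dag_ids: Sequence[str],
                      restart_commands: Sequence[Sequence[str]]) -> List[Step]:
    """Return the ordered steps for a pause/drain/restart/unpause deploy."""
    steps: List[Step] = []
    for dag_id in dag_ids:
        steps.append(("pause " + dag_id, ["airflow", "pause", dag_id]))
    # No command here: checking that nothing is still running is site-specific.
    steps.append(("wait for running tasks to finish", None))
    for command in restart_commands:
        steps.append(("restart service", list(command)))
    for dag_id in dag_ids:
        steps.append(("unpause " + dag_id, ["airflow", "unpause", dag_id]))
    return steps


if __name__ == "__main__":
    # The unit name below is hypothetical; substitute your own restart commands.
    plan = build_deploy_plan(
        ["example_dag"],
        [["systemctl", "restart", "airflow-scheduler"]],
    )
    for description, command in plan:
        print(description, "->", " ".join(command) if command else "(manual check)")
```

Printing the plan first (a dry run) makes it easier to review the order before anything is actually paused or restarted.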

Best regards,

Germain T.



Scheduler ignores the changed start_date ?

2017-07-17 Thread Ashika Umanga Umagiliya
I used to have my start_date as '2017-03-14' for the DAG "cdna_daily_stg"
as follows :

default_args = {
    'owner': 'cdna',
    'depends_on_past': False,
    'start_date': datetime(2017, 3, 14),
    'email': ['some_email'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 5,
    'retry_delay': timedelta(minutes=5),
    'on_failure_callback': on_failure_callback,
    'on_success_call': on_success
}

dag = DAG(
    dag_id='cdna_daily_stg',
    default_args=default_args,
    schedule_interval="0 2 * * *"
)

Due to some code refactoring in my DAG, I wanted to change my DAG name to
'cdna_daily_stg_v2', so I changed my start_date as well to '2017-07-14' as
follows:

default_args = {
    'owner': 'cdna',
    'depends_on_past': False,
    'start_date': datetime(2017, 7, 14),
    'email': ['some_email'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 5,
    'retry_delay': timedelta(minutes=5),
    'on_failure_callback': on_failure_callback,
    'on_success_call': on_success
}

dag = DAG(
    dag_id='cdna_daily_stg_v2',
    default_args=default_args,
    schedule_interval="0 2 * * *"
)


But when I deploy my DAG under the new DAG name, it keeps starting DAG runs
from '2017-03-16' instead of from '2017-07-14'.

How can I fix this?
I still use the same Python file name. Do I have to change the file name as
well as the DAG name?
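
As a way to check what the scheduler should be doing, the expected execution dates can be worked out by hand. The sketch below does not use Airflow at all; `first_daily_ticks` is a hypothetical helper, and it assumes that for a daily "0 2 * * *" schedule the first execution date is the first 02:00 tick at or after start_date.

```python
from datetime import datetime, timedelta


def first_daily_ticks(start_date, hour, minute=0, count=3):
    """Return the first `count` daily schedule ticks at or after start_date."""
    tick = start_date.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if tick < start_date:
        # start_date is later in the day than the tick, so use the next day.
        tick += timedelta(days=1)
    return [tick + timedelta(days=i) for i in range(count)]


if __name__ == "__main__":
    for label, start in [("old", datetime(2017, 3, 14)),
                         ("new", datetime(2017, 7, 14))]:
        ticks = first_daily_ticks(start, hour=2)
        print(label, [t.isoformat() for t in ticks])
```

If the runs actually created start months before the dates computed for the new start_date, something other than the default_args shown above is probably still supplying the old date. That is a guess rather than a diagnosis, but an operator with its own start_date, or an old copy of the file still present on a worker, would be worth ruling out.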


Dist Area permissions problem

2017-07-17 Thread John D. Ament
All,

Please be advised that the infra team is aware of a permission problem that
is affecting podlings' ability to write to the incubator dist area.  This
may cause you to be unable to create staged releases and promote those to
the public mirrors.  You can track the status in
https://issues.apache.org/jira/browse/INFRA-14609 .

Likewise, I want to make the podlings aware of a permission problem from
the weekend where git permissions were a little off.  That has since been
fixed.

Apologies for any inconvenience.

John


Re: [VOTE] Release Airflow 1.8.2 based on Airflow 1.8.2 RC2

2017-07-17 Thread Hitesh Shah
The main artifact to vote on is a source release. This implies just
creating a clean tarball of the source codebase with an appropriate README,
INSTALL, LICENSE, DISCLAIMER, NOTICE (all of these should be checked in to
the codebase as the release should map to a specific commit hash which can
be re-used to create the source tarball again if needed). License and
Notice files should cover all files in the codebase.

You can also publish binary build(s) as a convenience (assuming you want
to publish to PyPI) - this would be a separate tarball and likely different
notice/license files (would suggest having these in the src repo too) as
the binary tarball will likely have different files.

thanks
-- Hitesh
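
One way to get a source tarball that maps to a specific commit, as described above, is `git archive`. The sketch below only builds the command and leaves running it to the caller; `source_release_command`, the "apache-airflow" project name and the "-source.tar.gz" file naming are assumptions for illustration, not an agreed convention.

```python
def source_release_command(commit, version, project="apache-airflow"):
    """Build a `git archive` command that tars the tree at `commit`.

    The file naming used here is hypothetical; only the git options are real.
    """
    prefix = "{}-{}".format(project, version)
    return [
        "git", "archive",
        "--format=tar.gz",
        "--prefix={}/".format(prefix),             # top-level directory in the tarball
        "--output={}-source.tar.gz".format(prefix),
        commit,                                    # commit hash or tag to snapshot
    ]


if __name__ == "__main__":
    # "HEAD" is a placeholder; a release would pin the voted commit hash or tag.
    print(" ".join(source_release_command("HEAD", "1.8.2")))
```

Because the archive is produced from a commit rather than from a working directory, anyone can re-run the same command against the same hash to recreate the tarball.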



On Mon, Jul 17, 2017 at 2:18 AM, Bolke de Bruin  wrote:

> Great!
>
> It is also a bit new to me so maybe @Hitesh @Jakob can help with some
> guidance here?
>
> But my assumption indeed is:
>
> 1. Make a tarball from the repo with build instructions (including a
> working License check!) -> Vote here and IPMC. This is the “official”
> release.
> 2. Make sdist for redistribution on PyPi
>
> Bdist isn’t required.
>
> Cheers
> Bolke.
>
> > On 17 Jul 2017, at 06:27, Maxime Beauchemin wrote:
> >
> > I've been slammed but skies are clearing up now I'm hoping.
> >
> > Reading the general@ thread I'm unclear about the next steps, targz the
> > whole repo and add build instructions? What should the file with the
> > build instructions be called? How to label that new tarball? Can we skip
> > the bdist?
> >
> > Max
> >
> > On Sun, Jul 16, 2017 at 12:35 PM, Bolke de Bruin wrote:
> >
> >> Max, Ping? Do you need help?
> >>
> >>> On 9 Jul 2017, at 14:30, Bolke de Bruin wrote:
> >>>
> >>> Hi Max,
> >>>
> >>> The canonical distribution would be what we have in git right now (i.e.
> >>> before running python sdist). The rest is just convenience packages.
> >>> So npm would solve the issue as long as we don’t rely on any non APL
> >>> compatible packages in core. I don’t think npm/yarn/webpack needs to be
> >>> done for 1.8.3, but considering the messy javascript that we currently
> >>> have it would be nice to put it on the todo.
> >>>
> >>> Cheers
> >>> Bolke
> >>>
> >>>> On 9 Jul 2017, at 06:46, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
> >>>>
> >>>> As far as I understand npm would not solve the problem as typically
> >>>> we'd build our "entry" files and distribute that with Airflow as
> >>>> static files. Those entry files would contain these other npm
> >>>> packages, minified. (from my understanding that is the same issue as
> >>>> packaging the libs themselves)
> >>>>
> >>>> To make them runtime deps would be atypical and more complicated.
> >>>> `airflow webserver` would need to "build" (npm install/webpack) and
> >>>> the webserver would have to serve these static files out of some temp
> >>>> location (perhaps ~/.airflow/airflow.entry.js) as opposed to out of
> >>>> `site-packages`.
> >>>>
> >>>> Also note that Airflow's javascript is in pretty bad shape (scattered
> >>>> in jinja templates files) and it would take quite a significant amount
> >>>> of work to move to using npm/webpack.
> >>>>
> >>>> I'm back from vacation and will have things to catch up on next week
> >>>> but I'll try to find time to look into some of this.
> >>>>
> >>>> On Thu, Jul 6, 2017 at 1:10 PM, Bolke de Bruin wrote:
> >>>>
> >>>>> Hi Folks,
> >>>>>
> >>>>> We probably need to adjust our release process as can be observed in
> >>>>> the IPMC thread. As we are packaging a “sdist” it does not pass
> >>>>> license checks and one cannot verify the validity of what we are
> >>>>> doing. It was suggested by one of the maintainers of another python
> >>>>> project to create 3 different packages:
> >>>>>
> >>>>> 1. A source tarball which is essentially a snapshot of the repository
> >>>>> 2. A sdist
> >>>>> 3. A bdist
> >>>>>
> >>>>> 1 should then be the canonical Apache release. It should be
> >>>>> accompanied by build instructions and it should pass RAT checks. This
> >>>>> is the package we will vote on.
> >>>>> 2 is what we have voted upon until now. It should contain (it does)
> >>>>> LICENSE, NOTICE, and DISCLAIMER
> >>>>> 3 bdist, wheel package. Same as 2. Not really required, but more a
> >>>>> convenience package as is 2
> >>>>>
> >>>>> 2 and 3 can be published to PyPi.
> >>>>>
> >>>>> Max: can you take care of this? We need to vote on 1.  Build
> >>>>> instruction could be added to an INSTALL file or just to the
> >>>>> README.md file? See for inspiration the GitHub page of ariatosca:
> >>>>> https://github.com/apache/incubator-ariatosca

Re: AIRFLOW-1258

2017-07-17 Thread Jawahar Panchal
Hi again!


> On Jul 16, 2017, at 3:22 AM, Alex Guziel wrote:
> 
> I think this may be related to a celery bug. I'll follow up with more
> details later.
> 

Just replying back to the note earlier in the thread - apologies for the 
earlier top-posting, got a bit excited that I might have found the issue, and 
of course lack of sleep results in one doing terrible, terrible things… :)

Any idea if my suspicion around the 1h default visibility timeout between 
celery/redis is the culprit?
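
For anyone wanting to test that suspicion: Celery's documentation for the Redis broker describes a `visibility_timeout` transport option, with a default of one hour, after which an unacknowledged task is redelivered to another worker. A minimal sketch of choosing a larger value follows. `transport_options_for` and the 1.5x margin are hypothetical, and how the resulting dict is handed to the Celery app that Airflow creates depends on the Airflow version, so that part is left out.

```python
from datetime import timedelta

# Default visibility timeout for the Redis transport, per the Celery docs.
DEFAULT_VISIBILITY_TIMEOUT = 3600


def transport_options_for(longest_task, margin=1.5):
    """Return broker transport options that outlast the longest expected task.

    `longest_task` is a timedelta; `margin` is an arbitrary safety factor.
    """
    seconds = int(longest_task.total_seconds() * margin)
    return {"visibility_timeout": max(seconds, DEFAULT_VISIBILITY_TIMEOUT)}


if __name__ == "__main__":
    # Assumed example: the longest task in the DAG runs for about four hours.
    broker_transport_options = transport_options_for(timedelta(hours=4))
    print(broker_transport_options)
```

If tasks that run longer than an hour stop being re-queued once the timeout is raised, that would support the visibility-timeout explanation; if not, the cause is elsewhere.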

> On Sun, Jul 16, 2017 at 12:56 AM Jawahar Panchal wrote:
> 
>> Hi!
>> 
>> I am currently running a couple of long-running tasks on a
>> database/dataset at school for a project that results in behavior/log
>> output similar to what was flagged in this bug:
>> https://issues.apache.org/jira/browse/AIRFLOW-1258
>> 
>> Wasn’t sure if anyone on the list had seen anything similar, or would know
>> what I can do to possibly debug further/patch. As it takes 1hr to test a
>> change, needless to say any pointers from the dev team on the right
>> direction to look within the codebase would be much appreciated! :)
>> 
>> Thanks in advance for everyone’s/anyone's time and help - am not an
>> Airflow expert, but am hopefully learning quickly enough to help resolve
>> this issue (if I am ‘barking up the right tree’ with this bug number…)
>> 
>> Cheers,
>> J
>> 
>> 

Cheers,
J




Re: dagrun_timeout not working

2017-07-17 Thread Ben Schoener
Good to know, thanks.

Is there a recommended way to get dags to timeout?

Ben

On 2017-07-17 12:18 (-0400), Arthur Purvis  wrote: 
> dagrun_timeout is one of the many totally broken features that should be
> avoided like the plague.
> 
> On Mon, Jul 17, 2017 at 10:36 AM, Ben Schoener  wrote:
> 
> > I instantiate my dags with a timeout (dagrun_timeout=timedelta(hours=1)),
> > but this doesn't seem to have any effect. My DAGs can run for well over 1
> > hour without timing out. Is there anything obvious I might be missing here?
> > Has anyone had success using dagrun_timeout? I'm running airflow 1.8.0.
> >
> >
> > Thanks,
> >
> > Ben
> >
> 


Re: dagrun_timeout not working

2017-07-17 Thread Bolke de Bruin
Do new tasks get scheduled if the timeout has passed? The way the functionality 
works is that it sets the DagRun to failed so it will not schedule new tasks, 
but it will not fail tasks that are currently running.

Bolke
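
The behaviour described above can be illustrated with a toy model. This is not Airflow code: `ToyDagRun` is a hypothetical class that only mimics the rule "once the timeout has passed, mark the run failed and stop scheduling new tasks, but leave running tasks alone".

```python
from datetime import datetime, timedelta


class ToyDagRun:
    """Toy stand-in for a DAG run with a dagrun_timeout-like limit."""

    def __init__(self, start, timeout):
        self.start = start
        self.timeout = timeout
        self.state = "running"
        self.running_tasks = []

    def schedule(self, task, now):
        """Try to start `task` at time `now`; return True if it was started."""
        if self.state == "running" and now - self.start > self.timeout:
            self.state = "failed"      # the run itself is marked failed
        if self.state == "failed":
            return False               # so no new task is scheduled
        self.running_tasks.append(task)
        return True


if __name__ == "__main__":
    run = ToyDagRun(datetime(2017, 7, 17, 10, 0), timedelta(hours=1))
    print(run.schedule("extract", datetime(2017, 7, 17, 10, 5)))   # within the hour
    print(run.schedule("load", datetime(2017, 7, 17, 11, 30)))     # after the timeout
    print(run.state, run.running_tasks)
```

In this model the task started before the timeout is never touched, which matches a DAG with one long task appearing to ignore the timeout. To bound an individual task, the `execution_timeout` argument on operators is the per-task setting to look at.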

> On 17 Jul 2017, at 16:36, Ben Schoener  wrote:
> 
> I instantiate my dags with a timeout (dagrun_timeout=timedelta(hours=1)), but 
> this doesn't seem to have any effect. My DAGs can run for well over 1 hour 
> without timing out. Is there anything obvious I might be missing here? Has 
> anyone had success using dagrun_timeout? I'm running airflow 1.8.0.
> 
> 
> Thanks,
> 
> Ben



Re: dagrun_timeout not working

2017-07-17 Thread Arthur Purvis
dagrun_timeout is one of the many totally broken features that should be
avoided like the plague.

On Mon, Jul 17, 2017 at 10:36 AM, Ben Schoener  wrote:

> I instantiate my dags with a timeout (dagrun_timeout=timedelta(hours=1)),
> but this doesn't seem to have any effect. My DAGs can run for well over 1
> hour without timing out. Is there anything obvious I might be missing here?
> Has anyone had success using dagrun_timeout? I'm running airflow 1.8.0.
>
>
> Thanks,
>
> Ben
>


dagrun_timeout not working

2017-07-17 Thread Ben Schoener
I instantiate my dags with a timeout (dagrun_timeout=timedelta(hours=1)), but 
this doesn't seem to have any effect. My DAGs can run for well over 1 hour 
without timing out. Is there anything obvious I might be missing here? Has 
anyone had success using dagrun_timeout? I'm running airflow 1.8.0.


Thanks,

Ben


Re: [VOTE] Release Airflow 1.8.2 based on Airflow 1.8.2 RC2

2017-07-17 Thread Bolke de Bruin
Great!

It is also a bit new to me so maybe @Hitesh @Jakob can help with some guidance 
here?

But my assumption indeed is:

1. Make a tarball from the repo with build instructions (including a working 
License check!) -> Vote here and IPMC. This is the “official” release.
2. Make sdist for redistribution on PyPi

Bdist isn’t required.

Cheers
Bolke.

> On 17 Jul 2017, at 06:27, Maxime Beauchemin wrote:
> 
> I've been slammed but skies are clearing up now I'm hoping.
> 
> Reading the general@ thread I'm unclear about the next steps, targz the
> whole repo and add build instructions? What should the file with the build
> instructions be called? How to label that new tarball? Can we skip the bdist?
> 
> Max
> 
> On Sun, Jul 16, 2017 at 12:35 PM, Bolke de Bruin wrote:
> 
>> Max, Ping? Do you need help?
>> 
>>> On 9 Jul 2017, at 14:30, Bolke de Bruin wrote:
>>> 
>>> Hi Max,
>>> 
>>> The canonical distribution would be what we have in git right now (i.e.
>>> before running python sdist). The rest is just convenience packages. So
>>> npm would solve the issue as long as we don’t rely on any non APL
>>> compatible packages in core. I don’t think npm/yarn/webpack needs to be
>>> done for 1.8.3, but considering the messy javascript that we currently
>>> have it would be nice to put it on the todo.
>>> 
>>> Cheers
>>> Bolke
>>> 
>>>> On 9 Jul 2017, at 06:46, Maxime Beauchemin wrote:
>>>> 
>>>> As far as I understand npm would not solve the problem as typically we'd
>>>> build our "entry" files and distribute that with Airflow as static
>>>> files. Those entry files would contain these other npm packages,
>>>> minified. (from my understanding that is the same issue as packaging
>>>> the libs themselves)
>>>> 
>>>> To make them runtime deps would be atypical and more complicated.
>>>> `airflow webserver` would need to "build" (npm install/webpack) and the
>>>> webserver would have to serve these static files out of some temp
>>>> location (perhaps ~/.airflow/airflow.entry.js) as opposed to out of
>>>> `site-packages`.
>>>> 
>>>> Also note that Airflow's javascript is in pretty bad shape (scattered in
>>>> jinja templates files) and it would take quite a significant amount of
>>>> work to move to using npm/webpack.
>>>> 
>>>> I'm back from vacation and will have things to catch up on next week but
>>>> I'll try to find time to look into some of this.
>>>> 
>>>> On Thu, Jul 6, 2017 at 1:10 PM, Bolke de Bruin wrote:
>>>> 
>>>>> Hi Folks,
>>>>> 
>>>>> We probably need to adjust our release process as can be observed in
>>>>> the IPMC thread. As we are packaging a “sdist” it does not pass license
>>>>> checks and one cannot verify the validity of what we are doing. It was
>>>>> suggested by one of the maintainers of another python project to create
>>>>> 3 different packages:
>>>>> 
>>>>> 1. A source tarball which is essentially a snapshot of the repository
>>>>> 2. A sdist
>>>>> 3. A bdist
>>>>> 
>>>>> 1 should then be the canonical Apache release. It should be accompanied
>>>>> by build instructions and it should pass RAT checks. This is the
>>>>> package we will vote on.
>>>>> 2 is what we have voted upon until now. It should contain (it does)
>>>>> LICENSE, NOTICE, and DISCLAIMER
>>>>> 3 bdist, wheel package. Same as 2. Not really required, but more a
>>>>> convenience package as is 2
>>>>> 
>>>>> 2 and 3 can be published to PyPi.
>>>>> 
>>>>> Max: can you take care of this? We need to vote on 1.  Build
>>>>> instruction could be added to an INSTALL file or just to the README.md
>>>>> file? See for inspiration the GitHub page of ariatosca:
>>>>> https://github.com/apache/incubator-ariatosca
>>>>> 
>>>>> In the meantime I am figuring out an issue with one of the dependencies
>>>>> of nvd3 which might be or have been GPL3 which is incompatible with the
>>>>> APL, which we are distributing together with our source. Ideally, we
>>>>> should move to a “yarn/webpack” build which will resolve those issues
>>>>> automatically as these become runtime dependencies then in case of
>>>>> 1,2,3.
>>>>> 
>>>>> Cheers
>>>>> Bolke
>>>>> 
>>>>>> On 6 Jul 2017, at 05:20, Sumit Maheshwari wrote:
>>>>>> 
>>>>>> Awesome.. thanks a lot Max for taking the RM responsibility..
>>>>>> 
>>>>>> On Jul 5, 2017 11:10 PM, "Chris Riccomini"