Re: 1.7.1 release status

2016-04-27 Thread Dan Davydov
All of the blockers were fixed as of yesterday (there was some issue that
Jeremiah was looking at with the last release candidate which I think is
fixed but I'm not sure). I started staging the airbnb_1.7.1rc3 tag earlier
today, so as long as metrics look OK and the 1.7.1rc2 issues seem resolved
tomorrow I will release internally either tomorrow or Monday (we try to
avoid releases on Friday). If there aren't any issues we can push the 1.7.1
tag on Monday/Tuesday.

@Sid
I think we were originally aiming to deploy internally once every two weeks
but we decided to do it once a month in the end. I'm not too sure about
that so Max can comment there.

We have been running 1.7.0 in production for about a month now and it is
stable.

I think what really slowed down this release cycle is some commits that
caused severe bugs that we decided to roll-forward with instead of rolling
back. We can potentially try reverting these commits next time while the
fixes are applied for the next version, although this is not always trivial
to do.

On Wed, Apr 27, 2016 at 9:31 PM, Siddharth Anand <
siddharthan...@yahoo.com.invalid> wrote:

> Btw, are any of the committers running 1.7.0 or later in any staging or
> production env? I have to say that 1.6.2 being the most stable release,
> and 4 or more months old, does not say much for our release
> cadence or process. What's our plan for 1.7.1?
>
> Sent from Sid's iPhone
>
> > On Apr 27, 2016, at 9:05 PM, Chris Riccomini wrote:
> >
> > Hey all,
> >
> > I just wanted to check in on the 1.7.1 release status. I know there have
> > been some major-ish bugs, as well as several people doing tests. Should we
> > create a 1.7.1 release JIRA, and track outstanding issues there?
> >
> > Cheers,
> > Chris
>
>


Re: 1.7.1 release status

2016-04-28 Thread Dan Davydov
Definitely, here were the issues we hit:
- airbnb/airflow#1365 occurred
- Webservers/scheduler were timing out and stuck in restart cycles because
DAG parsing took longer after airbnb/airflow#1213/files
- Failed tasks that ran after the upgrade and after we reverted the upgrade
could not be cleared (but running the tasks through the UI worked without
clearing them)
- The way log files are stored on S3 was changed (airflow now requires a
connection to be set up), which broke log storage
- Some DAGs were broken (unable to be parsed) due to package reorganization
in open-source (the utils refactor commit changed the import paths)

On Thu, Apr 28, 2016 at 12:17 AM, Bolke de Bruin  wrote:

> Dan,
>
> Are you able to share some of the bugs you have been hitting and connected
> commits?
>
> We could at the very least learn from them and maybe even improve testing.
>
> Bolke


Re: 1.7.1 release status

2016-05-02 Thread Dan Davydov
A quick update: unfortunately we saw some DAGBag parsing time increases
(~10x for some DAGs) on the webservers with 1.7.1rc3. Because of this I
will be working on a staging cluster that has a copy of our production
DAGBag and mirrors our production airflow infrastructure,
just without the workers. This will let us debug the release outside of
production.


Re: 1.7.1 release status

2016-05-03 Thread Dan Davydov
It's per DAG unfortunately (we have some pretty funky DAGs here).
On May 2, 2016 10:26 PM, "Bolke de Bruin"  wrote:

> Hi dan
>
> Is that per dag or per dag bag? Multiprocessing should parallelize dag
> parsing so I am very curious. Let me know if I can help out.
> Bolke
>
> Sent from my iPhone


Re: 1.7.1 release status

2016-05-05 Thread Dan Davydov
Moved discussion to https://issues.apache.org/jira/browse/AIRFLOW-52 and
updated the status of the task there.


Re: Blocker Bug on Latest airbnb_1.7.1 Release Candidate

2016-05-11 Thread Dan Davydov
Thanks for finding this Sid, will wait on this before staging.
On May 11, 2016 8:04 PM, "Siddharth Anand" 
wrote:

> Hi Folks! There is a bug on current Master that might be related to a
> commit on April 4. I need to confirm with the committer. If so, then we
> have another blocker bug on the release candidate.
> https://issues.apache.org/jira/browse/AIRFLOW-106
>
> In a nutshell: task retries, on_failure callback, and email_on_failure are
> not honored if the first task in a DAG fails. The bug presents itself when
> task retries are enabled and the first task in a DAG run fails one time. If
> this happens, then retries are not honored on the task that only failed one
> time. The task will be left in a permanent "UP_FOR_RETRY" state. As a
> result, failure hooks will not be executed for that failed task, including
> the email notification that the task failed.
> -s


Re: Voting Changes for Scheduler-related PRs/Commits

2016-05-12 Thread Dan Davydov
@Jakob
What if we made it more generic, e.g. a +1 from any committer from a company
that is running at a certain scale (e.g. at least X workers) and is willing to
help stage releases in its production environment until we have more
comprehensive test coverage or an open source staging environment? This is in
Airflow's best interests, as otherwise stability will suffer.

On Thu, May 12, 2016 at 1:44 PM, Chris Riccomini 
wrote:

> @Sid, perhaps defining a cool-off window before a scheduler change can be
> committed. That way, everyone that cares can have a look at it? Also,
> having more than one +1 seems OK with me for scheduler changes. We will
> have to decide what "scheduler change" means, though.
>
> On Thu, May 12, 2016 at 1:39 PM, Jakob Homan  wrote:
>
> > Hey Sid-
> > Thanks for the discussion.  It's a good chance for the new
> > contributors to get more experience with the ASF.
> >
> > Unfortunately, what you propose is not possible in the ASF.  As a
> > meritocracy, the ASF does not recognize individuals' employers (or lack
> > thereof).  Merit is earned by the individual and follows them as they
> > move from organization to organization.  This is true even for
> > podlings.  Employees of certain organizations are not given extra
> > power over a project or vote due to their relationship with the
> > employer.
> >
> >ASF does recognize that at times people will be representing their
> > employer (with my $EMPLOYER hat on, is a common way of expressing
> > this), but expects that everyone is acting in the best interest of the
> > project.
> >
> > -Jakob
> >
> > On 12 May 2016 at 12:58, Siddharth Anand  wrote:
> > > Hi Folks! As many of you know, Apache Airflow (incubating) came from
> > > Airbnb, where it currently still represents the largest Airflow
> > > deployment. Airflow entered the Apache Incubator a little over a month
> > > ago but still depends on Airbnb's production deployment to vet its
> > > release candidates. As Airflow's adoption increases, we expect to
> > > leverage multiple companies in conjunction with Apache Infra resources
> > > to vet some of the more performance-critical pieces of the code base
> > > (e.g. scheduler). We're not there yet.
> > >
> > > So, for future commits and PRs involving the scheduler (and possibly
> > > other components, e.g. executors), I propose a 2-vote system: at least 1
> > > vote from an Airbnb committer and at least 1 vote from a non-Airbnb
> > > committer, separate from the PR author. This will more readily stabilize
> > > the Airbnb production system that we rely on to vet and cut releases,
> > > speeding up our release cycle.
> > >
> > > Please share your thoughts on the matter along with a vote for/against.
> > > -s
> >
>


Re: Blocker Bug on Latest airbnb_1.7.1 Release Candidate

2016-05-12 Thread Dan Davydov
Awesome, thanks for fixing! Will tentatively stage tomorrow and release on
Monday.

On Thu, May 12, 2016 at 7:04 PM, Siddharth Anand  wrote:

> Dan, you may proceed. I have merged the change to master.
> -s


Re: Scheduler problems in 1.7?

2016-05-19 Thread Dan Davydov
We have two staging clusters at the moment:
1. Cluster with:

   - Canary DAG (sanity check that tasks can run end-to-end)
   - Synthetic DAGs (test a couple of operators)

2. Cluster with:

   - Our production DAGs
   - Has webserver/scheduler but no workers (so nothing is actually run).
   At some point soon we will add what Max suggested (replacing real tasks
   with dummy tasks) and then add workers to this cluster as well
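
As a rough illustration of the dummy-task idea, the sketch below rewrites each
task's command to a harmless echo before it would be scheduled. StubTask and
staging_policy are hypothetical names made up for this example; they stand in
for whatever operator objects and policy hook a real deployment uses, and
nothing here is Airflow's actual API.

```python
from dataclasses import dataclass


@dataclass
class StubTask:
    """Hypothetical stand-in for an operator instance (not an Airflow class)."""
    task_id: str
    bash_command: str
    retries: int = 0


def staging_policy(task: StubTask) -> StubTask:
    """Neutralise a task in place so a staging cluster never runs real work."""
    # Keep the task id so the DAG shape is unchanged; only the payload is swapped.
    task.bash_command = "echo 'staging: skipped {}'".format(task.task_id)
    # Retrying a no-op is pointless, so drop retries as well.
    task.retries = 0
    return task


# Example commands are invented purely for illustration.
tasks = [
    StubTask("load_events", "spark-submit load.py", retries=3),
    StubTask("build_report", "python report.py"),
]
for t in tasks:
    staging_policy(t)
```

Task ids are left untouched, so the DAGs keep their shape (and their parsing
and scheduling load) while nothing real is executed.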


On Thu, May 19, 2016 at 10:17 AM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> @dan can provide more details but I think our staging is running off of our
> production DAGS_FOLDER, but we swap all tasks with dummy tasks
> (BashOperator with a dummy command) in our policy function.
>
> http://pythonhosted.org/airflow/concepts.html?highlight=policy#cluster-policy
>
> On Thu, May 19, 2016 at 9:34 AM, Chris Riccomini 
> wrote:
>
> > Hey Max,
> >
> > I think we would like to set up some DAG testing as well. Are you guys
> > using synthetic DAGs, or running your real DAGs on a separate cluster?
> >
> > Cheers,
> > Chris
> >
> > On Wed, May 18, 2016 at 1:32 PM, Lance Norskog 
> > wrote:
> >
> > > Ok, we'll update to 1.7.1 when y'all think it's fine.
> > >
> > > Thanks,
> > >
> > > Lance Norskog
> > >
> > > On Wed, May 18, 2016 at 12:01 PM, Bolke de Bruin 
> > > wrote:
> > >
> > > > Hey Max,
> > > >
> > > > Fair point. I’ll make sure that for the next release we jump on board
> > > > a bit earlier. We do run integration tests continuously,
> > > > but only from next month will we reach a level of complexity where we
> > > > really need to start pre-testing releases.
> > > >
> > > > - Bolke
> > > >
> > > > > On 18 May 2016, at 17:49, Maxime Beauchemin
> > > > > <maximebeauche...@gmail.com> wrote:
> > > > >
> > > > > There's an RC out that is currently in production at Airbnb (as of
> > > > > Monday) if you want to help us make sure the next version is fully
> > > > > baked for release. For now Airbnb is carrying most of the risk around
> > > > > deploying new code in production first. Knowing that we don't use all
> > > > > features and therefore wouldn't catch all possible regressions, it
> > > > > would be nice to have more companies pushing RCs in production along
> > > > > with us.
> > > > >
> > > > > Here's the git tag for the RC:
> > > > > https://github.com/apache/incubator-airflow/releases/tag/airbnb_1.7.1rc6
> > > > >
> > > > > Max
> > > > >
> > > > > On Tue, May 17, 2016 at 11:01 PM, Bolke de Bruin <bdbr...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> 1.7.1, which most likely will be out at the end of the week, hopefully
> > > > >> fixes this indeed. Don't stay on 1.7.0 for too long; 1.7.1 contains
> > > > >> many stability fixes.
> > > > >>
> > > > >> Sent from my iPad
> > > > >>
> > > > >>> On 18 May 2016, at 03:06, Lance Norskog <lance.nors...@gmail.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>> Has the "long-running scheduler hang" problem been solved yet?
> > > > >>> We just upgraded from 1.6.2 to 1.7.0 and we think it just happened,
> > > > >>> but don't know.
> > > > >>> Should we use chronic scheduler restarts?
> > > > >>>
> > > > >>> Thanks,
> > > > >>>
> > > > >>> --
> > > > >>> Lance Norskog
> > > > >>> lance.nors...@gmail.com
> > > > >>> Redwood City, CA
> > > > >>
> > > >
> > > >
> > >
> > >
> > > --
> > > Lance Norskog
> > > lance.nors...@gmail.com
> > > Redwood City, CA
> > >
> >
>


Re: [1/3] incubator-airflow git commit: use targetPartitionSize as the default partition spec

2016-05-23 Thread Dan Davydov
Yep sorry will check the versions in the future. My own commits have JIRA
labels but I haven't validated that other users have done this for theirs
when I merge their commits (as the LGTM is delegated to either another
committer or the owner of a particular operator). Will be more vigilant in
the future.

On Mon, May 23, 2016 at 5:07 PM, Chris Riccomini 
wrote:

> Hey Dan,
>
> Could you please file JIRAs, and put the JIRA name as the prefix to your
> commits?
>
> Cheers,
> Chris
>
> On Mon, May 23, 2016 at 5:01 PM,  wrote:
>
>> Repository: incubator-airflow
>> Updated Branches:
>>   refs/heads/airbnb_rb1.7.1_4 1d0d8681d -> 6f7ea90ae
>>
>>
>> use targetPartitionSize as the default partition spec
>>
>>
>> Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
>> Commit:
>> http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/b58b5e09
>> Tree:
>> http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/b58b5e09
>> Diff:
>> http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/b58b5e09
>>
>> Branch: refs/heads/airbnb_rb1.7.1_4
>> Commit: b58b5e09578d8a0df17b4de12fe3b49792e9feda
>> Parents: 1d0d868
>> Author: Hongbo Zeng 
>> Authored: Sat May 14 17:00:42 2016 -0700
>> Committer: Dan Davydov 
>> Committed: Mon May 23 16:59:52 2016 -0700
>>
>> --
>>  airflow/hooks/druid_hook.py| 23 ---
>>  airflow/operators/hive_to_druid.py |  8 +---
>>  2 files changed, 21 insertions(+), 10 deletions(-)
>> --
>>
>>
>>
>> http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/b58b5e09/airflow/hooks/druid_hook.py
>> --
>> diff --git a/airflow/hooks/druid_hook.py b/airflow/hooks/druid_hook.py
>> index b6cb231..7c80c7c 100644
>> --- a/airflow/hooks/druid_hook.py
>> +++ b/airflow/hooks/druid_hook.py
>> @@ -10,7 +10,7 @@ from airflow.hooks.base_hook import BaseHook
>>  from airflow.exceptions import AirflowException
>>
>>  LOAD_CHECK_INTERVAL = 5
>> -
>> +TARGET_PARTITION_SIZE = 500
>>
>>  class AirflowDruidLoadException(AirflowException):
>>  pass
>> @@ -52,13 +52,22 @@ class DruidHook(BaseHook):
>>
>>  def construct_ingest_query(
>>  self, datasource, static_path, ts_dim, columns, metric_spec,
>> -intervals, num_shards, hadoop_dependency_coordinates=None):
>> +intervals, num_shards, target_partition_size,
>> hadoop_dependency_coordinates=None):
>>  """
>>  Builds an ingest query for an HDFS TSV load.
>>
>>  :param datasource: target datasource in druid
>>  :param columns: list of all columns in the TSV, in the right
>> order
>>  """
>> +
>> +# backward compatibilty for num_shards, but
>> target_partition_size is the default setting
>> +# and overwrites the num_shards
>> +if target_partition_size == -1:
>> +if num_shards == -1:
>> +target_partition_size = TARGET_PARTITION_SIZE
>> +else:
>> +num_shards = -1
>> +
>>  metric_names = [
>>  m['fieldName'] for m in metric_spec if m['type'] != 'count']
>>  dimensions = [c for c in columns if c not in metric_names and c
>> != ts_dim]
>> @@ -100,7 +109,7 @@ class DruidHook(BaseHook):
>>  },
>>  "partitionsSpec" : {
>>  "type" : "hashed",
>> -"targetPartitionSize" : -1,
>> +"targetPartitionSize" : target_partition_size,
>>  "numShards" : num_shards,
>>  },
>>  },
>> @@ -121,10 +130,10 @@ class DruidHook(BaseHook):
>>
>>  def send_ingest_query(
>>  self, datasource, static_path, ts_dim, columns, metric_spec,
>> -intervals, num_shards, hadoop_dependency_coordinates=None):
>> +intervals, num_shards, target_partition_size,
>> hadoop_dependency_coordinates=None):
>>  query = self.construct_ingest_query(
>>  datasource, static_path, ts_dim, columns,
>> -metric_spec, intervals, n

Re: [VOTE] Review-then-commit (RTC) vs Commit-then-review (CTR)

2016-05-24 Thread Dan Davydov
+1 binding

On Tue, May 24, 2016 at 1:15 PM, Arthur Wiedmer  wrote:

> +1 binding
>
> On Tue, May 24, 2016 at 1:10 PM, siddharth anand 
> wrote:
>
> > +1 (binding) : FYI, Chris was binding as well.
> >
> > -s
> >
> > On Tue, May 24, 2016 at 3:00 PM, Chris Riccomini 
> > wrote:
> >
> > > +1
> > >
> > > On Tue, May 24, 2016 at 12:50 PM, siddharth anand 
> > > wrote:
> > >
> > > > Hi Folks!
> > > > To formalize the RTC vs CTR policy going forward for committing, I'm
> > > > putting it to a vote.
> > > >
> > > > "Commits need a +1 vote from a committer who is not the author, with
> > > > the exception of minor commits"
> > > >
> > > >
> > > > Anyone can vote, but committer/PPMC votes are binding.
> > > >
> > > > http://www.apache.org/foundation/voting.html#expressing-votes-1-0-1-and-fractions
> > > >
> > > > Please respond with -1, 0, +1 or the fractions listed in the link
> > > > above. If you are a committer, then please respond
> > > > "*+/-[0-1] (binding)*", otherwise "*+/-[0-1] (non-binding)*"
> > > >
> > > >
> > > > Voting will be open till Friday 9a PT.
> > > >
> > > >
> > > > -s
> > > >
> > >
> >
>


Re: Speeding up the scheduler - request for comments

2016-06-03 Thread Dan Davydov
Scheduler loop times are definitely a concern (at least for Airbnb), and +1
for option 2 as well if it can be implemented correctly. What is important
for me is that we should always be able to easily tell which of the
dependencies are met and which aren't in the event based model.

On Fri, Jun 3, 2016 at 5:53 PM, Chris Riccomini 
wrote:

> Hey Bolke,
>
> > Are scheduler loop times a concern at all?
>
> Yes, I strongly believe that they are. Especially as we add more
> DAGs/tasks.
>
> I am not a fan of (1). Caching is just going to create cache consistency
> issues, and be really annoying to manage, IMO.
>
> I agree that (2) seems more appealing. I can't comment on the feasibility
> of it, as I'm not well acquainted enough with the scheduler yet.
>
> Cheers,
> Chris
>
> On Fri, Jun 3, 2016 at 2:26 PM, Bolke de Bruin  wrote:
>
> > Hi,
> >
> > I am looking at speeding up the scheduler. Currently loop times increase
> > with the number of tasks in a DAG. This is due to
> > TaskInstance.are_dependencies_met executing several aggregation functions
> > on the database. These calls are expensive: 0.05-0.15s per task, and the
> > call is made twice in every scheduler loop. This call is where the
> > scheduler spends around 90% of its time when evaluating DAGs, and it is
> > why people with a large number of tasks per DAG see quite large loop
> > times (north of 600s).
> >
> > I see 2 options to optimize the loop without going to a multiprocessing
> > approach, which would just push the problem down the line (i.e. to the db,
> > or to the point where you don’t have enough cores anymore).
> >
> > 1. Cache the call to TI.are_dependencies_met, either by caching in
> > something like memcache or by removing the need for the double call
> > (update_state and process_dag both make the call to
> > TI.are_dependencies_met). This would more or less cut the time in half.
> >
> > 2. Notify the downstream tasks of a state change of an upstream task. This
> > would remove the need for the aggregation as the task would just ‘know’.
> > It is a bit harder to implement correctly, as you need to make sure you
> > stay in a consistent state. Obviously you could still run an integrity
> > check once in a while. This option would make the aggregation event-based
> > and significantly reduce the time spent here, to around 1-5% of the
> > current scheduler. There is a slight overhead added at a state change of
> > the TaskInstance (managed by the TaskInstance itself).
> >
> > What do you think? My preferred option is #2. Am I missing any other
> > options? Are scheduler loop times a concern at all?
> >
> > Thanks
> > Bolke
> >
> >
> >
>


Re: When to use pools?

2016-06-20 Thread Dan Davydov
At the moment by default backfill does not use a pool but you can specify
one with --pool.
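
For readers new to pools, here is a small sketch of the idea Lance describes
in the quoted message below: a pool caps how many tasks may use a shared
resource at once. This is a toy model, not the scheduler's actual code; the
names ToyTask and pick_runnable are hypothetical, and priority_weight only
mirrors the task parameter mentioned later in the thread.

```python
from dataclasses import dataclass


@dataclass
class ToyTask:
    task_id: str
    pool: str
    priority_weight: int = 1


def pick_runnable(queued, pool_slots, running):
    """Choose which queued tasks may start without exceeding any pool's slots.

    queued: list of ToyTask waiting to run
    pool_slots: {pool_name: total number of slots}
    running: {pool_name: number of tasks currently holding a slot}
    """
    open_slots = {name: total - running.get(name, 0)
                  for name, total in pool_slots.items()}
    chosen = []
    # Higher priority_weight goes first; sorted() is stable, so ties keep
    # their original queue order.
    for task in sorted(queued, key=lambda t: -t.priority_weight):
        if open_slots.get(task.pool, 0) > 0:
            open_slots[task.pool] -= 1
            chosen.append(task.task_id)
    return chosen


queued = [
    ToyTask("t2_day1", "db", 1),
    ToyTask("t1_day1", "db", 10),
    ToyTask("t2_day2", "db", 1),
    ToyTask("t2_day3", "db", 1),
]
# The "db" pool has 3 slots and one task is already running in it.
print(pick_runnable(queued, {"db": 3}, {"db": 1}))
```

With two open slots, the task carrying the larger priority_weight is chosen
first and only one of the low-priority tasks gets the remaining slot, which
is the behaviour Harish is after for t1 versus t2.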

On Mon, Jun 20, 2016 at 9:02 PM, Chris Riccomini 
wrote:

> Hey Harish,
>
> One thing that I'm not clear on is whether backfill even honors pools at
> all. I believe backfill currently starts its own scheduler outside of the
> main scheduler process. As a result, I think the pools are completely
> disregarded. Bolke/Jeremiah/Paul can correct me if I'm wrong.
>
> Cheers,
> Chris
>
> On Mon, Jun 20, 2016 at 7:46 PM, Lance Norskog 
> wrote:
>
> > One reason to use Pools is because you have tasks in different DAGs that
> > all use the same resource, like a database. A Pool lets you say, "I will
> > send no more than 3 requests to this database at once". However, there
> are
> > bugs in the scheduler and it is possible to have many active tasks
> > overscheduled against a pool.
> >
> > You can create a pool in the Admin->Pools drop-down. You don't need a
> > script.
> >
> > On Mon, Jun 20, 2016 at 2:46 PM, harish singh 
> > wrote:
> >
> > > Hi,
> > >
> > > We have been using Airflow for about 3 months now.
> > >
> > > One pain I felt was, during backfill if I have 2 tasks t1 and t2 - with
> > t1
> > > having depends_on_past=true,
> > >   t0 -> t1
> > >   t0 -> t2
> > >
> > > I find that the task t2 with no past dependency keeps getting
> scheduled.
> > > This causes the task t1 to wait for a long time before it gets
> scheduled.
> > >
> > > I think this is a good use case for creating "pools" and allocate slots
> > for
> > > each pool.
> > > Also, I will have to use priority_weights.  And adjust parallelism!!!
> > >
> > > Is there a better way to handle this?
> > >
> > >
> > > Also, in general, are there any examples on how to use pools?
> > >
> > > I peeked into *airflow/tests/operators/subdag_operator.py* and found
> > > the snippet below:
> > >
> > > session = airflow.settings.Session()
> > > pool_1 = airflow.models.Pool(pool='test_pool_1', slots=1)
> > > session.add(pool_1)
> > > session.commit()
> > >
> > > Why do we need Session instance? Do we need to run the below code
> before
> > > creating a pool in code (inside my pipeline.py under dags/ directory):
> > >
> > > pool = (
> > >     session.query(Pool)
> > >     .filter(Pool.pool == 'AIRFLOW-205')
> > >     .first())
> > > if not pool:
> > >     session.add(Pool(pool='AIRFLOW-205', slots=8))
> > >     session.commit()
> > >
> > >
> > > Also, I saw a few places where pool: 'backfill' is used.
> > >
> > > Is 'backfill' a special pre-defined pool?
> > >
> > >
> > > If not, how do we create different types of pools based on whether it
> > > is backfill or not?
> > >
> > >
> > > All this is being done in pipeline.py script under 'dags/' directory.
> > >
> > >
> > > Thanks,
> > > Harish
> > >
> >
> >
> >
> > --
> > Lance Norskog
> > lance.nors...@gmail.com
> > Redwood City, CA
> >
>


Re: When to use pools?

2016-06-21 Thread Dan Davydov
Assuming you are using local/sequential executors, the backfill pool would
be used.

On Mon, Jun 20, 2016 at 10:47 PM, harish singh 
wrote:

> hmm.. Thanks Lance. I mentioned a pool for 'backfill' because I saw it
> used in the 'default_args' of an Airflow example.
>
> Chris/Dan/Bolke/Jeremiah/Paul/all :) :
> So suppose I create two pools, 'pool1' and 'pool2',
> and use them for tasks t1 and t2. Now say I also create a pool called
> 'backfill' but do not use it in any of the tasks inside my DAG.
>
> Whenever I run the backfill for my dag with  ```--pool backfill```,
> will the scheduler use the slots from this backfill pool or will the tasks
> use pool1 and pool2?
>
>
>
> On Mon, Jun 20, 2016 at 9:20 PM, Dan Davydov wrote:
>
> > At the moment by default backfill does not use a pool but you can specify
> > one with --pool.
> >
> > On Mon, Jun 20, 2016 at 9:02 PM, Chris Riccomini 
> > wrote:
> >
> > > Hey Harish,
> > >
> > > One thing that I'm not clear on is whether backfill even honors pools
> at
> > > all. I believe backfill currently starts its own scheduler outside of
> the
> > > main scheduler process. As a result, I think the pools are completely
> > > disregarded. Bolke/Jeremiah/Paul can correct me if I'm wrong.
> > >
> > > Cheers,
> > > Chris
> > >
> > > On Mon, Jun 20, 2016 at 7:46 PM, Lance Norskog <
> lance.nors...@gmail.com>
> > > wrote:
> > >
> > > > One reason to use Pools is because you have tasks in different DAGs
> > that
> > > > all use the same resource, like a database. A Pool lets you say, "I
> > will
> > > > send no more than 3 requests to this database at once". However,
> there
> > > are
> > > > bugs in the scheduler and it is possible to have many active tasks
> > > > overscheduled against a pool.
> > > >
> > > > You can create a pool in the Admin->Pools drop-down. You don't need a
> > > > script.
> > > >
> > > > On Mon, Jun 20, 2016 at 2:46 PM, harish singh <
> > harish.sing...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We have been using Airflow for about 3 months now.
> > > > >
> > > > > One pain I felt was, during backfill if I have 2 tasks t1 and t2 -
> > with
> > > > t1
> > > > > having depends_on_past=true,
> > > > >   t0 -> t1
> > > > >   t0 -> t2
> > > > >
> > > > > I find that the task t2 with no past dependency keeps getting
> > > scheduled.
> > > > > This causes the task t1 to wait for a long time before it gets
> > > scheduled.
> > > > >
> > > > > I think this is a good use case for creating "pools" and allocate
> > slots
> > > > for
> > > > > each pool.
> > > > > Also, I will have to use priority_weights.  And adjust
> parallelism!!!
> > > > >
> > > > > Is there a better way to handle this?
> > > > >
> > > > >
> > > > > Also, in general, are there any examples on how to use pools?
> > > > >
> > > > > I peeked into* airflow/tests/operators/subdag_operator.py *and
> found
> > > the
> > > > > below snippet:
> > > > >
> > > > > session = airflow.settings.Session()
> > > > > pool_1 = airflow.models.Pool(pool='test_pool_1', slots=1)
> > > > > session.add(pool_1)
> > > > > session.commit()
> > > > >
> > > > > Why do we need Session instance? Do we need to run the below code
> > > before
> > > > > creating a pool in code (inside my pipeline.py under dags/
> > directory):
> > > > >
> > > > > pool = (
> > > > >     session.query(Pool)
> > > > >     .filter(Pool.pool == 'AIRFLOW-205')
> > > > >     .first())
> > > > > if not pool:
> > > > >     session.add(Pool(pool='AIRFLOW-205', slots=8))
> > > > >     session.commit()
> > > > >
> > > > >
> > > > > Also, I saw few places where pool: 'backfill'  is used?
> > > > >
> > > > > Is 'backfill' a special pre-defined pool?
> > > > >
> > > > >
> > > > > If not, how do we create different types of pools based on whether
> it
> > > > > is backfill or not?
> > > > >
> > > > >
> > > > > All this is being done in pipeline.py script under 'dags/'
> directory.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Harish
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Lance Norskog
> > > > lance.nors...@gmail.com
> > > > Redwood City, CA
> > > >
> > >
> >
>


Re: Aiming for an Apache release 1st week of September?

2016-07-08 Thread Dan Davydov
+1, the minimal Apache cherry-pick release makes sense to me.

On Fri, Jul 8, 2016 at 1:14 PM, Chris Riccomini 
wrote:

> Hey Bolke,
>
> A fast release with the 1.7.1.3 + cherry picks listed above sounds like the
> way to go. Then, a second release in sept where we just cut from master.
>
> I'm +1 on this. Lets us get our Apache ducks in a row without worrying
> about stabilizing everything simultaneously.
>
> Cheers,
> Chris
>
> On Fri, Jul 8, 2016 at 12:32 AM, Bolke de Bruin  wrote:
>
> > This was my assessment as well, thus I agree. My suggestion is to start
> > the process and see if we get questions about this that require us to
> > change our point of view.
> >
> > If we do an earlier release I would like to aim for July 19, but that
> > might be a bit short notice. If needed I can put myself up as release
> > manager till the 21st. If we do 1.7.1.3 + cherry picks I would say
> >
> > * Licenses
> > * Notices
> > * Disclaimer
> > * Highcharts -> d3
> >
> > Makes sense? Anything missing here?
> >
> > - Bolke
> >
> > > On 7 Jul 2016, at 18:04, Chris Riccomini wrote:
> > >
> > >> but it's acceptable to have a soft dependency on an LGPL component,
> such
> > > that a user could deploy the LGPL component separately to enable
> > additional
> > > optional features
> > >
> > > This is precisely what I believe is going on with Airflow. It's under
> an
> > > airflow[postgres] package (so `pip install airflow` doesn't even
> install
> > > it). We went through a very similar exercise with Samza, where we had a
> > > dependency on Paramiko (also LGPL [1]), and our (LinkedIn) lawyers
> talked
> > > to Apache, and agreed that it was fine.
> > >
> > > [1] https://github.com/paramiko/paramiko/blob/master/LICENSE
> > >
> > > On Wed, Jul 6, 2016 at 10:16 PM, Chris Nauroth <
> cnaur...@hortonworks.com
> > >
> > > wrote:
> > >
> > >> Here are more details on Apache release requirements:
> > >>
> > >> http://www.apache.org/dev/release-publishing.html
> > >>
> > >>
> > >> http://www.apache.org/dev/release
> > >>
> > >>
> > >> To summarize, it's much more focused on compliance with licensing,
> > signing
> > >> and Apache infrastructure requirements.  That's the kind of scrutiny
> > that
> > >> a release candidate will get from the Incubator PMC rather than deep
> > >> testing for verification of new features or bug fixes.
> > >>
> > >> For that reason, I think it makes sense for a podling's first Apache
> > >> release to focus on nothing but those ASF policy requirements.  It's
> > >> completely normal for a podling's early release candidates to have a
> few
> > >> false starts that get voted down, because the policies are complex the
> > >> first time around.  Some projects have found it helpful to write a
> "How
> > to
> > >> Release" web page during the first release, so that they have
> > step-by-step
> > >> notes to follow during subsequent releases.  Focusing on "latest
> stable"
> > >> with a few additional patches sounds like a great plan to me, because
> it
> > >> decouples the challenges of your first ASF release from other software
> > >> development pressures, such as pressure from a user base to ship a new
> > >> feature quickly.
> > >>
> > >> Regarding the LGPL question, in general, the answer is that we are
> > >> prohibited from redistributing any LGPL component, but it's acceptable
> > to
> > >> have a soft dependency on an LGPL component, such that a user could
> > deploy
> > >> the LGPL component separately to enable additional optional features.
> > >> More details are here:
> > >>
> > >> http://www.apache.org/legal/resolved.html#prohibited
> > >>
> > >>
> > >> A specific example of this is Apache Hadoop's integration with LZO
> > >> compression, which uses a GPL license.  Hadoop does not redistribute
> LZO
> > >> or include any code that is tightly coupled to it, but the Hadoop
> > codebase
> > >> does have a notion of a pluggable CompressionCodec, with
> implementations
> > >> of the interface discoverable at runtime.  This setup supports users
> > >> downloading and installing a separate LZO integration library onto
> > >> Hadoop's classpath.
> > >>
> > >> --Chris Nauroth
> > >>
> > >>
> > >>
> > >>
> > >> On 7/6/16, 9:36 PM, "Maxime Beauchemin" 
> > >> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> This sounds very reasonable to me, though we may be able to do an
> > >>> earlier release as a practice run for an Apache release with a
> > >>> snapshot of our production, which would consist of the latest release
> > >>> plus a set of cherry-picked PRs.
> > >>>
> > >>> How does an Apache release differ from a standard release again?
> > >>>
> > >>> Max
> > >>>
> > >>> On Wed, Jul 6, 2016 at 7:59 PM, Chris Riccomini <
> criccom...@apache.org
> > >
> > >>> wrote:
> > >>>
> >  One other thing to note is that I'm planning to run the RCs in all
> of
> >  our
> >  environments to exercise things. We should make sure that we're all
> >  committed (as well as the

Re: Specifying memory limit for task

2016-08-05 Thread Dan Davydov
Note that on master (but not in the latest release), you can already
specify resource constraints for tasks. They are not consumed anywhere in
airflow itself yet, but you can use them in an operator if it fits your use
case:
https://github.com/apache/incubator-airflow/pull/1669
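
As an illustration of what an operator could do with such a constraint in
the meantime, here is a minimal sketch that runs a command in a subprocess
under an address-space cap and then describes how it exited. It is an
assumption-laden stand-in: it uses the Python standard library's resource
module (Unix only), not cgroups and not the resources feature from the PR
above, and the function names run_with_memory_limit and describe_exit are
hypothetical.

```python
import resource
import signal
import subprocess
import sys


def run_with_memory_limit(cmd, max_bytes):
    """Run cmd in a child whose address space is capped at max_bytes (Unix only)."""

    def limit_memory():
        # Runs in the child just before exec; never exceed the existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        cap = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, cap))

    return subprocess.run(cmd, preexec_fn=limit_memory).returncode


def describe_exit(returncode):
    """Turn a subprocess return code into a hint about why the task ended."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        if name == "SIGKILL":
            return "killed by SIGKILL (possibly the kernel OOM killer)"
        return "killed by " + name
    return "exited with status %d" % returncode


if __name__ == "__main__":
    code = run_with_memory_limit([sys.executable, "-c", "print('hello')"],
                                 2 * 1024 ** 3)
    print(describe_exit(code))
```

Note the difference between the two failure modes: under an address-space
cap a Python child would typically raise MemoryError and exit with a
non-zero status, whereas a container killed from outside shows up as
SIGKILL, which is the case Adinata describes below.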

On Fri, Aug 5, 2016 at 4:07 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> We're adding cgroups semantics to Airflow's BaseOperator, along with
> integration to activate those settings when firing up the task instance as
> a subprocess. Paul Yang (@Airbnb) is working on a prototype / design doc
> and should be able to share more shortly.
>
> It lines up nicely with the recent work around unix impersonation as they
> are all new optional BaseOperator params that wrap the subprocess. These
> features will require Airflow to run with elevated privileges, which
> Airflow will "downstep" from as it runs the task instances as
> subprocesses.
>
> Max
>
> On Fri, Aug 5, 2016 at 11:03 AM, Lance Norskog 
> wrote:
>
> > Linux cgroups gives Docker the ability to control memory use inside a
> > container.
> >
> > https://www.cloudsigma.com/manage-docker-resources-with-cgroups/
> >
> > Here is an example that allocates the CPU:
> > http://blog.viktorpetersson.com/post/115562026784/using-cgroups-with-docker-on-ubuntu-1404
> >
> > (I have not worked with cgroups.)
> >
> > On Fri, Aug 5, 2016 at 7:57 AM, wood stock  wrote:
> >
> > > I think you can define a new operator which takes an existing operator
> > > base class and keeps monitoring the memory usage (maybe based on the
> > > Airflow heartbeat?).
> > >
> > > On Thu, Aug 4, 2016 at 8:23 PM, Adinata  wrote:
> > >
> > > > Is there any way to limit the memory a task instance uses during
> > > > execution?
> > > >
> > > > I run the worker inside a Docker container, so when it uses too much
> > > > memory, the container is killed.
> > > > Airflow then detects it as a zombie (the failure email notification
> > > > told me that). It would be great to know it was killed because of
> > > > OOM, so I know what to fix instead of first wondering why it became
> > > > a zombie.
> > > >
> > > > Thanks
> > > > --
> > > > *Adinata*
> > > > Engineer - UrbanIndo.com
> > > >
> > >
> >
> >
> >
> > --
> > Lance Norskog
> > lance.nors...@gmail.com
> > Redwood City, CA
> >
>


Re: Shout out!

2016-08-12 Thread Dan Davydov
+1

On Aug 12, 2016 8:02 AM, "Chris Riccomini"  wrote:

> Same. It's awesome.
>
> On Thu, Aug 11, 2016 at 7:28 PM, siddharth anand 
> wrote:
> > FYI!
> > Just wanted to give a special shout-out for jlowin for writing a great
> > merge tool for committers. Thx to this tool, merging your PR is super
> easy.
> >
> > -s
>


Re: Subsequent Airflow Meetup: 2017/01/11

2016-11-16 Thread Dan Davydov
Based on chatting with a couple of people today at the Airflow meet-up I
think there has been some demand for an airflow operations talk,
specifically around monitoring/alerting. If there is still room I can give
a talk about this, let me know George.

On Thu, Nov 10, 2016 at 10:17 AM, siddharth anand  wrote:

> Kevin,
> Here's a link to the 1Q17 meet-up.
> https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/235259523/
>
> Both upcoming meet-ups (next week at WePay and 1Q17 at Clover Health) can
> be found on http://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/
>
> -s
>
>
> On Wed, Nov 9, 2016 at 4:24 PM, Kevin Mandich 
> wrote:
>
> > Hi George,
> >
> > If there is still room, I'd like to give a talk about how we use Airflow
> at
> > my company, Agari. We are a data company that is working to eliminate
> > inbound, targeted e-mail attacks to our customers (spear-phishing). I am
> > currently working as a data scientist who is also responsible for
> shipping
> > my work to production.
> >
> > We currently use Airflow to build models from our telemetry data which
> are
> > then used for scoring in our near-real-time pipeline. I'd like to talk
> > about some of the DAGs we've set up to do this.
> >
> > Please let me know if this sounds reasonable. Thank you,
> >
> > Kevin Mandich
> > Agari Data, Inc.
> >
> >
> > On Mon, Oct 31, 2016 at 11:27 PM, George Leslie-Waksman <
> > geo...@cloverhealth.com.invalid> wrote:
> >
> > > I know it's a bit far in advance, but to make sure there's space (and
> > food
> > > and drink), I've scheduled and booked the subsequent meetup for January
> > > 11th at Clover Health in SF.
> > >
> > > If anyone wants to volunteer to talk, let me know, otherwise I'll
> > probably
> > > start bugging folks sometime after Thanksgiving and before the December
> > > holidays.
> > >
> > > --George Leslie-Waksman
> > >
> >
>


Re: Merging the experimental API Framework

2016-11-28 Thread Dan Davydov
Just wanted to say this is very exciting, thank you Bolke :).

On Mon, Nov 28, 2016 at 10:50 AM, Bolke de Bruin  wrote:

> All,
>
> After a few weeks of work I have finalized the implementation of a REST
> API Framework. Out of the box it supports Kerberos authentication, which is
> now fully end-to-end tested on Travis with a working KDC. You can also
> switch the CLI to use the API endpoints when available. Currently, only the
> “trigger_dag” functionality is available this way, but I hope others will
> pick this up and create new endpoints that the CLI can then use.
>
> For Contributors:
>
> In case you are implementing new functionality in the CLI please make sure
> to implement the actual functionality in api/common/…/
> and expose it through api_client (abstract), json_client (JSON),
> local_client (direct). Endpoints are defined in www/api/experimental.
>
> Direct exposure in cli.py I would consider deprecated and I would prefer
> to deny it from now on. Hopefully, this gives us a gradual path to improved
> integration and improved security while maintaining backwards
> compatibility. Also note that the APIs are still marked experimental and
> are subject to change.
>
> Next steps:
> - Swagger definitions (http://swagger.io)
> - Research possible integration between different authentication backends
> - Use “airflow api” instead of “airflow webserver” to separate concerns
> - Remove all direct DB access from cli.py
> - Improve documentation
> - Design API graduation roadmap (when is something not experimental
> anymore)
>
> Feedback obviously appreciated.
>
> Bolke
>
>
>
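
The layering described in the "For Contributors" part of the quoted message
can be sketched roughly as follows. This is an illustration only: the class
names, the endpoint path and the fake_endpoint helper are hypothetical and
are not taken from the Airflow source. The point is that the real work lives
in one shared function, and both the direct client and the JSON client
delegate to it, so the CLI can use either one.

```python
import json
from abc import ABC, abstractmethod


def trigger_dag(dag_id, run_id=None):
    """Stand-in for the shared implementation that would live under api/common/."""
    return {"dag_id": dag_id,
            "run_id": run_id or "manual__" + dag_id,
            "state": "triggered"}


class ApiClient(ABC):
    """Abstract client: the CLI codes against this interface only."""

    @abstractmethod
    def trigger_dag(self, dag_id, run_id=None):
        ...


class LocalClient(ApiClient):
    """Direct client: calls the shared implementation in-process."""

    def trigger_dag(self, dag_id, run_id=None):
        return trigger_dag(dag_id, run_id)


class JsonClient(ApiClient):
    """JSON client: serializes the request and sends it to an endpoint."""

    def __init__(self, send):
        # send(path, body) -> response string; injected so that this sketch
        # needs no network access.
        self._send = send

    def trigger_dag(self, dag_id, run_id=None):
        body = json.dumps({"run_id": run_id})
        # The path below is made up for this example.
        return json.loads(self._send("/dags/%s/trigger" % dag_id, body))


def fake_endpoint(path, body):
    """Pretend web endpoint that forwards to the same shared implementation."""
    dag_id = path.split("/")[2]
    return json.dumps(trigger_dag(dag_id, json.loads(body)["run_id"]))


print(LocalClient().trigger_dag("example"))
print(JsonClient(fake_endpoint).trigger_dag("example"))
```

Because both clients end up in the same function, the CLI behaves the same
whether it talks to the database host directly or goes through the API.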


Re: Integration test env

2016-12-14 Thread Dan Davydov
This is extremely generous of you! I do agree with the approach of trying
to get funding from Apache and having shared resources (e.g. so that we
don't depend on any one company or individual for the uptime of the
integration environment, plus so we would have public cloud integration
potentially).

On Wed, Dec 14, 2016 at 1:22 AM, Bolke de Bruin  wrote:

> Hi,
>
> I have been thinking about an integration test environment. Aside from any
> technical requirements we need a place to do it. I am willing to offer a
> place in Lab env I am running or to fund an environment in AWS/GCloud if
> Apache cannot make these kind of resources available.
>
> If running in our Lab there is virtually no restriction on what we could do,
> however I will hand-select the people who have access to this environment. I
> will also hold ultimate power to remove access from anyone. I might even
> ask for a confirmation that you will behave when using our property (don’t
> worry, I won’t cover it with legal wording). This is an IaaS service, so we
> need to cover the things we need ourselves, but the upside is that we can and
> it is free. We could set up a GitLab instance that mirrors from Apache and
> kicks off runners to do testing. Downsides: 1) it might not be entirely
> Apache-like. Sorry, can’t help that. 2) there is no guaranteed uptime 3) I
> might need to remove it in the future, e.g. when I change jobs :). 4) No
> public cloud integration, it’s a private stack after all.
>
> I can also fund on AWS/GCloud. Again, I probably want to have ultimate
> power on access to this environment - it’s my company’s money on the line
> after all. Major downside to this is that it is dependent on and limited by
> the budget I can make available. Upside is that it is not company property.
> Also I personally have less exposure to public cloud environments due to
> company restrictions.
>
> Are there any other options? Any thoughts?
>
> Bolke
>
>
>
>
>
>


Re: Airflow 1.8.0 Alpha 1

2017-01-03 Thread Dan Davydov
I have also started on this effort: recently Alex Guziel and I have been
pushing Airbnb's custom cherries onto master to get Airbnb back onto master
in order for us to do a release.

I think it might make sense to wait for these commits to get merged in
since they would be quite nice to have for all Airflow users and seem like
they will be merged soon:
- Schedule all pending DAG runs in a single scheduler loop -
  https://github.com/apache/incubator-airflow/pull/1906
- Add Support for dag.backfill=(True|False) Option -
  https://github.com/apache/incubator-airflow/pull/1830
- Impersonation Support + Cgroups -
  https://github.com/apache/incubator-airflow/pull/1934 (this is kind of
  important from the Airbnb side so that we can help test the new master
  without having to cherry-pick this PR on top of it, which would make the
  testing unreliable for others).

If there are PRs that affect the core of Airflow that other committers
think are important to merge we could include these too. I can commit to
pushing out the Impersonation/Cgroups PR this week pending PR comments.
What do you think Bolke?

On Tue, Jan 3, 2017 at 4:26 AM, Bolke de Bruin  wrote:

> Hey Alex,
>
> I have noticed the same, and it is also the reason why we have Alpha
> versions. For now I have noticed the following:
>
> * Tasks can get in limbo between scheduler and executor:
> https://github.com/apache/incubator-airflow/pull/1948
> * Try_number not increased due to reset in LocalTaskJob:
> https://github.com/apache/incubator-airflow/pull/1969
> * one_failed trigger not executed
>
> My idea is to move to a Samba style of releases eventually, but for now I
> would like to get master into a state that we understand and therefore not
> accept any patches that do not address any bugs.
>
> If you (or anyone else) can review the above PRs and add your own as well
> then I can create another Alpha version. I’ll be on gitter as much as I can
> so we can speed up if needed.
>
> - Bolke
>
> > On 3 Jan 2017, at 08:51, Alex Van Boxel  wrote:
> >
> > Hey Bolke,
> >
> > thanks for getting this moving. But I already have some blockers: since I
> > moved up master to this release (moved from end November to now),
> > stability has gone down (certainly on Celery). I'm trying to identify the
> > core problems and see if I can fix them.
> >
> > On Sat, Dec 31, 2016 at 9:52 PM Bolke de Bruin wrote:
> >
> > Dear All,
> >
> > On the verge of the New Year, I decided to be a little bit cheeky and to
> > make available an Airflow 1.8.0 Alpha 1. We have been talking about it
> > for a long time now and by doing this I wanted to bootstrap the process.
> > It should by no means be considered an Apache release yet. This is for
> > testing purposes in the dev community around Airflow, nothing else.
> >
> > The build is exactly the same as the state of master (git 410736d) plus
> the
> > change to version “1.8.0.alpha1” in version.py.
> >
> > I am dedicating quite some time next week and beyond to get a release
> out.
> > Hopefully we can get some help with testing, changelog etc. To make this
> > possible I would like to propose a freeze to adding new features for at
> > least two weeks - say until Jan 15.
> >
> > You can find the tar here: http://people.apache.org/~bolke/ . It isn’t
> > signed. Following versions will be. SHA is available.
> >
> > Lastly, Alpha 1 does not have the fix for retries yet. So we will get an
> > Alpha 2 :-). @Max / @Dan / @Paul: a potential fix is in
> > https://github.com/apache/incubator-airflow/pull/1948 , but your feedback
> > is required as it is entrenched in new processing code that you are
> > running in production afaik - so I wonder what happens in your fork.
> >
> > Happy New Year!
> >
> > Bolke
> >
> >
> >
> > --
> >  _/
> > _/ Alex Van Boxel
>
>
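
The quoted announcement says the alpha tarball is unsigned but that a SHA is
available. A minimal sketch of checking a downloaded file against a
published digest might look like this. The thread does not say which SHA
variant was published, so SHA-256 here is only an assumption (pass another
algorithm name if needed), and the function names are hypothetical.

```python
import hashlib
import hmac


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex, algorithm="sha256"):
    """Compare the file's digest with the published hex string."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(file_digest(path, algorithm), expected)
```

A digest only shows that the download matches what was uploaded; it does not
replace the signature check that later, signed releases allow.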


Re: Subsequent Airflow Meetup: 2017/01/11

2017-01-03 Thread Dan Davydov
Confirmed.

On Sun, Jan 1, 2017 at 9:16 PM, George Leslie-Waksman <
geo...@cloverhealth.com.invalid> wrote:

> Sorry for the delayed response, end of year and holidays stole my attention
> for a bit.
>
> With the new year, I was just looking to pick things back up and solicit
> presenters for the meetup. Given we're looking for two more, and
> Dan(Airbnb) and Kevin(Agari) have already expressed interest, I'd be happy
> to give them the spots.
>
> I hope the delay in my response isn't too much of an inconvenience for
> anyone. Dan, Kevin: confirm and I'll add you to the line up.
>
> --George
>
> On Sun, Nov 20, 2016 at 8:44 PM siddharth anand  wrote:
>
> > I suspect Clover Health is extremely busy with all of the benefit
> > enrollments going on right now..
> >
> > George,
> > When you come up for air, it looks like both Dan(Airbnb) and Kevin(Agari)
> > have talk ideas.
> >
> > -s
> >
> > On Wed, Nov 16, 2016 at 11:50 PM, Dan Davydov <
> > dan.davy...@airbnb.com.invalid> wrote:
> >
> > > Based on chatting with a couple of people today at the Airflow meet-up
> I
> > > think there has been some demand for an airflow operations talk,
> > > specifically around monitoring/alerting. If there is still room I can
> > give
> > > a talk about this, let me know George.
> > >
> > > On Thu, Nov 10, 2016 at 10:17 AM, siddharth anand 
> > > wrote:
> > >
> > > > Kevin,
> > > > Here's a link to the 1Q17 meet-up.
> > > >
> > > > https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/235259523/
> > > >
> > > > Both upcoming meet-ups (next week at WePay and 1Q17 at Clover Health)
> > > > can be found on
> > > > http://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/
> > > >
> > > > -s
> > > >
> > > >
> > > > On Wed, Nov 9, 2016 at 4:24 PM, Kevin Mandich <
> kevinmand...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi George,
> > > > >
> > > > > If there is still room, I'd like to give a talk about how we use
> > > Airflow
> > > > at
> > > > > my company, Agari. We are a data company that is working to
> eliminate
> > > > > inbound, targeted e-mail attacks to our customers
> (spear-phishing). I
> > > am
> > > > > currently working as a data scientist who is also responsible for
> > > > shipping
> > > > > my work to production.
> > > > >
> > > > > We currently use Airflow to build models from our telemetry data
> > which
> > > > are
> > > > > then used for scoring in our near-real-time pipeline. I'd like to
> > talk
> > > > > about some of the DAGs we've set up to do this.
> > > > >
> > > > > Please let me know if this sounds reasonable. Thank you,
> > > > >
> > > > > Kevin Mandich
> > > > > Agari Data, Inc.
> > > > >
> > > > >
> > > > > On Mon, Oct 31, 2016 at 11:27 PM, George Leslie-Waksman <
> > > > > geo...@cloverhealth.com.invalid> wrote:
> > > > >
> > > > > > I know it's a bit far in advance, but to make sure there's space
> > (and
> > > > > food
> > > > > > and drink), I've scheduled and booked the subsequent meetup for
> > > January
> > > > > > 11th at Clover Health in SF.
> > > > > >
> > > > > > If anyone wants to volunteer to talk, let me know, otherwise I'll
> > > > > probably
> > > > > > start bugging folks sometime after Thanksgiving and before the
> > > December
> > > > > > holidays.
> > > > > >
> > > > > > --George Leslie-Waksman
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Airflow 1.8.0 Alpha 1

2017-01-03 Thread Dan Davydov
All very reasonable to me. One reason we may not have hit the bugs in our
production is that we are running off a different merge base and our
cherries aren't 1-1 with what we are running in production (we still test
them but we can't run them in production). That being said, I don't think I
authored the commits you are referring to, so I don't have full context.

On Tue, Jan 3, 2017 at 1:27 PM, Bolke de Bruin  wrote:

> Hi Dan et al,
>
> That sounds good to me, however I will be pretty critical of the changes
> in the scheduler and the cleanliness of the patches. This is due to the
> fact I have been chasing quite some bugs in master that were pretty hard to
> track down even with a debugger at hand. I’m surprised that those didn’t
> pop up in your production or maybe I am concerned ;-). Anyways, I hope you
> understand I might be a bit picky in understanding and needing (design)
> documentation for some of the changes.
>
> What I would like to suggest is that for the Alpha versions we still
> accept “new” features so these PRs can get in, but from Beta we will not
> accept new features anymore. For new features in the area of the scheduler
> an integration DummyDag should be supplied, so others can test the
> behaviour. Does this sound ok?
>
> My list of open code items for a release looks now like this:
>
> Blockers
> * one_failed not honoured
> * Alex’s sensor issue
>
> New features:
> * Schedule all pending DAGs in a single loop
> * Add support for backfill true/false
> * Impersonation
> * CGroups
> * Add Cloud Storage updated sensor
>
> Alpha2 I will package tomorrow. Packages are now signed with my apache.org
> key. Please verify and let me know if something is off. I’m still waiting
> for access to the incubating dist repository.
> Bolke
>
>
> > On 3 Jan 2017, at 14:38, Dan Davydov 
> wrote:
> >
> > I have also started on this effort, recently Alex Guziel and I have been
> > pushing Airbnb's custom cherries onto master to get Airbnb back onto
> master
> > in order for us to do a release.
> >
> > I think it might make sense to wait for these two commits to get merged
> in
> > since they would be quite nice to have for all Airflow users and seem
> like
> > they will be merged soon:
> > Schedule all pending DAG runs in a single scheduler loop -
> > https://github.com/apache/incubator-airflow/pull/1906
> > Add Support for dag.backfill=(True|False) Option -
> > https://github.com/apache/incubator-airflow/pull/1830
> > Impersonation Support + Cgroups -
> > https://github.com/apache/incubator-airflow/pull/1934 (this is kind of
> > important from the Airbnb side so that we can help test the new master
> > without having to cherrypick this PR on top of it which would make the
> > testing unreliable for others).
> > If there are PRs that affect the core of Airflow that other committers
> > think are important to merge we could include these too. I can commit to
> > pushing out the Impersonation/Cgroups PR this week pending PR comments.
> > What do you think Bolke?
> >
> > On Tue, Jan 3, 2017 at 4:26 AM, Bolke de Bruin  <mailto:bdbr...@gmail.com>> wrote:
> >
> >> Hey Alex,
> >>
> >> I have noticed the same, and it is also the reason why we have Alpha
> >> versions. For now I have noticed the following:
> >>
> >> * Tasks can get in limbo between scheduler and executor:
> >> https://github.com/apache/incubator-airflow/pull/1948 <
> https://github.com/apache/incubator-airflow/pull/1948> <
> >> https://github.com/apache/incubator-airflow/pull/1948 <
> https://github.com/apache/incubator-airflow/pull/1948>>
> >> * Try_number not increased due to reset in LocalTaskJob:
> >> https://github.com/apache/incubator-airflow/pull/1969 <
> https://github.com/apache/incubator-airflow/pull/1969> <
> >> https://github.com/apache/incubator-airflow/pull/1969 <
> https://github.com/apache/incubator-airflow/pull/1969>>
> >> * one_failed trigger not executed
> >>
> >> My idea is to move to a Samba style of releases eventually, but for now
> I
> >> would like to get master into a state that we understand and therefore
> not
> >> accept any patches that do not address any bugs.
> >>
> >> If you (or anyone else) can review the above PRs and add your own as
> well
> >> then I can create anothe

Re: Airflow 1.8.0 Alpha 1

2017-01-04 Thread Dan Davydov
It should be fine to delete them, hopefully no one is depending on them.

On Jan 4, 2017 11:41 AM, "Chris Riccomini"  wrote:

> @Bolke, thanks for creating the branch! Your plan sounds good to me. Re:
> deleting airbnb branches, I'll leave Dan/Max/Paul/Arthur/etc to comment on
> that. :)
>
> On Wed, Jan 4, 2017 at 7:59 AM, Bolke de Bruin  wrote:
>
> > Hi Chris,
> >
> > I have created branch “v1-8-test”. For now I want to keep master and
> > v1-8-test in sync and do not do any cherry picking. The reason for this
> is
> > that we have a lot of catching up to do between 1.7.1.3 and 1.8.0, next
> to
> > that master is (at least to me) in an unknown state. If someone has a
> > better way to do this I am open to suggestions.
> >
> > When we release 1.8.0 I will create branch v-1-8-stable. This should
> track
> > point releases (e.g., 1.8.1, 1.8.2).
> >
> > On a side note I have deleted many old branches. This is what is left:
> >
> >   remotes/apache/airbnb_rb1.7.1
> >   remotes/apache/airbnb_rb1.7.1_2
> >   remotes/apache/airbnb_rb1.7.1_3
> >   remotes/apache/airbnb_rb1.7.1_4
> >   remotes/apache/master
> >   remotes/apache/v1-8-test
> >
> > I would like to remove the Airbnb branches as well. Can I? Maybe leave
> one
> > in as it reflects 1.7.1.3? (Which one?)
> >
> > - Bolke
> >
> >
> > > On 3 Jan 2017, at 20:34, Chris Riccomini 
> wrote:
> > >
> > > Hey Bolke,
> > >
> > > Thanks for taking this on. I'm definitely up for running stuff in our
> > > environments to verify everything is working.
> > >
> > > Can I ask that you create a 1.8 alpha 1 branch in the git repo? This
> will
> > > make it easier for us to track what changes are getting cherry picked
> > into
> > > the branch, and will also make it easier for users to pip install, if
> > they
> > > want to do so via github.
> > >
> > > Also, yea, when we switch to beta, we need to stop merging anything
> other
> > > than bug fixes into the release branch.
> > >
> > > Cheers,
> > > Chris
> > >
> > > On Tue, Jan 3, 2017 at 10:31 AM, Dan Davydov  > invalid
> > >> wrote:
> > >
> > >> All very reasonable to me, one reason we may not have hit the bugs in
> > our
> > >> production is because we are running off a different merge base and
> our
> > >> cherries aren't 1-1 with what we are running in production (we still
> > test
> > >> them but we can't run them in production), that being said I don't
> > think I
> > >> authored the commits you are referring to so I don't have full
> context.
> > >>
> > >> On Tue, Jan 3, 2017 at 1:27 PM, Bolke de Bruin 
> > wrote:
> > >>
> > >>> Hi Dan et al,
> > >>>
> > >>> That sounds good to me, however I will be pretty critical of the
> > changes
> > >>> in the scheduler and the cleanliness of the patches. This is due to
> the
> > >>> fact I have been chasing quite some bugs in master that were pretty
> > hard
> > >> to
> > >>> track down even with a debugger at hand. I’m surprised that those
> > didn’t
> > >>> pop up in your production or maybe I am concerned ;-). Anyways, I
> hope
> > >> you
> > >>> understand I might be a bit picky in understanding and needing
> (design)
> > >>> documentation for some of the changes.
> > >>>
> > >>> What I would like to suggest is that for the Alpha versions we still
> > >>> accept “new” features so these PRs can get in, but from Beta we will
> > not
> > >>> accept new features anymore. For new features in the area of the
> > >> scheduler
> > >>> an integration DummyDag should be supplied, so others can test the
> > >>> behaviour. Does this sound ok?
> > >>>
> > >>> My list of open code items for a release looks now like this:
> > >>>
> > >>> Blockers
> > >>> * one_failed not honoured
> > >>> * Alex’s sensor issue
> > >>>
> > >>> New features:
> > >>> * Schedule all pending DAGs in a single loop
> > >>> * Add support for backfill true/false
> > >>> * Impersonation
> > >>> * CGroups
> > >>> * Add Cloud Storage updated sensor
> > >>>
> >

Re: Subsequent Airflow Meetup: 2017/01/11

2017-01-04 Thread Dan Davydov
Title: Operations & Support for Airflow
Brief Description: Several ideas for how to help catch and debug
operational issues with Airflow, as well as how to effectively deal with
common user issues.

On Wed, Jan 4, 2017 at 10:32 AM, George Leslie-Waksman <
geo...@cloverhealth.com.invalid> wrote:

> Kevin, Dan, do you have titles and (maybe) a brief paragraph for the meetup
> description, or should I just make something from the descriptions earlier
> in this thread?
>
> --George
>
> On Tue, Jan 3, 2017 at 4:00 PM Kevin Mandich 
> wrote:
>
> > Hi George,
> >
> > Confirmed - would like to give a talk. Thanks,
> >
> > Kevin Mandich
> >
> > On Tue, Jan 3, 2017 at 5:40 AM, Dan Davydov  > .invalid>
> > wrote:
> >
> > > Confirmed.
> > >
> > > On Sun, Jan 1, 2017 at 9:16 PM, George Leslie-Waksman <
> > > geo...@cloverhealth.com.invalid> wrote:
> > >
> > > > Sorry for the delayed response, end of year and holidays stole my
> > > attention
> > > > for a bit.
> > > >
> > > > With the new year, I was just looking to pick things back up and
> > solicit
> > > > presenters for the meetup. Given we're looking for two more, and
> > > > Dan(Airbnb) and Kevin(Agari) have already expressed interest, I'd be
> > > happy
> > > > to give them the spots.
> > > >
> > > > I hope the delay in my response isn't too much of an inconvenience
> for
> > > > anyone. Dan, Kevin: confirm and I'll add you to the line up.
> > > >
> > > > --George
> > > >
> > > > On Sun, Nov 20, 2016 at 8:44 PM siddharth anand 
> > > wrote:
> > > >
> > > > > I suspect Clover Health is extremely busy with all of the benefit
> > > > > enrollments going on right now..
> > > > >
> > > > > George,
> > > > > When you come up for air, it looks like both Dan(Airbnb) and
> > > Kevin(Agari)
> > > > > have talk ideas.
> > > > >
> > > > > -s
> > > > >
> > > > > On Wed, Nov 16, 2016 at 11:50 PM, Dan Davydov <
> > > > > dan.davy...@airbnb.com.invalid> wrote:
> > > > >
> > > > > > Based on chatting with a couple of people today at the Airflow
> > > meet-up
> > > > I
> > > > > > think there has been some demand for an airflow operations talk,
> > > > > > specifically around monitoring/alerting. If there is still room I
> > can
> > > > > give
> > > > > > a talk about this, let me know George.
> > > > > >
> > > > > > On Thu, Nov 10, 2016 at 10:17 AM, siddharth anand <
> > san...@apache.org
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Kevin,
> > > > > > > Here's a link to the 1Q17 meet-up.
> > > > > > >
> > > > > https://www.meetup.com/Bay-Area-Apache-Airflow-
> > > Incubating-Meetup/events/
> > > > > > > 235259523/
> > > > > > >
> > > > > > > Both upcoming meet-ups (next week at WePay and 1Q17 at Clover
> > > Health)
> > > > > can
> > > > > > > be found on http://www.meetup.com/Bay-Area-Apache-Airflow-
> > > > > > > Incubating-Meetup/
> > > > > > >
> > > > > > > -s
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Nov 9, 2016 at 4:24 PM, Kevin Mandich <
> > > > kevinmand...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi George,
> > > > > > > >
> > > > > > > > If there is still room, I'd like to give a talk about how we
> > use
> > > > > > Airflow
> > > > > > > at
> > > > > > > > my company, Agari. We are a data company that is working to
> > > > eliminate
> > > > > > > > inbound, targeted e-mail attacks to our customers
> > > > (spear-phishing). I
> > > > > > am
> > > > > > > > currently working as a data scientist who is also responsible
> > for
> > > > > > > shipping
> > > > > > > > my work to production.
> > > > > > > >
> > > > > > > > We currently use Airflow to build models from our telemetry
> > data
> > > > > which
> > > > > > > are
> > > > > > > > then used for scoring in our near-real-time pipeline. I'd
> like
> > to
> > > > > talk
> > > > > > > > about some of the DAGs we've set up to do this.
> > > > > > > >
> > > > > > > > Please let me know if this sounds reasonable. Thank you,
> > > > > > > >
> > > > > > > > Kevin Mandich
> > > > > > > > Agari Data, Inc.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Oct 31, 2016 at 11:27 PM, George Leslie-Waksman <
> > > > > > > > geo...@cloverhealth.com.invalid> wrote:
> > > > > > > >
> > > > > > > > > I know it's a bit far in advance, but to make sure there's
> > > space
> > > > > (and
> > > > > > > > food
> > > > > > > > > and drink), I've scheduled and booked the subsequent meetup
> > for
> > > > > > January
> > > > > > > > > 11th at Clover Health in SF.
> > > > > > > > >
> > > > > > > > > If anyone wants to volunteer to talk, let me know,
> otherwise
> > > I'll
> > > > > > > > probably
> > > > > > > > > start bugging folks sometime after Thanksgiving and before
> > the
> > > > > > December
> > > > > > > > > holidays.
> > > > > > > > >
> > > > > > > > > --George Leslie-Waksman
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Last minute open meetup speaking slot

2017-01-11 Thread Dan Davydov
Thank you Arthur for stepping up, and my sincere apologies. I just don't
want to get anyone sick. I hope to present the talk at the next meetup.

On Jan 10, 2017 11:22 PM, "George Leslie-Waksman" wrote:

> Listing updated. Thanks for stepping in on short notice.
>
> On Tue, Jan 10, 2017 at 8:15 PM Arthur Wiedmer 
> wrote:
>
> > George,
> >
> > I'm in. Here is some info if you want to update the meetup page. Thanks
> for
> > the opportunity !
> >
> >
> > The working title is :
> > Using Apache Airflow as a platform for data engineering frameworks.
> >
> > A short abstract would be :
> > Airbnb uses Airflow's ability to dynamically generate pipelines to power
> > frameworks addressing the needs of the data teams. We will explore some of
> > Airflow's expressiveness via a couple of examples running in production at
> > Airbnb.
> >
> > Best,
> > Arthur.
> >
> > On Jan 10, 2017 8:03 PM, "George Leslie-Waksman"
> >  wrote:
> >
> > 20 minutes. If you want to fill in, that would be great.
> >
> > Regards,
> > --George
> >
> > On Tue, Jan 10, 2017 at 4:56 PM Arthur Wiedmer  >
> > wrote:
> >
> > > George,
> > >
> > > How long are the time slots?
> > >
> > > I might be able to put something together about some of the frameworks
> we
> > > have been using at Airbnb on top of Airflow.
> > >
> > > Best,
> > > Arthur
> > >
> > > On Tue, Jan 10, 2017 at 4:36 PM, George Leslie-Waksman <
> > > geo...@cloverhealth.com.invalid> wrote:
> > >
> > > > One of the speakers for tomorrow's meetup has come down with a cold.
> > > >
> > > > Is there anyone that would like to claim the third time slot?
> > > >
> > > > If not, we'll have extra time for Q&A, updates, and general
> meeting-up.
> > > >
> > > > --George
> > > >
> > >
> >
>


Re: Airflow 1.8.0 alpha 4

2017-01-11 Thread Dan Davydov
The task dependency engine code is well commented, but I can provide a high
level overview specifically for developers if there is interest (note that
this would be the first documentation of its kind in that it would be
developer-only documentation). The disadvantage is that it would create
duplication with the logic itself on quite a large scale. Let me know Bolke.

On Wed, Jan 11, 2017 at 1:30 PM, Chris Riccomini 
wrote:

> @bolke, this sounds like a good list.
>
> On Wed, Jan 11, 2017 at 12:01 PM, Bolke de Bruin 
> wrote:
>
> > Ok.
> >
> > For now to call it “beta” 4 items seems to be left:
> >
> > Blocker:
> > * retry_delay not respected
> > * poison pill due to re-queue before process has finished (to be
> > investigated)
> >
> > Features:
> > * cgroups + impersonation
> > * dag.catchup (Ben Tallman -> Only documentation is missing).
> >
> > PRs that contain documentation would really be appreciated. In my opinion
> > we are lacking there. Think about docs covering:
> > * new scheduler behaviour and options
> > * task dependency engine
> > * api / kerberized api
> > * …
> >
> > Cheers
> > Bolke
> >
> > > On 11 Jan 2017, at 18:59, Arthur Wiedmer 
> > wrote:
> > >
> > > +1
> > >
> > > We can always think about different ways of doing this later (fair
> share
> > > scheduling etc...)
> > >
> > > Best,
> > > Arthur
> > >
> > > On Wed, Jan 11, 2017 at 4:46 AM, Bolke de Bruin 
> > wrote:
> > >
> > >> Dear All,
> > >>
> > >> I would like to drop "Schedule all pending DAG runs in a single scheduler
> > >> loop" from the 1.8.0 release (updated:
> > >> https://github.com/apache/incubator-airflow/pull/1980, original:
> > >> https://github.com/apache/incubator-airflow/pull/1906). The reason for
> > >> this is that it, imho, biases the scheduler towards a single DAG: it fills
> > >> the queue with tasks from one DAG and then goes to the next DAG, starving
> > >> the DAGs that come after the first of resources. As such it should be
> > >> updated and that will take time.
> > >> Please let me know if I am incorrect.
> > >>
> > >> Thanks
> > >> Bolke
> > >>
> > >>> On 10 Jan 2017, at 09:25, Bolke de Bruin  wrote:
> > >>>
> > >>> Dear All,
> > >>>
> > >>> I have made Airflow 1.8.0 alpha 4 available at
> > >>> https://people.apache.org/~bolke/ .
> > >> Again no Apache release yet - this is for testing purposes. I consider
> > this
> > >> Alpha to be a Beta if not for the pending features. If the pending
> > features
> > >> are merged within a reasonable time frame (except for **, as no
> progress
> > >> currently) then I am planning to mark the tarball as Beta and only
> allow
> > >> bug fixes and (very) minor features. This week hopefully.
> > >>>
> > >>> Blockers:
> > >>>
> > >>> * None
> > >>>
> > >>> Fixed issues
> > >>> * Regression in email
> > >>> * LDAP case sensitivity
> > >>> * one_failed task not being run: now seems to pass suddenly (so
> fixed?)
> > >> -> need to investigate why
> > >>> * Email attachments
> > >>> * Pinned jinja2 to < 2.9.0 (2.9.1 has a confirmed regression)
> > >>> * Improve time units for task performance charts
> > >>> * XCom throws a duplicate / locking error
> > >>> * Add execution_date to trigger_dag
> > >>>
> > >>> Pending features:
> > >>> * DAG.catchup : minor changes needed, documentation still required,
> > >> integration tests seem to pass flawlessly
> > >>> * Cgroups + impersonation: clean up of patches on going, more tests
> and
> > >> more elaborate documentation required. Integration tests not executed
> > yet
> > >>> * Schedule all pending DAG runs in a single scheduler loop: no
> progress
> > >> (**)
> > >>>
> > >>> Cheers!
> > >>> Bolke
> > >>
> > >>
> >
> >
>
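
The starvation concern Bolke raises in the thread above can be illustrated with a small, self-contained sketch. This is not Airflow's scheduler code: the function names, the `pending` mapping, and the `slots` limit are all hypothetical, and the round-robin variant is only one possible reading of the "fair share scheduling" idea Arthur mentions.

```python
from itertools import zip_longest


def fill_per_dag(pending, slots):
    """Queue every pending task of one DAG before moving to the next DAG."""
    queue = []
    for dag_id, tasks in pending.items():
        for task in tasks:
            if len(queue) == slots:
                return queue
            queue.append((dag_id, task))
    return queue


def fill_round_robin(pending, slots):
    """Take one task per DAG per pass, so no single DAG monopolises the slots."""
    queue = []
    # Each "layer" holds the n-th pending task of every DAG (None if exhausted).
    for layer in zip_longest(*pending.values()):
        for dag_id, task in zip(pending.keys(), layer):
            if task is None:
                continue
            if len(queue) == slots:
                return queue
            queue.append((dag_id, task))
    return queue


# Hypothetical backlog: dag_a has more pending work than there are free slots.
pending = {
    "dag_a": ["a1", "a2", "a3", "a4"],
    "dag_b": ["b1", "b2"],
    "dag_c": ["c1"],
}

print(fill_per_dag(pending, slots=4))      # only dag_a tasks get queued
print(fill_round_robin(pending, slots=4))  # every DAG gets at least one slot
```

With four free slots, the per-DAG strategy hands all of them to the first DAG, while the interleaved strategy lets each DAG make progress.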


Re: Airflow 1.8.0 BETA 1

2017-01-17 Thread Dan Davydov
Would be good to cherrypick Arthur's fix into here if possible:
https://github.com/apache/incubator-airflow/pull/1973/files (commit
 43bf89d)

The impersonation stuff should be wrapping up shortly pending Bolke's
comments.

Also agreed with Max on the thanks. Thanks Alex too for the change log!

On Tue, Jan 17, 2017 at 10:05 AM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Bolke, I couldn't thank you enough for driving the release process!
>
> I'll coordinate with the Airbnb team around impersonation/CGROUPs and on
> making sure we put this release in our staging ASAP. We have our employee
> conference this week so things are slower, but we'll be back at full speed
> Friday.
>
> Max
>
> On Mon, Jan 16, 2017 at 3:51 PM, Alex Van Boxel  wrote:
>
> > Hey Bolke, thanks, great work. I'll handle the CHANGELOG, and add some
> > documentation about triggers with branching operators.
> >
> > About the Google Cloud Operators: I wouldn't call it feature complete...
> it
> > never is.
> >
> >
> > On Mon, Jan 16, 2017 at 11:24 PM Bolke de Bruin 
> wrote:
> >
> > > Dear All,
> > >
> > > I have made the first BETA of Airflow 1.8.0 available at:
> > > https://dist.apache.org/repos/dist/dev/incubator/airflow/ <
> > > https://dist.apache.org/repos/dist/dev/incubator/airflow/> , public
> keys
> > > are available at
> > > https://dist.apache.org/repos/dist/release/incubator/airflow/ <
> > > https://dist.apache.org/repos/dist/release/incubator/airflow/> . It is
> > > tagged with a local version “apache.incubating” so it allows upgrading
> > from
> > > earlier releases. This beta is available for testing in a more
> production
> > > like setting (acceptance environment?).
> > >
> > > I would like to encourage everyone  to try it out, to report back any
> > > issues so we get to a rock solid release of 1.8.0. When reporting
> issues
> > a
> > > test case or even a fix is highly appreciated.
> > >
> > > By moving to beta, we are also in feature freeze mode. Meaning no major
> > > adjustments or additions can be made to the v1-8-test branch. There is
> > one
> > > exception: the cgroups+impersonation patch. I was assured that before
> we
> > > merge that it will be thoroughly tested, so it can still enter 1.8 if
> > > within a reasonable time frame. A lot of work has gone into it and it
> > would
> > > be a shame if we would lose momentum.
> > >
> > > Finally, it would also be really nice to have some updates to the
> > > documentation. In order of importance:
> > >
> > > * UPDATING.md What does a user need to think of when upgrading to 1.8?
> > > MySQL 5.6.4 is now minimally required, scheduler now has separate logs
> > per
> > > file processor.
> > > * docs/configuration.rst We have many new options, especially in the
> > > scheduler area
> > > * docs/faq.rst
> > > * CHANGELOG.txt (compiled from git log)
> > > * swagger definitions for the API
> > >
> > > HIGHLIGHTS of the beta:
> > >
> > > * DAG catchup: If False the scheduler does not fill in the gaps between
> > > the start_date and the current_date. Can be specified per dag or
> globally
> > > * Per DAG multi processing: More robust and faster DAG processing. A
> > > faulty DAG should not take down the scheduler any more
> > > * Google Cloud Operators: Feature complete I have heard.
> > > * Time units now dynamic UI
> > > * Better SMTP handling and attachment support
> > > * Operational metrics for the scheduler
> > > * MSSQL Improvements
> > > * Experimental Rest API with Kerberos support
> > > * Auto alignment of start_date to interval
> > > * Better support for sub second scheduling
> > > * Rolling restart of web workers
> > > * nvd3.js instead of highcharts
> > > * New dependency engine making debugging why my task is running easier
> > > * Many UI updates
> > > * Many new operators
> > > * Many, many, many bugfixes
> > >
> > > RELEASE PLANNING
> > >
> > > Beta 2: 20 Jan
> > > Beta 3: 25 Jan
> > > RC1:  2 Feb
> > >
> > > Cheers
> > > Bolke
> > >
> > >
> > >
> > > --
> >   _/
> > _/ Alex Van Boxel
> >
>
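
The "DAG catchup" highlight in the announcement above can be pictured with a simplified model. The sketch below is not taken from Airflow: `runs_to_schedule` and its parameters are hypothetical names, and it assumes that a run for a period is only created once that period has fully elapsed and that, with catchup disabled, only the most recent completed period is scheduled.

```python
from datetime import datetime, timedelta


def runs_to_schedule(start_date, now, interval, catchup):
    """Return the execution dates a scheduler would create runs for (simplified)."""
    dates = []
    current = start_date
    # A run for a period is created only after that period has fully elapsed.
    while current + interval <= now:
        dates.append(current)
        current += interval
    if catchup:
        # Fill in every gap between start_date and now.
        return dates
    # Skip the backlog and keep only the latest completed period.
    return dates[-1:]


start = datetime(2017, 1, 1)
now = datetime(2017, 1, 5, 12, 0)
daily = timedelta(days=1)

with_catchup = runs_to_schedule(start, now, daily, catchup=True)
without_catchup = runs_to_schedule(start, now, daily, catchup=False)
print(len(with_catchup), without_catchup)
```

Under these assumptions, a daily DAG that starts four and a half days in the past gets four runs with catchup enabled and a single run for the latest period with catchup disabled.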


Re: Airflow 1.8.0 BETA 1

2017-01-17 Thread Dan Davydov
So it is, my bad. Bad skills with ctrl-f :).

On Tue, Jan 17, 2017 at 3:31 PM, Bolke de Bruin  wrote:

> Arthur's change is already in!
>
> B.
>
> Sent from my iPhone
>
> > On 17 Jan 2017, at 22:20, Dan Davydov 
> wrote:
> >
> > Would be good to cherrypick Arthur's fix into here if possible:
> > https://github.com/apache/incubator-airflow/pull/1973/files (commit
> > 43bf89d)
> >
> > The impersonation stuff should be wrapping up shortly pending Bolke's
> > comments.
> >
> > Also agreed with Max on the thanks. Thanks Alex too for the change log!
> >
> > On Tue, Jan 17, 2017 at 10:05 AM, Maxime Beauchemin <
> > maximebeauche...@gmail.com> wrote:
> >
> >> Bolke, I couldn't thank you enough for driving the release process!
> >>
> >> I'll coordinate with the Airbnb team around impersonation/CGROUPs and on
> >> making sure we put this release in our staging ASAP. We have our
> employee
> >> conference this week so things are slower, but we'll be back at full
> speed
> >> Friday.
> >>
> >> Max
> >>
> >>> On Mon, Jan 16, 2017 at 3:51 PM, Alex Van Boxel 
> wrote:
> >>>
> >>> Hey Bolke, thanks, great work. I'll handle the CHANGELOG, and add some
> >>> documentation about triggers with branching operators.
> >>>
> >>> About the Google Cloud Operators: I wouldn't call it feature
> complete...
> >> it
> >>> never is.
> >>>
> >>>
> >>> On Mon, Jan 16, 2017 at 11:24 PM Bolke de Bruin 
> >> wrote:
> >>>
> >>>> Dear All,
> >>>>
> >>>> I have made the first BETA of Airflow 1.8.0 available at:
> >>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/ <
> >>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/> , public
> >> keys
> >>>> are available at
> >>>> https://dist.apache.org/repos/dist/release/incubator/airflow/ <
> >>>> https://dist.apache.org/repos/dist/release/incubator/airflow/> . It
> is
> >>>> tagged with a local version “apache.incubating” so it allows upgrading
> >>> from
> >>>> earlier releases. This beta is available for testing in a more
> >> production
> >>>> like setting (acceptance environment?).
> >>>>
> >>>> I would like to encourage everyone  to try it out, to report back any
> >>>> issues so we get to a rock solid release of 1.8.0. When reporting
> >> issues
> >>> a
> >>>> test case or even a fix is highly appreciated.
> >>>>
> >>>> By moving to beta, we are also in feature freeze mode. Meaning no
> major
> >>>> adjustments or additions can be made to the v1-8-test branch. There is
> >>> one
> >>>> exception: the cgroups+impersonation patch. I was assured that before
> >> we
> >>>> merge that it will be thoroughly tested, so it can still enter 1.8 if
> >>>> within a reasonable time frame. A lot of work has gone into it and it
> >>> would
> >>>> be a shame if we would lose momentum.
> >>>>
> >>>> Finally, it would also be really nice to have some updates to the
> >>>> documentation. In order of importance:
> >>>>
> >>>> * UPDATING.md What does a user need to think of when upgrading to 1.8?
> >>>> MySQL 5.6.4 is now minimally required, scheduler now has separate logs
> >>> per
> >>>> file processor.
> >>>> * docs/configuration.rst We have many new options, especially in the
> >>>> scheduler area
> >>>> * docs/faq.rst
> >>>> * CHANGELOG.txt (compiled from git log)
> >>>> * swagger definitions for the API
> >>>>
> >>>> HIGHLIGHTS of the beta:
> >>>>
> >>>> * DAG catchup: If False the scheduler does not fill in the gaps
> between
> >>>> the start_date and the current_date. Can be specified per dag or
> >> globally
> >>>> * Per DAG multi processing: More robust and faster DAG processing. A
> >>>> faulty DAG should not take down the scheduler any more
> >>>> * Google Cloud Operators: Feature complete I have heard.
> >>>> * Time units now dynamic UI
> >>>> * Better SMTP handling and attachment support
> >>>> * Operational metrics for the scheduler
> >>>> * MSSQL Improvements
> >>>> * Experimental Rest API with Kerberos support
> >>>> * Auto alignment of start_date to interval
> >>>> * Better support for sub second scheduling
> >>>> * Rolling restart of web workers
> >>>> * nvd3.js instead of highcharts
> >>>> * New dependency engine making debugging why my task is running easier
> >>>> * Many UI updates
> >>>> * Many new operators
> >>>> * Many, many, many bugfixes
> >>>>
> >>>> RELEASE PLANNING
> >>>>
> >>>> Beta 2: 20 Jan
> >>>> Beta 3: 25 Jan
> >>>> RC1:  2 Feb
> >>>>
> >>>> Cheers
> >>>> Bolke
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>  _/
> >>> _/ Alex Van Boxel
> >>>
> >>
>


Re: Experiences with 1.8.0

2017-01-20 Thread Dan Davydov
I'd be happy to lend a hand fixing these issues and hopefully some others
are too. Do you mind creating jiras for these since you have the full
context? I have created a JIRA for (1) and have assigned it to myself:
https://issues.apache.org/jira/browse/AIRFLOW-780

On Fri, Jan 20, 2017 at 1:01 AM, Bolke de Bruin  wrote:

> This is to report back on some of the (early) experiences we have with
> Airflow 1.8.0 (beta 1 at the moment):
>
> 1. The UI does not show faulty DAG, leading to confusion for developers.
> When a faulty dag is placed in the dags folder the UI would report a
> parsing error. Now it doesn’t due to the separate parsing (but not
> reporting back errors)
>
> 2. The hive hook sets ‘airflow.ctx.dag_id’ in hive
> We run in a secure environment which requires this variable to be
> whitelisted if it is modified (needs to be added to UPDATING.md)
>
> 3. DagRuns do not exist for certain tasks, but don’t get fixed
> Log gets flooded without a suggestion what to do
>
> 4. At start up all running dag_runs are being checked, we seemed to have a
> lot of “left over” dag_runs (couple of thousand)
> - Checking was logged to INFO -> requires a fsync for every log message
> making it very slow
> - Checking would happen at every restart, but dag_runs’ states were not
> being updated
> - These dag_runs would never be marked anything other than running for
> some reason
> -> Applied work around to update all dag_run in sql before a certain date
> to -> finished
> -> need to investigate why dag_runs did not get marked “finished/failed”
>
> 5. Our umask is set to 027
>
>
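
The workaround in item 4 above (marking old dag_runs as finished directly in SQL) is not spelled out in the message. As a rough sketch only, assuming a simplified hypothetical `dag_run` table with `dag_id`, `execution_date`, and `state` columns, and using an in-memory SQLite database in place of the real metadata database, it could look like this. The target state string is taken from the email; the value a real deployment would need is not confirmed here.

```python
import sqlite3

# In-memory stand-in for the metadata database; the schema is a simplified assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO dag_run VALUES (?, ?, ?)",
    [
        ("etl", "2016-06-01", "running"),  # stale leftover
        ("etl", "2016-12-31", "running"),  # stale leftover
        ("etl", "2017-01-19", "running"),  # recent, should stay untouched
        ("etl", "2016-05-01", "failed"),   # already closed
    ],
)


def close_stale_runs(conn, cutoff, new_state):
    """Mark runs still 'running' before the cutoff date with new_state."""
    cur = conn.execute(
        "UPDATE dag_run SET state = ? WHERE state = 'running' AND execution_date < ?",
        (new_state, cutoff),
    )
    conn.commit()
    return cur.rowcount


# ISO-formatted date strings compare correctly as plain text.
updated = close_stale_runs(conn, cutoff="2017-01-01", new_state="finished")
print(updated)
```

Restricting the update to rows that are still `running` and older than a cutoff keeps recent, genuinely active runs out of the cleanup.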


Re: Airflow 1.8.0 BETA 5

2017-01-30 Thread Dan Davydov
The latest commit fixed a regression since 1.7 in which files with parsing
errors no longer showed up in the UI.

On Mon, Jan 30, 2017 at 2:42 PM, Alex Van Boxel  wrote:

> Just installed beta 5 on our dev environment and it lit up like a Christmas
> tree. I got a screen full of import errors. I see that the latest commit
> did something with import errors... is it correct?!
>
> On Sun, Jan 29, 2017 at 4:37 PM Bolke de Bruin  wrote:
>
> > Hey Boris
> >
> > The scheduler is a bit more aggressive and can use multiple processors,
> so
> > higher CPU usage is actually a good thing.
> >
> > In case it is really out of hand look at the new scheduler options and
> > heartbeat options (see PR for updating.md not in the beta yet).
> >
> > Bolke
> >
> > Sent from my iPhone
> >
> > > On 29 Jan 2017, at 15:35, Boris Tyukin  wrote:
> > >
> > > I am not sure if it is my config or something, but looks like after the
> > > upgrade and start of scheduler, airflow would totally hose CPU. The
> > reason
> > > is two new examples that start running right away - latest only and
> > latest
> > > with trigger. Once I pause them, CPU goes back to idle. Is this because
> > now
> > > dags are not paused by default like it was before?
> > >
> > > As I mentioned before, I also had to upgrade mysql to 5.7 - if someone
> > > needs a step by step instruction, make sure to follow all steps
> precisely
> > > here for in-place upgrade or you will have heck of the time (like me).
> > >
> > https://dev.mysql.com/doc/refman/5.7/en/upgrading.html#
> upgrade-procedure-inplace
> > >
> > > BTW official Oracle repository for Oracle Linux only has MySql 5.6 -
> for
> > > 5.7 you have to use MySql community repo.
> > >
> > >> On Sat, Jan 28, 2017 at 10:07 AM, Bolke de Bruin 
> > wrote:
> > >>
> > >> Hi All,
> > >>
> > >> I have made the FIFTH beta of Airflow 1.8.0 available at:
> > >> https://dist.apache.org/repos/dist/dev/incubator/airflow/ , public keys
> > >> are available at
> > >> https://dist.apache.org/repos/dist/release/incubator/airflow/ . It is
> > >> tagged with a local version “apache.incubating” so it allows upgrading
> > >> from earlier releases.
> > >>
> > >> Issues fixed:
> > >> * Parsing errors not showing up in UI fixing a regression**
> > >> * Scheduler would terminate immediately if no dag files present
> > >>
> > >> ** As this touches the scheduler logic I thought it warranted another
> > beta.
> > >>
> > >> This should be the last beta in my opinion and we can prepare
> changelog,
> > >> upgrade notes and release notes for the RC (Feb 2).
> > >>
> > >> Cheers
> > >> Bolke
> >
> --
>   _/
> _/ Alex Van Boxel
>


Re: Airflow 1.8.0 BETA 5

2017-01-30 Thread Dan Davydov
@Alex
I'm not able to reproduce locally (assuming the two Python files are in the
same folder or are on your PYTHONPATH). I don't see that import error
anyway.

Just in case, what is your complete DAG definition? Is anyone else able to
repro?

On Mon, Jan 30, 2017 at 3:09 PM, Alex Van Boxel  wrote:

> Well this means none of my DAG's work anymore:
>
> you just can't do this anymore:
>
> file bqschema.py with
>
> def marketing_segment():
>     return [
>         {"name": "user_id", "type": "integer", "mode": "nullable"},
>         {"name": "bucket_date", "type": "timestamp", "mode": "nullable"},
>         {"name": "segment_main", "type": "string", "mode": "nullable"},
>         {"name": "segment_sub", "type": "integer", "mode": "nullable"},
>     ]
>
> In marketing_segmentation.py:
>
>
> import bqschema
>
> Gives an error:
>
> Traceback (most recent call last):
>   File
> "/usr/local/lib/python2.7/site-packages/airflow-1.8.0b5+
> apache.incubating-py2.7.egg/airflow/models.py",
> line 264, in process_file
> m = imp.load_source(mod_name, filepath)
>   File "/home/airflow/dags/marketing_segmentation.py", line 17, in <module>
> import bqschema
> ImportError: No module named bqschema
>
> *I don't think this is incorrect?!*
>
>
>
> On Mon, Jan 30, 2017 at 11:46 PM Dan Davydov  invalid>
> wrote:
>
> > The latest commit fixed a regression since 1.7 that files with parsing
> > errors no longer showed up on the UI.
> >
> > On Mon, Jan 30, 2017 at 2:42 PM, Alex Van Boxel 
> wrote:
> >
> > > Just installed beta 5 on our dev environment and it lit up like a Christmas
> > > tree. I got a screen full of import errors. I see that the latest commit
> > > did something with import errors... is it correct?!
> > >
> > > On Sun, Jan 29, 2017 at 4:37 PM Bolke de Bruin 
> > wrote:
> > >
> > > > Hey Boris
> > > >
> > > > The scheduler is a bit more aggressive and can use multiple
> processors,
> > > so
> > > > higher CPU usage is actually a good thing.
> > > >
> > > > In case it is really out of hand look at the new scheduler options and
> > > > heartbeat options (see PR for updating.md not in the beta yet).
> > > >
> > > > Bolke
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On 29 Jan 2017, at 15:35, Boris Tyukin 
> > wrote:
> > > > >
> > > > > I am not sure if it is my config or something, but looks like after
> > the
> > > > > upgrade and start of scheduler, airflow would totally hose CPU. The
> > > > reason
> > > > > is two new examples that start running right away - latest only and
> > > > latest
> > > > > with trigger. Once I pause them, CPU goes back to idle. Is this
> > because
> > > > now
> > > > > dags are not paused by default like it was before?
> > > > >
> > > > > As I mentioned before, I also had to upgrade mysql to 5.7 - if
> > someone
> > > > > needs a step by step instruction, make sure to follow all steps
> > > precisely
> > > > > > here for in-place upgrade or you will have a heck of a time (like
> > me).
> > > > >
> > > > https://dev.mysql.com/doc/refman/5.7/en/upgrading.html#
> > > upgrade-procedure-inplace
> > > > >
> > > > > BTW official Oracle repository for Oracle Linux only has MySql 5.6
> -
> > > for
> > > > > 5.7 you have to use MySql community repo.
> > > > >
> > > > >> On Sat, Jan 28, 2017 at 10:07 AM, Bolke de Bruin <
> bdbr...@gmail.com
> > >
> > > > wrote:
> > > > >>
> > > > >> Hi All,
> > > > >>
> > > > >> I have made the FIFTH beta of Airflow 1.8.0 available at:
> > > > >> https://dist.apache.org/repos/dist/dev/incubator/airflow/ <
> > > > >> https://dist.apache.org/repos/dist/dev/incubator/airflow/> ,
> public
> > > > keys
> > > > >> are available at https://dist.apache.org/repos/
> > > dist/release/incubator/
> > > > >> airflow/ <https://dist.apache.org/repos/dist/release/incubator/
> > > airflow/
> > > > >
> > > > >> . It is tagged with a local version “apache.incubating” so it
> allows
> > > > >> upgrading from earlier releases.
> > > > >>
> > > > >> Issues fixed:
> > > > >> * Parsing errors not showing up in UI fixing a regression**
> > > > >> * Scheduler would terminate immediately if no dag files present
> > > > >>
> > > > > >> ** As this touches the scheduler logic I thought it warranted
> another
> > > > beta.
> > > > >>
> > > > >> This should be the last beta in my opinion and we can prepare
> > > changelog,
> > > > >> upgrade notes and release notes for the RC (Feb 2).
> > > > >>
> > > > >> Cheers
> > > > >> Bolke
> > > >
> > > --
> > >   _/
> > > _/ Alex Van Boxel
> > >
> >
> --
>   _/
> _/ Alex Van Boxel
>
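
The traceback quoted above comes from loading a DAG file by path and then hitting a sibling import. As a minimal sketch (not a diagnosis of the report above), the snippet below shows one common way such an error can arise: loading a file by path does not put that file's directory on sys.path, so a sibling module is only importable once the directory is added. The thread's environment used Python 2.7 and imp.load_source; this sketch uses Python 3's importlib instead, and the names bqschema_demo, demo_dag and load_by_path are hypothetical.

```python
import importlib.util
import os
import sys
import tempfile

# Build a throwaway "dags" directory with a helper module and a DAG-like file.
dag_dir = tempfile.mkdtemp()
with open(os.path.join(dag_dir, "bqschema_demo.py"), "w") as f:
    f.write("SCHEMA = ['user_id', 'bucket_date']\n")
with open(os.path.join(dag_dir, "demo_dag.py"), "w") as f:
    f.write("import bqschema_demo\nFIELDS = bqschema_demo.SCHEMA\n")


def load_by_path(mod_name, filepath):
    """Load a module from an explicit file path (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location(mod_name, filepath)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


dag_file = os.path.join(dag_dir, "demo_dag.py")

# First attempt: the directory is not on sys.path, so the sibling import fails.
try:
    load_by_path("demo_dag", dag_file)
    first_attempt = "imported"
except ImportError:
    first_attempt = "ImportError"

# Second attempt: with the directory on sys.path the sibling import resolves.
sys.path.insert(0, dag_dir)
module = load_by_path("demo_dag", dag_file)
```

Whether this matches the reported case depends on how the scheduler and webserver set up sys.path in that environment, which the thread does not state.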


Re: Airflow 1.8.0 Release Candidate 1

2017-02-02 Thread Dan Davydov
+1, this requires a lot more work than appears on the surface.

On Thu, Feb 2, 2017 at 12:43 PM, Arthur Wiedmer 
wrote:

> Bolke,
>
> Thank you again for leading this effort. This has been quite the journey.
>
> Best,
> Arthur
>
> On Thu, Feb 2, 2017 at 11:50 AM, Bolke de Bruin  wrote:
>
> > Hi All,
> >
> > I have made the (first) RELEASE CANDIDATE of Airflow 1.8.0 available at:
> > https://dist.apache.org/repos/dist/dev/incubator/airflow/ , public keys
> > are available at https://dist.apache.org/repos/dist/release/incubator/
> > airflow/ . It is tagged with a local version “apache.incubating” so it
> > allows upgrading from earlier releases. This should be considered of
> > release quality, but it has not yet been officially vetted as a release.
> >
> > Issues fixed:
> > * Use static nvd3 and d3
> > * Python 3 incompatibilities
> > * CLI API trigger dag issue
> >
> > As the difference between beta 5 and the release candidate is relatively
> > small I hope to start the VOTE for releasing 1.8.0 quite soon (2 days?),
> if
> > the vote passes also a vote needs to happen at the IPMC mailinglist. As
> > this is our first Apache release I expect some comments and required
> > changes and probably a RC 2.
> >
> > Furthermore, we now have a “v1-8-stable” branch. This has version
> > “1.8.0rc1” and will graduate to “1.8.0” when we release. The “v1-8-test”
> > branch now has version “1.8.1alpha0” as version and “master” has version
> > “1.9.0dev0”. Note that “v1-8-stable” is now closed. This means that, per
> > release guidelines, patches must be accompanied by an ASSIGNED Jira and a
> > sign-off from a committer. Only then the release manager applies the
> patch
> > to stable (In this case that would be me). The release manager then
> closes
> > the bug when the patches have landed in the appropriate branches. For
> more
> > information please see: https://cwiki.apache.org/
> > confluence/display/AIRFLOW/Airflow+Release+Planning+and+
> > Supported+Release+Lifetime .
> >
> > Any questions or suggestions don’t hesitate to ask!
> >
> > Cheers
> > Bolke
>


Re: Flow-based Airflow?

2017-02-06 Thread Dan Davydov
We have been running in our staging and have found a couple of issues. I
will report back with them soon.

On Thu, Feb 2, 2017 at 2:23 PM, Jeremiah Lowin  wrote:

> Very good point -- however I'm hesitant to overcomplicate the base class.
> At the moment users only have to override "serialize()" and "deserialize()"
> to build any form of remote-backed dataflow, and I like the simplicity of
> that.
>
> However, if you look at my implementation of the GCSDataflow, the
> constructor gets passed serializer and deserializer functions that are
> applied to the data before storage and after recovery. I think that sort of
> runtime-configurable serialization is in the spirit of what you're
> describing and it should be straightforward to adapt it for more specific
> requirements.
>
> On Thu, Feb 2, 2017 at 12:37 PM Laura Lorenz 
> wrote:
>
> > This is great!
> >
> > We work with a lot of external data in wildly non-standard formats so
> > another enhancement here we'd use and support is passing customizable
> > serializers to Dataflow subclasses. This would let the dataflows keyword
> > arg for a task handle dependency management, the Dataflow class or
> > subclasses handle IO, and the Serializer subclasses handle parsing.
> >
> > Happy to contribute here, perhaps to create an S3Dataflow subclass in the
> > style of your Google Cloud storage one for this PR.
> >
> > Laura
> >
> > On Wed, Feb 1, 2017 at 6:14 PM, Jeremiah Lowin 
> wrote:
> >
> > > Great point. I think the best solution is to solve this for all XComs
> by
> > > checking object size before adding it to the DB. I don't see a built in
> > way
> > > of handling it (though apparently MySQL is internally limited to 64kb).
> > > I'll look into a PR that would enforce a similar limit for all
> databases.
> > >
> > > On Wed, Feb 1, 2017 at 4:52 PM Maxime Beauchemin <
> > > maximebeauche...@gmail.com>
> > > wrote:
> > >
> > > I'm not sure about XCom being the default, it seems pretty dangerous.
> It
> > > just takes one person that is not fully aware of the size of the data,
> or
> > > one day with an outlier and that could put the Airflow db in jeopardy.
> > >
> > > I guess it's always been an aspect of XCom, and it could be good to
> have
> > > some explicit gatekeeping there regardless of this PR/feature. Perhaps
> > the
> > > DB itself has protection against large blobs?
> > >
> > > Max
> > >
> > > On Wed, Feb 1, 2017 at 12:42 PM, Jeremiah Lowin 
> > wrote:
> > >
> > > > Yesterday I began converting a complex script to a DAG. It turned out
> > to
> > > be
> > > > a perfect test case for the dataflow model: a big chunk of data
> moving
> > > > through a series of modification steps.
> > > >
> > > > So I have built an extensible dataflow extension for Airflow on top
> of
> > > XCom
> > > > and the existing dependency engine:
> > > > https://issues.apache.org/jira/browse/AIRFLOW-825
> > > > https://github.com/apache/incubator-airflow/pull/2046 (still waiting
> > for
> > > > tests... it will be quite embarrassing if they don't pass)
> > > >
> > > > The philosophy is simple:
> > > > Dataflow objects represent the output of upstream tasks. Downstream
> > tasks
> > > > add Dataflows with a specific key. When the downstream task runs, the
> > > > (optionally indexed) upstream result is available in the downstream
> > > context
> > > > under context['dataflows'][key]. In addition, PythonOperators receive
> > the
> > > > data as a keyword argument.
> > > >
> > > > The basic Dataflow serializes the data through XComs, but is
> trivially
> > > > extended to alternative storage via subclasses. I have provided (in
> > > > contrib) implementations of a local filesystem-based Dataflow as well
> > as
> > > a
> > > > Google Cloud Storage dataflow.
> > > >
> > > > Laura, I hope you can have a look and see if this will bring some of
> > your
> > > > requirements in to Airflow as first-class citizens.
> > > >
> > > > Jeremiah
> > > >
> > >
> >
>
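
The quoted proposal to check object size before adding an XCom to the DB could look roughly like the sketch below. This is only an illustration under stated assumptions: the function name serialize_for_xcom, the exception XComTooLargeError and the 64 KiB default are hypothetical (the thread only says MySQL is "apparently" limited to 64kb), and pickle is used here as a stand-in serializer rather than as a claim about how Airflow stores XComs.

```python
import pickle

# Hypothetical limit for illustration only; taken from the 64kb figure
# mentioned in the thread, not from any Airflow setting.
MAX_XCOM_BYTES = 64 * 1024


class XComTooLargeError(ValueError):
    """Raised when a serialized value exceeds the configured size limit."""


def serialize_for_xcom(value, max_bytes=MAX_XCOM_BYTES):
    """Pickle `value` and refuse payloads larger than `max_bytes`."""
    payload = pickle.dumps(value)
    if len(payload) > max_bytes:
        raise XComTooLargeError(
            "serialized value is %d bytes, limit is %d" % (len(payload), max_bytes)
        )
    return payload
```

Checking the serialized size (rather than the in-memory object) keeps the guard independent of the database backend, which is the point raised in the thread.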


Re: Airflow 1.8.0 Release Candidate 1

2017-02-06 Thread Dan Davydov
On the Airbnb side we should be good once
https://github.com/apache/incubator-airflow/pull/2057/ is merged.

On Mon, Feb 6, 2017 at 9:23 AM, Chris Riccomini 
wrote:

> Upgraded to RC1 in all environments this morning. So far so good.
>
> On Fri, Feb 3, 2017 at 6:04 PM, Jeremiah Lowin  wrote:
>
> > For what it's worth -- everything running smoothly after 24+ hours in a
> > production(ish) environment.
> >
> > On Thu, Feb 2, 2017 at 11:25 PM Jayesh Senjaliya 
> > wrote:
> >
> > > Thank You Bolke for all the efforts you are putting in !!
> > >
> > > I have deployed this RC now.
> > >
> > > On Thu, Feb 2, 2017 at 3:02 PM, Jeremiah Lowin 
> > wrote:
> > >
> > > > Fantastic work on this Bolke, thank you!
> > > >
> > > > We've deployed the RC and will report if there are any issues...
> > > >
> > > > On Thu, Feb 2, 2017 at 4:32 PM Bolke de Bruin 
> > wrote:
> > > >
> > > > > Now I am blushing :-)
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > > > On 2 Feb 2017, at 22:05, Boris Tyukin 
> > wrote:
> > > > > >
> > > > > > LOL awesome!
> > > > > >
> > > > > > On Thu, Feb 2, 2017 at 4:00 PM, Maxime Beauchemin <
> > > > > > maximebeauche...@gmail.com> wrote:
> > > > > >
> > > > > >> The Apache mailing doesn't support images so here's a link:
> > > > > >>
> > > > > >> http://i.imgur.com/DUkpjZu.png
> > > > > >> ​
> > > > > >>
> > > > > >> On Thu, Feb 2, 2017 at 12:52 PM, Boris Tyukin <
> > > bo...@boristyukin.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Bolke, you are our hero! I am sure you put a lot of your time
> to
> > > make
> > > > > it
> > > > > >>> happen
> > > > > >>>
> > > > > >>> On Thu, Feb 2, 2017 at 2:50 PM, Bolke de Bruin <
> > bdbr...@gmail.com>
> > > > > >> wrote:
> > > > > >>>
> > > > >  Hi All,
> > > > > 
> > > > >  I have made the (first) RELEASE CANDIDATE of Airflow 1.8.0
> > > available
> > > > > >> at:
> > > > >  https://dist.apache.org/repos/dist/dev/incubator/airflow/ ,
> > > public
> > > > > >> keys
> > > > >  are available at
> > > > > https://dist.apache.org/repos/dist/release/incubator/
> > > > >  airflow/ . It is tagged with a local version
> “apache.incubating”
> > > so
> > > > it
> > > > >  allows upgrading from earlier releases. This should be
> > considered
> > > of
> > > > >  release quality, but not yet officially vetted as a release
> yet.
> > > > > 
> > > > >  Issues fixed:
> > > > >  * Use static nvd3 and d3
> > > > >  * Python 3 incompatibilities
> > > > >  * CLI API trigger dag issue
> > > > > 
> > > > >  As the difference between beta 5 and the release candidate is
> > > > > >> relatively
> > > > >  small I hope to start the VOTE for releasing 1.8.0 quite soon
> (2
> > > > > >> days?),
> > > > > >>> if
> > > > >  the vote passes also a vote needs to happen at the IPMC
> > > mailinglist.
> > > > > As
> > > > >  this is our first Apache release I expect some comments and
> > > required
> > > > >  changes and probably a RC 2.
> > > > > 
> > > > >  Furthermore, we now have a “v1-8-stable” branch. This has
> > version
> > > > >  “1.8.0rc1” and will graduate to “1.8.0” when we release. The
> > > > > >> “v1-8-test”
> > > > >  branch now has version “1.8.1alpha0” as version and “master”
> has
> > > > > >> version
> > > > >  “1.9.0dev0”. Note that “v1-8-stable” is now closed. This means
> > > that,
> > > > > >> per
> > > > >  release guidelines, patches accompanied with an ASSIGNED Jira
> > and
> > > a
> > > > >  sign-off from a committer. Only then the release manager
> applies
> > > the
> > > > > >>> patch
> > > > >  to stable (In this case that would be me). The release manager
> > > then
> > > > > >>> closes
> > > > >  the bug when the patches have landed in the appropriate
> > branches.
> > > > For
> > > > > >>> more
> > > > >  information please see: https://cwiki.apache.org/
> > > > >  confluence/display/AIRFLOW/Airflow+Release+Planning+and+
> > > > >  Supported+Release+Lifetime  > > > >  confluence/display/AIRFLOW/Airflow+Release+Planning+and+
> > > > >  Supported+Release+Lifetime> .
> > > > > 
> > > > >  Any questions or suggestions don’t hesitate to ask!
> > > > > 
> > > > >  Cheers
> > > > >  Bolke
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
>


Re: Flow-based Airflow?

2017-02-06 Thread Dan Davydov
Woops looks like I replied to the wrong thread! Thanks Bolke.

On Mon, Feb 6, 2017 at 1:42 PM, Bolke de Bruin  wrote:

> Dataflow or 1.8?
>
> Sent from my iPhone
>
> > On 6 Feb 2017, at 22:35, Dan Davydov 
> wrote:
> >
> > We have been running in our staging and have found a couple of issues. I
> > will report back with them soon.
> >
> >> On Thu, Feb 2, 2017 at 2:23 PM, Jeremiah Lowin 
> wrote:
> >>
> >> Very good point -- however I'm hesitant to overcomplicate the base
> class.
> >> At the moment users only have to override "serialize()" and
> "deserialize()"
> >> to build any form of remote-backed dataflow, and I like the simplicity
> of
> >> that.
> >>
> >> However, if you look at my implementation of the GCSDataflow, the
> >> constructor gets passed serializer and deserializer functions that are
> >> applied to the data before storage and after recovery. I think that
> sort of
> >> runtime-configurable serialization is in the spirit of what you're
> >> describing and it should be straightforward to adapt it for more
> specific
> >> requirements.
> >>
> >> On Thu, Feb 2, 2017 at 12:37 PM Laura Lorenz 
> >> wrote:
> >>
> >>> This is great!
> >>>
> >>> We work with a lot of external data in wildly non-standard formats so
> >>> another enhancement here we'd use and support is passing customizable
> >>> serializers to Dataflow subclasses. This would let the dataflows
> keyword
> >>> arg for a task handle dependency management, the Dataflow class or
> >>> subclasses handle IO, and the Serializer subclasses handle parsing.
> >>>
> >>> Happy to contribute here, perhaps to create an S3Dataflow subclass in
> the
> >>> style of your Google Cloud storage one for this PR.
> >>>
> >>> Laura
> >>>
> >>> On Wed, Feb 1, 2017 at 6:14 PM, Jeremiah Lowin 
> >> wrote:
> >>>
> >>>> Great point. I think the best solution is to solve this for all XComs
> >> by
> >>>> checking object size before adding it to the DB. I don't see a built
> in
> >>> way
> >>>> of handling it (though apparently MySQL is internally limited to
> 64kb).
> >>>> I'll look into a PR that would enforce a similar limit for all
> >> databases.
> >>>>
> >>>> On Wed, Feb 1, 2017 at 4:52 PM Maxime Beauchemin <
> >>>> maximebeauche...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> I'm not sure about XCom being the default, it seems pretty dangerous.
> >> It
> >>>> just takes one person that is not fully aware of the size of the data,
> >> or
> >>>> one day with an outlier and that could put the Airflow db in jeopardy.
> >>>>
> >>>> I guess it's always been an aspect of XCom, and it could be good to
> >> have
> >>>> some explicit gatekeeping there regardless of this PR/feature. Perhaps
> >>> the
> >>>> DB itself has protection against large blobs?
> >>>>
> >>>> Max
> >>>>
> >>>> On Wed, Feb 1, 2017 at 12:42 PM, Jeremiah Lowin 
> >>> wrote:
> >>>>
> >>>>> Yesterday I began converting a complex script to a DAG. It turned out
> >>> to
> >>>> be
> >>>>> a perfect test case for the dataflow model: a big chunk of data
> >> moving
> >>>>> through a series of modification steps.
> >>>>>
> >>>>> So I have built an extensible dataflow extension for Airflow on top
> >> of
> >>>> XCom
> >>>>> and the existing dependency engine:
> >>>>> https://issues.apache.org/jira/browse/AIRFLOW-825
> >>>>> https://github.com/apache/incubator-airflow/pull/2046 (still waiting
> >>> for
> >>>>> tests... it will be quite embarrassing if they don't pass)
> >>>>>
> >>>>> The philosophy is simple:
> >>>>> Dataflow objects represent the output of upstream tasks. Downstream
> >>> tasks
> >>>>> add Dataflows with a specific key. When the downstream task runs, the
> >>>>> (optionally indexed) upstream result is available in the downstream
> >>>> context
> >>>>> under context['dataflows'][key]. In addition, PythonOperators receive
> >>> the
> >>>>> data as a keyword argument.
> >>>>>
> >>>>> The basic Dataflow serializes the data through XComs, but is
> >> trivially
> >>>>> extended to alternative storage via subclasses. I have provided (in
> >>>>> contrib) implementations of a local filesystem-based Dataflow as well
> >>> as
> >>>> a
> >>>>> Google Cloud Storage dataflow.
> >>>>>
> >>>>> Laura, I hope you can have a look and see if this will bring some of
> >>> your
> >>>>> requirements in to Airflow as first-class citizens.
> >>>>>
> >>>>> Jeremiah
> >>>>>
> >>>>
> >>>
> >>
>
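
The runtime-configurable serialization described in the quoted discussion (serializer and deserializer functions passed to the constructor and applied before storage and after recovery) might be sketched as follows. FileDataflow, to_csv_line and from_csv_line are hypothetical names invented for this illustration; they are not the classes from the pull request, and the local file stands in for any remote store.

```python
import json
import os
import tempfile


class FileDataflow(object):
    """Hypothetical storage-backed dataflow; not the class from the PR."""

    def __init__(self, path, serializer=json.dumps, deserializer=json.loads):
        self.path = path
        self.serializer = serializer
        self.deserializer = deserializer

    def serialize(self, data):
        # Apply the configurable serializer, then write to the backing store.
        with open(self.path, "w") as f:
            f.write(self.serializer(data))

    def deserialize(self):
        # Read from the backing store, then apply the configurable deserializer.
        with open(self.path) as f:
            return self.deserializer(f.read())


def to_csv_line(values):
    """Example custom serializer for a non-standard format."""
    return ",".join(str(v) for v in values)


def from_csv_line(text):
    """Inverse of to_csv_line for integer values."""
    return [int(v) for v in text.split(",")]


path = os.path.join(tempfile.mkdtemp(), "upstream_result.txt")
flow = FileDataflow(path, serializer=to_csv_line, deserializer=from_csv_line)
flow.serialize([1, 2, 3])
result = flow.deserialize()
```

Keeping storage in the class and parsing in the injected functions mirrors the split suggested in the thread between IO handling and format handling.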


Re: Airflow 1.8.0 Release Candidate 1

2017-02-06 Thread Dan Davydov
Bolke, attached is the patch for the cgroups fix. Let me know which
branches you would like me to merge it to. If anyone has complaints about
the patch let me know (but it does not touch the core of airflow, only the
new cgroups task runner).

On Mon, Feb 6, 2017 at 4:24 PM, siddharth anand  wrote:

> Actually, I see the error is further down..
>
>   File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 469, in do_execute
>     cursor.execute(statement, parameters)
>
> sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) null value in
> column "dag_id" violates not-null constraint
>
> DETAIL:  Failing row contains (null, running, 1, f).
>
>  [SQL: 'INSERT INTO dag_stats (state, count, dirty) VALUES (%(state)s,
> %(count)s, %(dirty)s)'] [parameters: {'count': 1L, 'state': u'running',
> 'dirty': False}]
>
> It looks like an autoincrement is missing for this table.
>
>
> I'm running `SQLAlchemy==1.1.4` - I see our setup.py specifies any version
> greater than 0.9.8
>
> -s
>
>
>
> On Mon, Feb 6, 2017 at 4:11 PM, siddharth anand  wrote:
>
> > I tried upgrading from 1.7.1.3 to 1.8.0rc1 via pip install
> > https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc1+apache.incubating.tar.gz
> > and then running airflow upgradedb, but it didn't quite work. At first I
> > thought it completed successfully, then saw errors indicating some tables
> > were indeed missing. I ran it again and encountered the following exception:
> >
> > DB: postgresql://app_coust...@db-cousteau.ep.stage.agari.com:5432/airflow
> >
> > [2017-02-07 00:03:20,309] {db.py:284} INFO - Creating tables
> >
> > INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
> >
> > INFO  [alembic.runtime.migration] Will assume transactional DDL.
> >
> > INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 ->
> > 211e584da130, add TI state index
> >
> > INFO  [alembic.runtime.migration] Running upgrade 211e584da130 ->
> > 64de9cddf6c9, add task fails journal table
> >
> > INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 ->
> > f2ca10b85618, add dag_stats table
> >
> > INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 ->
> > 4addfa1236f1, Add fractional seconds to mysql tables
> >
> > INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 ->
> > 8504051e801b, xcom dag task indices
> >
> > INFO  [alembic.runtime.migration] Running upgrade 8504051e801b ->
> > 5e7d17757c7a, add pid field to TaskInstance
> >
> > INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a ->
> > 127d2bf2dfa7, Add dag_id/state index on dag_run table
> >
> > /usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/crud.py:692:
> > SAWarning: Column 'dag_stats.dag_id' is marked as a member of the primary
> > key for table 'dag_stats', but has no Python-side or server-side default
> > generator indicated, nor does it indicate 'autoincrement=True' or
> > 'nullable=True', and no explicit value is passed.  Primary key columns
> > typically may not store NULL. Note that as of SQLAlchemy 1.1,
> > 'autoincrement=True' must be indicated explicitly for composite (e.g.
> > multicolumn) primary keys if AUTO_INCREMENT/SERIAL/IDENTITY behavior is
> > expected for one of the columns in the primary key. CREATE TABLE
> statements
> > are impacted by this change as well on most backends.
> >
>


Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc3

2017-02-10 Thread Dan Davydov
Our staging looks good, all the DAGs there pass.
+1 (binding)

On Fri, Feb 10, 2017 at 10:21 AM, Chris Riccomini 
wrote:

> Running in all environments. Will vote after the weekend to make sure
> things are working properly, but so far so good.
>
> On Fri, Feb 10, 2017 at 6:05 AM, Bolke de Bruin  wrote:
>
> > Dear All,
> >
> > Let’s try again!
> >
> > I have made the THIRD RELEASE CANDIDATE of Airflow 1.8.0 available at:
> > https://dist.apache.org/repos/dist/dev/incubator/airflow/ <
> > https://dist.apache.org/repos/dist/dev/incubator/airflow/> , public keys
> > are available at https://dist.apache.org/repos/dist/release/incubator/
> > airflow/ . It is tagged with a local version “apache.incubating” so it allows
> > upgrading from earlier releases.
> >
> > Two issues have been fixed since release candidate 2:
> >
> > * trigger_dag could create dags with fractional seconds, not supported by
> > logging and UI at the moment
> > * local api client trigger_dag had hardcoded execution of None
> >
> > Known issue:
> > * Airflow on kubernetes and num_runs -1 (default) can expose import
> issues.
> >
> > I have extensively discussed this with Alex (reporter) and we consider
> > this a known issue with a workaround available as we are unable to
> > replicate this in a different environment. UPDATING.md has been updated
> > with the work around.
> >
> > As these issues are confined to a very specific area and full unit tests
> > were added I would also like to raise a VOTE for releasing 1.8.0 based on
> > release candidate 3, i.e. just renaming release candidate 3 to 1.8.0
> > release.
> >
> > Please respond to this email by:
> >
> > +1,0,-1 with *binding* if you are a PMC member or *non-binding* if you
> are
> > not.
> >
> > Thanks!
> > Bolke
> >
> > My VOTE: +1 (binding)
>


Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc3

2017-02-13 Thread Dan Davydov
> > > > > > > >>>> Boris, I submitted a PR to address your second point --
> > > > > > >>>> https://github.com/apache/incubator-airflow/pull/2068.
> > Thanks!
> > > > > > >>>>
> > > > > > >>>> On Sat, Feb 11, 2017 at 10:42 AM Boris Tyukin <
> > > > > bo...@boristyukin.com>
> > > > > > >>>> wrote:
> > > > > > >>>>
> > > > > > >>>>> I am running LocalExecutor and not doing crazy things but
> use
> > > DAG
> > > > > > >>>>> generation heavily - everything runs fine as before. As I
> > > > mentioned
> > > > > > in
> > > > > > >>>>> other threads only had a few issues:
> > > > > > >>>>>
> > > > > > >>>>> 1) had to upgrade MySQL which was a PAIN. Cloudera CDH is
> > > running
> > > > > old
> > > > > > >>>>> version of MySQL which was compatible with 1.7.1 but not
> > > > compatible
> > > > > > now
> > > > > > >>>>> with 1.8 because of fractional seconds support PR.
> > > > > > >>>>>
> > > > > > >>>>> 2) when you install airflow, there are two new example DAGs
> > > > > > >>>>> (last_task_only) which are going back very far in the past
> > and
> > > > > > >>> scheduled
> > > > > > >>>> to
> > > > > > >>>>> run every hour - a bunch of dags triggered on the first
> start
> > > of
> > > > > > >>>> scheduler
> > > > > > >>>>> and hosed my CPU
> > > > > > >>>>>
> > > > > > >>>>> Everything else was fine and I LOVE lots of small UI
> changes,
> > > > which
> > > > > > >>>> reduced
> > > > > > >>>>> a lot my use of cli.
> > > > > > >>>>>
> > > > > > >>>>> Thanks again for the amazing work and an awesome project!
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> On Sat, Feb 11, 2017 at 9:17 AM, Jeremiah Lowin <
> > > > jlo...@apache.org
> > > > > >
> > > > > > >>>> wrote:
> > > > > > >>>>>
> > > > > > >>>>>> I was able to deploy successfully. +1 (binding)
> > > > > > >>>>>>
> > > > > > >>>>>> On Fri, Feb 10, 2017 at 7:37 PM Maxime Beauchemin <
> > > > > > >>>>>> maximebeauche...@gmail.com> wrote:
> > > > > > >>>>>>
> > > > > > >>>>>>> +1 (binding)
> > > > > > >>>>>>>
> > > > > > >>>>>>> On Fri, Feb 10, 2017 at 3:44 PM, Arthur Wiedmer <
> > > > > > >>>>>> arthur.wied...@gmail.com>
> > > > > > >>>>>>> wrote:
> > > > > > >>>>>>>
> > > > > > >>>>>>>> +1 (binding)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> On Feb 10, 2017 3:13 PM, "Dan Davydov" <
> > > > dan.davy...@airbnb.com.
> > > > > > >>>>>> invalid>
> > > > > > >>>>>>>> wrote:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>> Our staging looks good, all the DAGs there pass.
> > > > > > >>>>>>>>> +1 (binding)
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> On Fri, Feb 10, 2017 at 10:21 AM, Chris Riccomini <
> > > > > > >>>>>>> criccom...@apache.org
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> wrote:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>>> Running in all environments. Will vote after the
> weekend
> > > to
> > > > > > >>>> make
> > > > > > >>>>>> sure
> > > > > > >>>>>>>>>> things are working properly, but so far so good.
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> On Fri, Feb 10, 2017 at 6:05 AM, Bolke de Bruin <
> > > > > > >>>>> bdbr...@gmail.com
> > > > > > >>>>>>>
> > > > > > >>>>>>>>> wrote:
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>>> Dear All,
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> Let’s try again!
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> I have made the THIRD RELEASE CANDIDATE of Airflow
> > 1.8.0
> > > > > > >>>>>> available
> > > > > > >>>>>>>> at:
> > > > > > >>>>>>>>>>>
> > > https://dist.apache.org/repos/dist/dev/incubator/airflow/
> > > > > > >>> <
> > > > > > >>>>>>>>>>>
> > > https://dist.apache.org/repos/dist/dev/incubator/airflow/>
> > > > > > >>> ,
> > > > > > >>>>>>> public
> > > > > > >>>>>>>>> keys
> > > > > > >>>>>>>>>>> are available at https://dist.apache.org/repos/
> > > > > > >>>>>>>> dist/release/incubator/
> > > > > > >>>>>>>>>>> airflow/ <
> > > > > > >>>>> https://dist.apache.org/repos/dist/release/incubator/
> > > > > > >>>>>>>>> airflow/>
> > > > > > >>>>>>>>>>> . It is tagged with a local version
> “apache.incubating”
> > > so
> > > > > > >>> it
> > > > > > >>>>>>> allows
> > > > > > >>>>>>>>>>> upgrading from earlier releases.
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> Two issues have been fixed since release candidate 2:
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> * trigger_dag could create dags with fractional
> > seconds,
> > > > > > >>> not
> > > > > > >>>>>>>> supported
> > > > > > >>>>>>>>> by
> > > > > > >>>>>>>>>>> logging and UI at the moment
> > > > > > >>>>>>>>>>> * local api client trigger_dag had hardcoded
> execution
> > of
> > > > > > >>>> None
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> Known issue:
> > > > > > >>>>>>>>>>> * Airflow on kubernetes and num_runs -1 (default) can
> > > > > > >>> expose
> > > > > > >>>>>> import
> > > > > > >>>>>>>>>> issues.
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> I have extensively discussed this with Alex
> (reporter)
> > > and
> > > > > > >>> we
> > > > > > >>>>>>>> consider
> > > > > > >>>>>>>>>>> this a known issue with a workaround available as we
> > are
> > > > > > >>>> unable
> > > > > > >>>>>> to
> > > > > > >>>>>>>>>>> replicate this in a different environment.
> UPDATING.md
> > > has
> > > > > > >>>> been
> > > > > > >>>>>>>> updated
> > > > > > >>>>>>>>>>> with the work around.
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> As these issues are confined to a very specific area
> > and
> > > > > > >>> full
> > > > > > >>>>>> unit
> > > > > > >>>>>>>>> tests
> > > > > > >>>>>>>>>>> were added I would also like to raise a VOTE for
> > > releasing
> > > > > > >>>>> 1.8.0
> > > > > > >>>>>>>> based
> > > > > > >>>>>>>>> on
> > > > > > >>>>>>>>>>> release candidate 3, i.e. just renaming release
> > > candidate 3
> > > > > > >>>> to
> > > > > > >>>>>>> 1.8.0
> > > > > > >>>>>>>>>>> release.
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> Please respond to this email by:
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> +1,0,-1 with *binding* if you are a PMC member or
> > > > > > >>>> *non-binding*
> > > > > > >>>>>> if
> > > > > > >>>>>>>> you
> > > > > > >>>>>>>>>> are
> > > > > > >>>>>>>>>>> not.
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> Thanks!
> > > > > > >>>>>>>>>>> Bolke
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> My VOTE: +1 (binding)
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>>
> > > > > > >>>
> > > > > > >> --
> > > > > > >> _/
> > > > > > >> _/ Alex Van Boxel
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > --
> >   _/
> > _/ Alex Van Boxel
> >
>


Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc3

2017-02-13 Thread Dan Davydov
I feel like there might be enough reliance on these features to merge these
in, e.g. mark-successing a non-existent task to prevent it from running.
I'm curious what others think. Also isn't mark success still needed for
when you add a new task with depends_on_past to an existing dag or is that
fixed as well?

On Mon, Feb 13, 2017 at 12:25 PM, Bolke de Bruin  wrote:

> A little bit more background on the issue. Mark success sits in views.py
> as “def success”. The code should mark a task “successful”, with optional
> upstream and downstream tasks as well. Even for tasks in the future (up
> until datetime.now() ) and past. It was often used to kick off the first
> dag run when “depends_on_past” was used. As of 1.8.0 this is not
> required anymore. The code is complex, lacks testing and more importantly
> it is outdated: it creates tasks on its own without dag runs, and is not
> aware of the “NONE” state. Next to that it is buggy (upstream/downstream do
> the same currently ie. only downstream). Hence, in my opinion it requires
> refactoring which I am doing at the moment.
>
> Two small fixes could be included in the release, but they don’t solve the
> root cause.
>
> * https://github.com/apache/incubator-airflow/pull/2075 <
> https://github.com/apache/incubator-airflow/pull/2075>
> * https://github.com/apache/incubator-airflow/pull/2074 <
> https://github.com/apache/incubator-airflow/pull/2074>
>
> I suggest fixing this in 1.8.1 properly. Chris :) volunteered to do 1.8.1
> soon after 1.8.0
>
> Any thoughts?
>
> Bolke
>
> > On 13 Feb 2017, at 20:59, Bolke de Bruin  wrote:
> >
> > https://github.com/apache/incubator-airflow/pull/2075 <
> https://github.com/apache/incubator-airflow/pull/2075>
> >
> > Is (part of) the fix. I can include it retroactively if needed, but I
> don’t consider it blocking.
> >
> > Bolke
> >
> >
> >> On 13 Feb 2017, at 20:56, Dan Davydov wrote:
> >>
> >> Can you give more details/a repro case Sid? FWIW mark success and clear
> >> both work for me.
> >>
> >> On Mon, Feb 13, 2017 at 11:51 AM, siddharth anand wrote:
> >>
> >>> Folks!
> >>> I need to change my vote.. -1 (Binding).
> >>>
> >>>
> >>> Mark Success/Clear is broken in the UI. It's a regression.
> >>>
> >>> -s
> >>>
> >>> On Mon, Feb 13, 2017 at 10:53 AM, Alex Van Boxel wrote:
> >>>
> >>>> +1 (binding)
> >>>>
> >>>> On Mon, Feb 13, 2017 at 7:45 PM siddharth anand
> >>> wrote:
> >>>>
> >>>>> +1 (binding)
> >>>>>
> >>>>> On Mon, Feb 13, 2017 at 8:57 AM, Chris Riccomini <
> >>> criccom...@apache.org>
> >>>>> wrote:
> >>>>>
> >>>>>> +1 (binding)
> >>>>>>
> >>>>>> On Sun, Feb 12, 2017 at 8:54 AM, Jeremiah Lowin
> >>>>> wrote:
> >>>>>>
> >>>>>>> Interesting -- I also run on Kubernetes with a git-sync sidecar,
> >>> but
> >>>>> the
> >>>>>>> containers wait for the synced repo to appear before starting since
> >>>> it
> >>>>>>> contains some dependencies -- I assume that's why I didn't
> >>> experience
> >>>>> the
> >>>>>>> same issue.
> >>>>>>>
> >>>>>>> On Sun, Feb 12, 2017 at 6:29 AM Bolke de Bruin  <mailto:bdbr...@gmail.com>>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>> Although the race condition doesn't explain why “num_runs = None”
> >>>>>>> resolved
> >>>>>>>> the issue for you earlier, but it does give a clue now: the PR
> >>> that
> >>>>>>>> introduced “num_runs = -1” was there to be able to work with
> >>> empty
> >>>>> dag
> >>>>>>>> dirs, maybe it wasn’t fully covered yet.
> >>>>>>>>
> >>>>>>>> Bolke
> >>>>>>>>
> >>>>>>>>> On 12 Feb 2017, at 12:26, Bolke de Bruin  <mailto:bdbr...@gmail.com>

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-17 Thread Dan Davydov
+1 (binding). Mark success works great now, thanks to Bolke for fixing.

On Fri, Feb 17, 2017 at 12:22 AM, Bolke de Bruin  wrote:

> Dear All,
>
> I have made the FOURTH RELEASE CANDIDATE of Airflow 1.8.0 available at:
> https://dist.apache.org/repos/dist/dev/incubator/airflow/ <
> https://dist.apache.org/repos/dist/dev/incubator/airflow/> , public keys
> are available at
> https://dist.apache.org/repos/dist/release/incubator/airflow/ . It is tagged
> with a local version “apache.incubating” so it allows upgrading from earlier
> releases.
>
> One issue has been fixed since release candidate 3:
>
> * mark success was not working properly
>
> No known issues anymore.
>
> I would also like to raise a VOTE for releasing 1.8.0 based on release
> candidate 4, i.e. just renaming release candidate 4 to 1.8.0 release.
>
> Please respond to this email by:
>
> +1,0,-1 with *binding* if you are a PMC member or *non-binding* if you are
> not.
>
> Thanks!
> Bolke
>
> My VOTE: +1 (binding)


Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-22 Thread Dan Davydov
I rolled this out in our prod and the webservers failed to load due to this
commit:

 [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
7c94d81c390881643f94d5e3d7d6fb351a445b72

This fixed it:
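-  class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start Date:
{{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}">
+  class="glyphicon glyphicon-info-sign" aria-hidden="true">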

This is caused by assuming that all DAGs have start dates set, so a broken
DAG will take down the whole UI. Not sure if we want to make this a blocker
for the release or not, I'm guessing for most deployments this would occur
pretty rarely. I'll submit a PR to fix it soon.
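
A minimal sketch of the kind of guard that avoids this failure mode, assuming
the tooltip text is built from a run object whose start_date may be None. The
helper name start_date_title and the fallback text are hypothetical and are
not taken from the Airflow code base or from the eventual PR:

```python
from datetime import datetime
from types import SimpleNamespace


def start_date_title(last_run):
    """Build the tooltip text without assuming start_date is set.

    Hypothetical helper for illustration only: calling strftime on a
    missing start_date is what breaks the page, so check for None first.
    """
    start = getattr(last_run, "start_date", None)
    if start is None:
        return "Start Date: unknown"
    return "Start Date: " + start.strftime("%Y-%m-%d %H:%M")


# Stand-ins for a healthy run and a run with no start date.
healthy_run = SimpleNamespace(start_date=datetime(2017, 2, 22, 13, 15))
broken_run = SimpleNamespace(start_date=None)

print(start_date_title(healthy_run))
print(start_date_title(broken_run))
```

With a guard like this, one bad run degrades a single tooltip instead of
taking down the whole page.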



On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini 
wrote:

> Ack that the vote has already passed, but belated +1 (binding)
>
> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin  wrote:
>
> > IPMC Voting can be found here:
> >
> > http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%
> > 3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3e <
> > http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%
> > 3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3E>
> >
> > Kind regards,
> > Bolke
> >
> > > On 21 Feb 2017, at 08:20, Bolke de Bruin  wrote:
> > >
> > > Hello,
> > >
> > > Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
> > >
> > > 9 “+1” votes received:
> > >
> > > - Maxime Beauchemin (binding)
> > > - Arthur Wiedmer (binding)
> > > - Dan Davydov (binding)
> > > - Jeremiah Lowin (binding)
> > > - Siddharth Anand (binding)
> > > - Alex van Boxel (binding)
> > > - Bolke de Bruin (binding)
> > >
> > > - Jayesh Senjaliya (non-binding)
> > > - Yi (non-binding)
> > >
> > > Vote thread (start):
> > > http://mail-archives.apache.org/mod_mbox/incubator-
> > airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-
> > 6c92c31a2...@gmail.com%3e <http://mail-archives.apache.
> > org/mod_mbox/incubator-airflow-dev/201702.mbox/%
> 3C7EB7B6D6-092E-48D2-AA0F-
> > 15f44376a...@gmail.com%3E>
> > >
> > > Next steps:
> > > 1) will start the voting process at the IPMC mailinglist. I do expect
> > some changes to be required mostly in documentation maybe a license here
> > and there. So, we might end up with changes to stable. As long as these
> are
> > not (significant) code changes I will not re-raise the vote.
> > > 2) Only after the positive voting on the IPMC and finalisation I will
> > rebrand the RC to Release.
> > > 3) I will upload it to the incubator release page, then the tar ball
> > needs to propagate to the mirrors.
> > > 4) Update the website (can someone volunteer please?)
> > > 5) Finally, I will ask Maxime to upload it to pypi. It seems we can
> keep
> > the apache branding as lib cloud is doing this as well (
> > https://libcloud.apache.org/downloads.html#pypi-package <
> > https://libcloud.apache.org/downloads.html#pypi-package>).
> > >
> > > Jippie!
> > >
> > > Bolke
> >
> >
>


Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-22 Thread Dan Davydov
Should clarify this occurs when a dagrun does not have a start date, not a
dag (which makes it even less likely to happen). I don't think this is a
blocker for releasing.

On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov  wrote:

> I rolled this out in our prod and the webservers failed to load due to
> this commit:
>
>  [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> 7c94d81c390881643f94d5e3d7d6fb351a445b72
>
> This fixed it:
> -  class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start Date:
> {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}">
> +  class="glyphicon glyphicon-info-sign" aria-hidden="true">
>
> This is caused by assuming that all DAGs have start dates set, so a broken
> DAG will take down the whole UI. Not sure if we want to make this a blocker
> for the release or not, I'm guessing for most deployments this would occur
> pretty rarely. I'll submit a PR to fix it soon.
>
>
>
> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini 
> wrote:
>
>> Ack that the vote has already passed, but belated +1 (binding)
>>
>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin 
>> wrote:
>>
>> > IPMC Voting can be found here:
>> >
>> > http://mail-archives.apache.org/mod_mbox/incubator-general/
>> 201702.mbox/%
>> > 3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3e <
>> > http://mail-archives.apache.org/mod_mbox/incubator-general/
>> 201702.mbox/%
>> > 3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3E>
>> >
>> > Kind regards,
>> > Bolke
>> >
>> > > On 21 Feb 2017, at 08:20, Bolke de Bruin  wrote:
>> > >
>> > > Hello,
>> > >
>> > > Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
>> > >
>> > > 9 “+1” votes received:
>> > >
>> > > - Maxime Beauchemin (binding)
>> > > - Arthur Wiedmer (binding)
>> > > - Dan Davydov (binding)
>> > > - Jeremiah Lowin (binding)
>> > > - Siddharth Anand (binding)
>> > > - Alex van Boxel (binding)
>> > > - Bolke de Bruin (binding)
>> > >
>> > > - Jayesh Senjaliya (non-binding)
>> > > - Yi (non-binding)
>> > >
>> > > Vote thread (start):
>> > > http://mail-archives.apache.org/mod_mbox/incubator-
>> > airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-
>> > 6c92c31a2...@gmail.com%3e <http://mail-archives.apache.
>> > org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-
>> 092E-48D2-AA0F-
>> > 15f44376a...@gmail.com%3E>
>> > >
>> > > Next steps:
>> > > 1) will start the voting process at the IPMC mailinglist. I do expect
>> > some changes to be required mostly in documentation maybe a license here
>> > and there. So, we might end up with changes to stable. As long as these
>> are
>> > not (significant) code changes I will not re-raise the vote.
>> > > 2) Only after the positive voting on the IPMC and finalisation I will
>> > rebrand the RC to Release.
>> > > 3) I will upload it to the incubator release page, then the tar ball
>> > needs to propagate to the mirrors.
>> > > 4) Update the website (can someone volunteer please?)
>> > > 5) Finally, I will ask Maxime to upload it to pypi. It seems we can
>> keep
>> > the apache branding as lib cloud is doing this as well (
>> > https://libcloud.apache.org/downloads.html#pypi-package <
>> > https://libcloud.apache.org/downloads.html#pypi-package>).
>> > >
>> > > Jippie!
>> > >
>> > > Bolke
>> >
>> >
>>
>
>


Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-22 Thread Dan Davydov
Bumping the thread so another user can comment.

On Wed, Feb 22, 2017 at 3:12 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> What I meant to ask is "how much engineering effort it takes to bake a
> single RC?", I guess it depends on how much git-fu is necessary plus some
> overhead cost of doing the series of actions/commands/emails/jira.
>
> I can volunteer for 1.8.1 (hopefully I can get another Airbnb
> engineer/volunteer to tag along) and will try to document/automate
> everything I can as I go through the process. The goal of 1.8.1 could be to
> basically package 1.8.0 + Dan's bugfix, and for Airbnb to get familiar with
> the process.
>
> It'd be great if you can dump your whole process on the wiki, and we'll
> improve it on this next pass.
>
> Thanks again for the mountain of work that went into packaging this
> release.
>
> Max
>
> On Wed, Feb 22, 2017 at 2:44 PM, Bolke de Bruin  wrote:
>
> > I thought you volunteered to baby sit 1.8.1 Chris ;-)?
> >
> > Sent from my iPhone
> >
> > > On 22 Feb 2017, at 23:31, Chris Riccomini 
> wrote:
> > >
> > > I'm +1 for doing a 1.8.1 fast follow-on
> > >
> > > On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
> > > maximebeauche...@gmail.com> wrote:
> > >
> > >> Our database may have edge cases that could be associated with running
> > any
> > >> previous version that may or may not have been part of an official
> > release.
> > >>
> > >> Let's see if anyone else reports the issue. If no one does, one option
> > is
> > >> to release 1.8.0 as is with a comment in the release notes, and have a
> > >> future official minor apache release 1.8.1 that would fix these minor
> > >> issues that are not deal breaker.
> > >>
> > >> @bolke, I'm curious, how long does it take you to go through one
> release
> > >> cycle? Oh, and do you have a documented step by step process for
> > releasing?
> > >> I'd like to add the Pypi part to this doc and add committers that are
> > >> interested to have rights on the project on Pypi.
> > >>
> > >> Max
> > >>
> > >>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin 
> > wrote:
> > >>>
> > >>> So it is a database integrity issue? Afaik a start_date should always
> > be
> > >>> set for a DagRun (create_dagrun) does so  I didn't check the code
> > though.
> > >>>
> > >>> Sent from my iPhone
> > >>>
> > >>>> On 22 Feb 2017, at 22:19, Dan Davydov  > INVALID>
> > >>> wrote:
> > >>>>
> > >>>> Should clarify this occurs when a dagrun does not have a start date,
> > >> not
> > >>> a
> > >>>> dag (which makes it even less likely to happen). I don't think this
> is
> > >> a
> > >>>> blocker for releasing.
> > >>>>
> > >>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <
> dan.davy...@airbnb.com
> > >
> > >>> wrote:
> > >>>>>
> > >>>>> I rolled this out in our prod and the webservers failed to load due
> > to
> > >>>>> this commit:
> > >>>>>
> > >>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> > >>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> > >>>>>
> > >>>>> This fixed it:
> > >>>>> -  > >>>>> class="glyphicon glyphicon-info-sign" aria-hidden="true"
> title="Start
> > >>> Date:
> > >>>>> {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}">
> > >>>>> +  > >>>>> class="glyphicon glyphicon-info-sign" aria-hidden="true">
> > >>>>>
> > >>>>> This is caused by assuming that all DAGs have start dates set, so a
> > >>> broken
> > >>>>> DAG will take down the whole UI. Not sure if we want to make this a
> > >>> blocker
> > >>>>> for the release or not, I'm guessing for most deployments this
> would
> > >>> occur
> > >>>>> pretty rarely. I'll submit a PR to fix it soon.
> >

Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-23 Thread Dan Davydov
Some more issues found by our users in addition to the one Alex reported
and the UI issue when a dagrun doesn't have a start date:
1. If a task fails, the whole dagrun immediately fails. This is a very large
change to how control flow works, as the rest of the tasks in the DAG are not
run (even e.g. leaf tasks). The same is true of the skipped status (if a leaf
task is skipped then the root task for the DAG will get skipped and none of
the other tasks in the DAG will run).
2. The black squares in the UI for tasks that aren't ready to run yet are
confusing and make it hard for users to see which tasks haven't run yet
(lower contrast). We should never initialize tasks in the DB that do not
have a state (or at the least these should be white).
3. The Dagrun has a get_task_instance method that will fail if a dagrun
doesn't have a copy of a task instance created which we have seen happen
for some DAGs. This prevents those tasks from getting scheduled.

I already patched 3 (and have a PR in flight for open source), and am
working on a patch for 1 internally. 1 should be a blocker for releasing.
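
To make the expected behavior in 1 concrete, here is a toy model, not Airflow
code. It assumes a run should stay active while any task can still make
progress and should only be marked failed once nothing else can run. The
function names, the state strings, and the upstream mapping are all
hypothetical:

```python
def runnable_tasks(upstream, states):
    """Tasks with no state yet whose upstream tasks have all succeeded."""
    return sorted(
        task
        for task, deps in upstream.items()
        if states.get(task) is None
        and all(states.get(dep) == "success" for dep in deps)
    )


def run_state(upstream, states):
    """Keep the run 'running' while progress is possible, then finalize."""
    if runnable_tasks(upstream, states) or "running" in states.values():
        return "running"
    if "failed" in states.values():
        return "failed"
    return "success"


# Two independent branches: a -> c and b -> d.
upstream = {"a": [], "b": [], "c": ["a"], "d": ["b"]}

# "a" failed, but "d" can still run, so the run is not finished yet.
states = {"a": "failed", "b": "success"}
print(runnable_tasks(upstream, states), run_state(upstream, states))

# Once "d" is done nothing else can run, and only now is the run failed.
states["d"] = "success"
print(runnable_tasks(upstream, states), run_state(upstream, states))
```

In this model the failure of "a" blocks only its own downstream task "c",
which is the behavior described above as the previous one.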

On Wed, Feb 22, 2017 at 4:38 PM, Alex Guziel  wrote:

> I have some concern that this change
> https://github.com/apache/incubator-airflow/pull/1939
> [AIRFLOW-679] may be having issues because we are seeing lots of double
> triggers
> of tasks and tasks being killed as a result.
>
>
>
>
>
> On Wed, Feb 22, 2017 4:35 PM, Dan Davydov dan.davy...@airbnb.com.INVALID
> wrote:
> Bumping the thread so another user can comment.
>
>
>
>
> On Wed, Feb 22, 2017 at 3:12 PM, Maxime Beauchemin <
>
> maximebeauche...@gmail.com> wrote:
>
>
>
>
> > What I meant to ask is "how much engineering effort it takes to bake a
>
> > single RC?", I guess it depends on how much git-fu is necessary plus some
>
> > overhead cost of doing the series of actions/commands/emails/jira.
>
> >
>
> > I can volunteer for 1.8.1 (hopefully I can get do it along another Airbnb
>
> > engineer/volunteer to tag along) and will try to document/automate
>
> > everything I can as I go through the process. The goal of 1.8.1 could be
> to
>
> > basically package 1.8.0 + Dan's bugfix, and for Airbnb to get familiar
> with
>
> > the process.
>
> >
>
> > It'd be great if you can dump your whole process on the wiki, and we'll
>
> > improve it on this next pass.
>
> >
>
> > Thanks again for the mountain of work that went into packaging this
>
> > release.
>
> >
>
> > Max
>
> >
>
> > On Wed, Feb 22, 2017 at 2:44 PM, Bolke de Bruin 
> wrote:
>
> >
>
> > > I thought you volunteered to baby sit 1.8.1 Chris ;-)?
>
> > >
>
> > > Sent from my iPhone
>
> > >
>
> > > > On 22 Feb 2017, at 23:31, Chris Riccomini 
>
> > wrote:
>
> > > >
>
> > > > I'm +1 for doing a 1.8.1 fast follow-on
>
> > > >
>
> > > > On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
>
> > > > maximebeauche...@gmail.com> wrote:
>
> > > >
>
> > > >> Our database may have edge cases that could be associated with
> running
>
> > > any
>
> > > >> previous version that may or may not have been part of an official
>
> > > release.
>
> > > >>
>
> > > >> Let's see if anyone else reports the issue. If no one does, one
> option
>
> > > is
>
> > > >> to release 1.8.0 as is with a comment in the release notes, and
> have a
>
> > > >> future official minor apache release 1.8.1 that would fix these
> minor
>
> > > >> issues that are not deal breaker.
>
> > > >>
>
> > > >> @bolke, I'm curious, how long does it take you to go through one
>
> > release
>
> > > >> cycle? Oh, and do you have a documented step by step process for
>
> > > releasing?
>
> > > >> I'd like to add the Pypi part to this doc and add committers that
> are
>
> > > >> interested to have rights on the project on Pypi.
>
> > > >>
>
> > > >> Max
>
> > > >>
>
> > > >>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin  >
>
> > > wrote:
>
> > > >>>
>
> > > >>> So it is a database integrity issue? Afaik a start_date should
> always
>
> > > be
>
> > > >>> set for a DagRun (create_dagrun) does so I didn't check the 

Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-23 Thread Dan Davydov
Here is an example for 1, you can see that there are some white tasks that
should have been run. I don't have time to create a skeleton DAG at the
moment unfortunately because of release-related firefighting. Will
hopefully post back here later once firefighting is done.
[image: Inline image 1]

On Thu, Feb 23, 2017 at 12:00 PM, Bolke de Bruin  wrote:

> Hey Dan, Alex,
>
> Indeed #1 seems serious, specifically the second part - skipping the
> root task (root task of the whole DAG?). Do you have a skeleton DAG that
> exposes the issue? Is there a root cause analysis? When was the issue
> introduced? On the issue Alex mentioned, we don’t see that and I cannot
> really align the description of the issue with the PR yet, ie. I need
> clarification.
>
> Obviously, I’m not very happy if we indeed need to retract the release as
> we are ~12 hours away from closing of the vote at the IPMC mailinglist
> (strangely enough no one has voted yet). However, if it is that serious
> that it cannot wait for 1.8.1 then we need to do it. I would define
> “serious” as many people are going to be affected by it and they will not
> have a workaround available to them (ie. patching code or database), but
> the opinion of the community might differ.
>
> Cheers
> Bolke
>
> P.S. I am also interested in #3, as it sounds like an integrity issue
> (which verify_integrity should catch) but also maybe too strong an
> assumption that such a task should exist (ie. a task was added to a Dag in
> a later stage).
>
>
> > On 23 Feb 2017, at 20:15, Dan Davydov 
> wrote:
> >
> > Some more issues found by our users in addition to the one Alex reported
> > and the UI issue when a dagrun doesn't have a start date:
> > 1. If a task fails it fails the whole dagrun immediately fails, this is a
> > very large change to how control flow works as the rest of the tasks in
> the
> > DAG are not run (even e.g. leaf tasks). The same is true of the skipped
> > status (if a leaf task is skipped then the root task for the DAG will get
> > skipped and none of the other tasks in the DAG will run).
> > 2. The black squares in the UI for tasks that aren't ready to run yet are
> > confusing and make it hard for users to see which tasks haven't run yet
> > (lower contrast). We should never initialize tasks in the DB that do not
> > have a state (or at the least these should be white).
> > 3. The Dagrun has a get_task_instance method that will fail if a dagrun
> > doesn't have a copy of a task instance created which we have seen happen
> > for some DAGs. This prevents those tasks from getting scheduled.
> >
> > I already patched 3 (and have a PR in flight for open source), and am
> > working on a patch for 1 internally. 1 should be a blocker for releasing.
> >
> > On Wed, Feb 22, 2017 at 4:38 PM, Alex Guziel  invalid
> >> wrote:
> >
> >> I have some concern that this change
> >> https://github.com/apache/incubator-airflow/pull/1939
> >> [AIRFLOW-679] may be having issues because we are seeing lots of double
> >> triggers
> >> of tasks and tasks being killed as a result.
> >>
> >>
> >>
> >>
> >>
> >> On Wed, Feb 22, 2017 4:35 PM, Dan Davydov dan.davy...@airbnb.com.INVALID
> >> wrote:
> >> Bumping the thread so another user can comment.
> >>
> >>
> >>
> >>
> >> On Wed, Feb 22, 2017 at 3:12 PM, Maxime Beauchemin <
> >>
> >> maximebeauche...@gmail.com> wrote:
> >>
> >>
> >>
> >>
> >>> What I meant to ask is "how much engineering effort it takes to bake a
> >>
> >>> single RC?", I guess it depends on how much git-fu is necessary plus
> some
> >>
> >>> overhead cost of doing the series of actions/commands/emails/jira.
> >>
> >>>
> >>
> >>> I can volunteer for 1.8.1 (hopefully I can get do it along another
> Airbnb
> >>
> >>> engineer/volunteer to tag along) and will try to document/automate
> >>
> >>> everything I can as I go through the process. The goal of 1.8.1 could
> be
> >> to
> >>
> >>> basically package 1.8.0 + Dan's bugfix, and for Airbnb to get familiar
> >> with
> >>
> >>> the process.
> >>
> >>>
> >>
> >>> It'd be great if you can dump your whole process on the wiki, and we'll
> >>
> >>> improve it on this next pass.
> >>
> >>>
> >>
> >>> Thanks a

Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-23 Thread Dan Davydov
Here is the DAG: http://imgur.com/a/zXXsS

On Thu, Feb 23, 2017 at 12:18 PM, Arthur Wiedmer 
wrote:

> Dan,
>
> Inline images get stripped by the mailing server. You will have to upload
> to imgur or something.
>
> Best
> Arthur
>
> On Feb 23, 2017 12:13 PM, "Dan Davydov" 
> wrote:
>
> > Here is an example for 1, you can see that there are some white tasks
> that
> > should have been run. I don't have time to create a skeleton DAG at the
> > moment unfortunately because of release-related firefighting. Will
> > hopefully post back here later once firefighting is done.
> > [image: Inline image 1]
> >
> > On Thu, Feb 23, 2017 at 12:00 PM, Bolke de Bruin 
> > wrote:
> >
> >> Hey Dan, Alex,
> >>
> >> Indeed #1 seems serious, specifically the second part - skipping the
> >> root task (root task of the whole DAG?). Do you have a skeleton DAG that
> >> exposes the issue? Is there a root cause analysis? When was the issue
> >> introduced? On the issue Alex mentioned, we don’t see that and I
> cannot
> >> really align the description of the issue with the PR yet, ie. I need
> >> clarification.
> >>
> >> Obviously, I’m not very happy if we indeed need to retract the release
> as
> >> we are ~12 hours away from closing of the vote at the IPMC mailinglist
> >> (strangely enough no one has voted yet). However, if it is that serious
> >> that it cannot wait for 1.8.1 then we need to do it. I would define
> >> “serious” as many people are going to be affected by it and they will
> not
> >> have a workaround available to them (ie. patching code or database), but
> >> the opinion of the community might differ.
> >>
> >> Cheers
> >> Bolke
> >>
> >> P.S. I am also interested in #3, as it sounds like an integrity issue
> >> (which verify_integrity should catch) but also maybe too strong an
> >> assumption that such a task should exist (ie. a task was added to a Dag
> in
> >> a later stage).
> >>
> >>
> >> > On 23 Feb 2017, at 20:15, Dan Davydov  INVALID>
> >> wrote:
> >> >
> >> > Some more issues found by our users in addition to the one Alex
> reported
> >> > and the UI issue when a dagrun doesn't have a start date:
> >> > 1. If a task fails it fails the whole dagrun immediately fails, this
> is
> >> a
> >> > very large change to how control flow works as the rest of the tasks
> in
> >> the
> >> > DAG are not run (even e.g. leaf tasks). The same is true of the
> skipped
> >> > status (if a leaf task is skipped then the root task for the DAG will
> >> get
> >> > skipped and none of the other tasks in the DAG will run).
> >> > 2. The black squares in the UI for tasks that aren't ready to run yet
> >> are
> >> > confusing and make it hard for users to see which tasks haven't run
> yet
> >> > (lower contrast). We should never initialize tasks in the DB that do
> not
> >> > have a state (or at the least these should be white).
> >> > 3. The Dagrun has a get_task_instance method that will fail if a
> dagrun
> >> > doesn't have a copy of a task instance created which we have seen
> happen
> >> > for some DAGs. This prevents those tasks from getting scheduled.
> >> >
> >> > I already patched 3 (and have a PR in flight for open source), and am
> >> > working on a patch for 1 internally. 1 should be a blocker for
> >> releasing.
> >> >
> >> > On Wed, Feb 22, 2017 at 4:38 PM, Alex Guziel  >> .invalid
> >> >> wrote:
> >> >
> >> >> I have some concern that this change
> >> >> https://github.com/apache/incubator-airflow/pull/1939
> >> >> [AIRFLOW-679] may be having issues because we are seeing lots of
> double
> >> >> triggers
> >> >> of tasks and tasks being killed as a result.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Feb 22, 2017 4:35 PM, Dan Davydov
> >> dan.davy...@airbnb.com.INVALID
> >> >> wrote:
> >> >> Bumping the thread so another user can comment.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Feb 22, 2017 at 3:12 PM, Maxime Beauchemin <
> >> >>
> >> >>

Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-23 Thread Dan Davydov
To expand on Max's point it doesn't concern me that this is a blocker for
AirBnB, but it's not logical behavior and I'm sure many companies rely on
the previous behavior (which I would say is the logically correct one). We
are already running a fork of the release internally so we are unaffected,
I'm more concerned about:
a) Airflow 1.8.0 having a huge issue/regression in behavior that causes a
lot of companies to revert or patch after upgrading.
b) An illogical change being made in Airflow that makes the behavior
non-intuitive.

Here are my PRs to fix the various issues (we might as well merge all of
them in the next RC if we have one):
Here is the fix for the dagruns ending prematurely:
https://github.com/apache/incubator-airflow/pull/2099

Here is the fix for dagruns in a bad state crashing the UI (not a blocker
but might as well include it in the next RC if we create one):
https://github.com/apache/incubator-airflow/pull/2094

Black Squares in UI: No fix yet (will try to work on one shortly) but it's
not a blocker.

Double Trigger Issue That Alex G Mentioned: We have been seeing tasks in
the running state get run by another worker almost exactly 1 hour after
they start running. Double triggers are pretty unacceptable in Airflow, but
I'm not counting this as a blocker because I don't fully understand what is
happening, but it is still pretty scary. Internally we have a patch that
mitigates this to some degree but Alex G is still investigating.

On Thu, Feb 23, 2017 at 1:49 PM, Bolke de Bruin  wrote:

> I’m not particularly against another RC. On the IPMC there were some
> issues mentioned regarding licensing, which probably are blocking as well
> (eg. no LICENSE etc in the tar ball). I found some HighCharts left overs as
> well, while addressing the licensing issues. PR here:
> https://github.com/apache/incubator-airflow/pull/2098 <
> https://github.com/apache/incubator-airflow/pull/2098> , will be merged
> shortly.
>
> I just hope we can get our own vote to pass quickly(!) and not have
> another last minute blocker :P.
>
> Cheers
> Bolke
>
> > On 23 Feb 2017, at 22:41, Maxime Beauchemin 
> wrote:
> >
> > IMHO 1 is a blocker. The other issues could have been mitigated but 1 is
> a
> > dealbreaker for Airbnb. We have lots of large, critical DAGs that would
> be
> > in a standstill because of individual task failures, where in reality a
> lot
> > of progress can be made.
> >
> > Airflow should really do as much work as possible and honor the
> > dependencies specified by the user before giving up and requiring
> > intervention.
> >
> > Max
> >
> > On Thu, Feb 23, 2017 at 1:10 PM, Chris Riccomini 
> > wrote:
> >
> >> My 2c:
> >>
> >> I observed both #1 and #2 in Dan's list. I figured y'all had had a
> >> discussion about the change in behavior. :) In any case, I made my peace
> >> with it, and we've been running happily in production for weeks now, so
> I
> >> personally don't see it as a blocker. Obviously, if it's an issue for
> you
> >> guys at AirBNB, a patch and merge to master is critical, but I still
> think
> >> we should fix this stuff as part of 1.8.1.
> >>
> >> One compelling counter argument to this is that there's a bit of
> whiplash
> >> in terms of behavior, where 1.7.1.* behaves one way, then 1.8.0 behaves
> >> another, then 1.8.1 goes back to the old way again. I guess I'm just not
> >> that worried about it.
> >>
> >> Anyway.. take it or leave it. :)
> >>
> >> Cheers,
> >> Chris
> >>
> >> On Thu, Feb 23, 2017 at 12:31 PM, Bolke de Bruin 
> >> wrote:
> >>
> >>> Gotcha. Will be patient. Good luck.
> >>>
> >>> Bolke
> >>>
> >>>> On 23 Feb 2017, at 21:12, Dan Davydov  INVALID>
> >>> wrote:
> >>>>
> >>>> Here is an example for 1, you can see that there are some white tasks
> >>> that should have been run. I don't have time to create a skeleton DAG
> at
> >>> the moment unfortunately because of release-related firefighting. Will
> >>> hopefully post back here later once firefighting is done.
> >>>>
> >>>>
> >>>> On Thu, Feb 23, 2017 at 12:00 PM, Bolke de Bruin  >>> <mailto:bdbr...@gmail.com>> wrote:
> >>>> Hey Dan, Alex,
> >>>>
> >>>> Indeed #1 seems serious, specifically the second part - skipping
> >> the
> >>> root task (root task of the whole DAG?). Do you have a skel

Re: scheduler running on multiple nodes

2017-02-24 Thread Dan Davydov
Fwiw Airbnb was running multiple schedulers for a short while on 1.7.1 and
we didn't seem to have issues.

On Feb 24, 2017 12:25 AM, "Bolke de Bruin"  wrote:

> While I agree with the assessment of Sid that a lot has changed and we do
> not officially test on multiple schedulers, many changes were in the area
> of proper locking which benefit multiple schedulers. In addition the tasks
> themselves have built in checks that they don’t run twice at the same time.
>
> Yet YMMV.
>
> Bolke
>
> > On 24 Feb 2017, at 03:13, siddharth anand  wrote:
> >
> > I did  run 2 or more schedulers with Local Executors up until mid last
> > year. There have been enough changes to the code and feature additions
> that
> > I don't think this is a recommended practice at this point. Also, there
> is
> > not a lot of synchronization in the scheduler to ensure this will work.
> >
> > -s
> >
> > On Thu, Feb 9, 2017 at 6:47 AM, matus valo  wrote:
> >
> >> Hi all,
> >>
> >>
> >>
> >> I am considering deployment of airflow as pipeline framework. I have
> found
> >> out multiple articles explaining deployment of airflow in distributed
> >> environment (e.g. [1]). Unfortunately, I was not able to find out any
> use
> >> case where scheduler is deployed distributed on multiple nodes. Is it
> >> possible to have scheduler distributed on multiple nodes to prevent
> single
> >> point of failure? I haven’t found any mention about it in
> documentation. I
> >> have found out in [2] that it is not possible but on the other hand in
> [3]
> >> is reference that this can be solved in new version of airflow.
> >>
> >>
> >>
> >> Thanks,
> >>
> >>
> >> Matus
> >>
> >>
> >>
> >> [1] http://site.clairvoyantsoft.com/setting-apache-airflow-cluster/
> >>
> >> [2] https://groups.google.com/forum/#!topic/airbnb_airflow/-1wKa3OcwME
> >>
> >> [3] https://issues.apache.org/jira/browse/AIRFLOW-678
> >>
>
>


Re: scheduler running on multiple nodes

2017-02-24 Thread Dan Davydov
We just had two running by accident for some period of time.

On Feb 24, 2017 5:52 AM, "Jason Jho" 
wrote:

> Hi Dan / Sid,
>
> Would you be able to elaborate on the multiple scheduler setup? Curious how
> that would have been deployed. Was the purpose to have some kind of
> failover or to distribute execution of jobs?
>
> Thanks!
> On Fri, Feb 24, 2017 at 3:49 AM Dan Davydov  invalid>
> wrote:
>
> > Fwiw Airbnb was running multiple schedulers for a short while on 1.7.1
> and
> > we didn't seem to have issues.
> >
> > On Feb 24, 2017 12:25 AM, "Bolke de Bruin"  wrote:
> >
> > > While I agree with the assessment of Sid that a lot has changed and we
> do
> > > not officially test on multiple schedulers, many changes were in the
> area
> > > of proper locking which benefit multiple schedulers. In addition the
> > tasks
> > > themselves have built in checks that they don’t run twice at the same
> > time.
> > >
> > > Yet YMMV.
> > >
> > > Bolke
> > >
> > > > On 24 Feb 2017, at 03:13, siddharth anand  wrote:
> > > >
> > > > I did  run 2 or more schedulers with Local Executors up until mid
> last
> > > > year. There have been enough changes to the code and feature
> additions
> > > that
> > > > I don't think this is a recommended practice at this point. Also,
> there
> > > is
> > > > not a lot of synchronization in the scheduler to ensure this will
> work.
> > > >
> > > > -s
> > > >
> > > > On Thu, Feb 9, 2017 at 6:47 AM, matus valo 
> > wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >>
> > > >>
> > > >> I am considering deployment of airflow as pipeline framework. I have
> > > found
> > > >> out multiple articles explaining deployment of airflow in
> distributed
> > > >> environment (e.g. [1]). Unfortunately, I was not able to find out
> any
> > > use
> > > >> case where scheduler is deployed distributed on multiple nodes. Is
> it
> > > >> possible to have scheduler distributed on multiple nodes to prevent
> > > single
> > > >> point of failure? I haven’t found any mention about it in
> > > documentation. I
> > > >> have found out in [2] that it is not possible but on the other hand
> in
> > > [3]
> > > >> is reference that this can be solved in new version of airflow.
> > > >>
> > > >>
> > > >>
> > > >> Thanks,
> > > >>
> > > >>
> > > >> Matus
> > > >>
> > > >>
> > > >>
> > > >> [1] http://site.clairvoyantsoft.com/setting-apache-airflow-cluster/
> > > >>
> > > >> [2]
> > https://groups.google.com/forum/#!topic/airbnb_airflow/-1wKa3OcwME
> > > >>
> > > >> [3] https://issues.apache.org/jira/browse/AIRFLOW-678
> > > >>
> > >
> > >
> >
>


Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-24 Thread Dan Davydov
Update on old pending issues:
- Black Squares in UI: Fix merged
- Double Trigger Issue That Alex G Mentioned: Alex has a PR in flight

New Issues:
- Backfill seems to be having issues (only running one dagrun at a time),
we are still investigating - might be a blocker
- High DB Load (~8x more than 1.7) - We are still investigating but it's
probably not a blocker for the release
- Skipped tasks potentially cause a dagrun to be marked as failure/success
prematurely - not sure whether or not to classify this as a blocker (only
really an issue for users who use the BranchPythonOperator, which AirBnB
does)

On Thu, Feb 23, 2017 at 5:59 PM, siddharth anand  wrote:

> IMHO, a DAG run without a start date is nonsensical, but this is not enforced.
> That said, our UI allows for the manual creation of DAG Runs without a
> start date as shown in the images below:
>
>
>    - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%202017-02-22%2016.00.40.png?dl=0
>    - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%202017-02-22%2016.02.22.png?dl=0
>
>
> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > Our database may have edge cases that could be associated with running
> any
> > previous version that may or may not have been part of an official
> release.
> >
> > Let's see if anyone else reports the issue. If no one does, one option is
> > to release 1.8.0 as is with a comment in the release notes, and have a
> > future official minor apache release 1.8.1 that would fix these minor
> > issues that are not deal breaker.
> >
> > @bolke, I'm curious, how long does it take you to go through one release
> > cycle? Oh, and do you have a documented step by step process for
> releasing?
> > I'd like to add the Pypi part to this doc and add committers that are
> > interested to have rights on the project on Pypi.
> >
> > Max
> >
> > On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin 
> wrote:
> >
> > > So it is a database integrity issue? Afaik a start_date should always
> be
> > > set for a DagRun (create_dagrun does so). I didn't check the code
> though.
> > >
> > > Sent from my iPhone
> > >
> > > > > On 22 Feb 2017, at 22:19, Dan Davydov
> > > wrote:
> > > >
> > > > Should clarify this occurs when a dagrun does not have a start date,
> > not
> > > a
> > > > dag (which makes it even less likely to happen). I don't think this
> is
> > a
> > > > blocker for releasing.
> > > >
> > > >> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <
> dan.davy...@airbnb.com>
> > > wrote:
> > > >>
> > > >> I rolled this out in our prod and the webservers failed to load due
> to
> > > >> this commit:
> > > >>
> > > >> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> > > >> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> > > >>
> > > >> This fixed it:
> > > >> -  > > >> class="glyphicon glyphicon-info-sign" aria-hidden="true"
> title="Start
> > > Date:
> > > >> {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}">
> > > >> +  > > >> class="glyphicon glyphicon-info-sign" aria-hidden="true">
> > > >>
> > > >> This is caused by assuming that all DAGs have start dates set, so a
> > > broken
> > > >> DAG will take down the whole UI. Not sure if we want to make this a
> > > blocker
> > > >> for the release or not, I'm guessing for most deployments this would
> > > occur
> > > >> pretty rarely. I'll submit a PR to fix it soon.
> > > >>
> > > >>
> > > >>
> > > >> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <
> > criccom...@apache.org
> > > >
> > > >> wrote:
> > > >>
> > > >>> Ack that the vote has already passed, but belated +1 (binding)
> > > >>>
> > > > >>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin
> > > >>> wrote:
> > > >>>
> > > &

Re: Cutting down on testing time

2017-02-27 Thread Dan Davydov
This looks like a great effort to me at least in the short term (in the
long term I think most of the integration tests should be run together if
the infra allows this). Another thing we could start looking into is
parallelizing tests (though this may require beefier machines from Travis).

On Sat, Feb 25, 2017 at 8:58 AM, Bolke de Bruin  wrote:

> Hi All,
>
> Jeremiah and I have been looking into optimising the time that is spent on
> tests. The reason for this was that Travis’ runs are taking more and more
> time and we are being throttled by Travis. As part of that we enabled color
> coding of test outcomes and timing of tests. The results are kind of
> … surprising.
>
> This is the top 20 of tests where we spend the most time. MySQL (remember
> concurrent access enabled) -
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/205277617/log.txt:
>
> tests.BackfillJobTest.test_backfill_examples:  287.9209s
> tests.BackfillJobTest.test_backfill_multi_dates:  53.5198s
> tests.SchedulerJobTest.test_scheduler_start_date:  36.4935s
> tests.CoreTest.test_scheduler_job:  35.5852s
> tests.CliTests.test_backfill:  29.7484s
> tests.SchedulerJobTest.test_scheduler_multiprocessing:  26.1573s
> tests.DaskExecutorTest.test_backfill_integration:  24.5456s
> tests.CoreTest.test_schedule_dag_no_end_date_up_to_today_only:  17.3278s
> tests.SubDagOperatorTests.test_subdag_deadlock:  16.1957s
> tests.SensorTimeoutTest.test_timeout:  15.1000s
> tests.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past:  13.8812s
> tests.BackfillJobTest.test_cli_backfill_depends_on_past:  12.9539s
> tests.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past_advance_ex_date:  12.8779s
> tests.SchedulerJobTest.test_dagrun_success:  12.8177s
> tests.SchedulerJobTest.test_dagrun_root_fail:  10.3953s
> tests.SchedulerJobTest.test_dag_with_system_exit:  10.1132s
> tests.TransferTests.test_mysql_to_hive:  8.5939s
> tests.SchedulerJobTest.test_retry_still_in_executor:  8.1739s
> tests.SchedulerJobTest.test_dagrun_fail:  7.9855s
> tests.ImpersonationTest.test_default_impersonation:  7.4993s
>
> Yes we spend a whopping 5 minutes on executing all examples. Another
> interesting one is “tests.CoreTest.test_scheduler_job”. This test just
> checks whether certain directories are created as part of logging. This
> could have been covered by a real unit test just covering the functionality
> of the function that creates the files - now it takes 35s.
>
> We discussed several strategies for reducing time apart from rewriting
> some of the tests (that would be a herculean job!). What seems most optimal
> is:
>
> 1. Run the scheduler tests apart from all other tests.
> 2. Run “operator” integration tests in their own unit.
> 3. Run UI tests separately
> 4. Run API tests separately
>
> This creates the following build matrix (warning ASCII art):
>
> +-----------+-----------+-----------+----+-----+
> |           | Scheduler | Operators | UI | API |
> +-----------+-----------+-----------+----+-----+
> | Python 2  |     x     |     x     | x  |  x  |
> | Python 3  |     x     |     x     | x  |  x  |
> | Kerberos  |           |           | x  |  x  |
> | Ldap      |           |           | x  |     |
> | Hive      |           |     x     | x  |  x  |
> | SSH       |           |     x     |    |     |
> | Postgres  |     x     |     x     | x  |  x  |
> | MySQL     |     x     |     x     | x  |  x  |
> | SQLite    |     x     |     x     | x  |  x  |
> +-----------+-----------+-----------+----+-----+
>
>
> So from this build matrix one can deduce that Postgres, MySQL are generic
> services that will be present in every build. In addition all builds will
> use Python 2 and Python 3. And I propose using Python 3.4 and Python 3.5.
>
>
> Furthermore, I would like us to label our tests correctly, e.g. unit test
> or integration test.
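
For readers who want to work with the proposed split programmatically, the
build matrix quoted above can be written down as plain data. This is only a
sketch: the dictionary layout and the helper names (BUILD_MATRIX,
environments_for, job_count) are assumptions made for illustration and are
not part of Airflow or of the Travis configuration discussed in the thread.
The cell values are transcribed as best they can be read from the matrix.

```python
# Test groups per environment, transcribed from the build matrix in the
# quoted message. Names and layout here are illustrative assumptions.
ALL_GROUPS = ["Scheduler", "Operators", "UI", "API"]

BUILD_MATRIX = {
    "Python 2": ALL_GROUPS,
    "Python 3": ALL_GROUPS,
    "Kerberos": ["UI", "API"],
    "Ldap": ["UI"],
    "Hive": ["Operators", "UI", "API"],
    "SSH": ["Operators"],
    "Postgres": ALL_GROUPS,
    "MySQL": ALL_GROUPS,
    "SQLite": ALL_GROUPS,
}


def environments_for(group):
    """Return the environments in which a given test group is marked."""
    return [env for env, groups in BUILD_MATRIX.items() if group in groups]


def job_count():
    """Count the marked (environment, test group) cells in the matrix."""
    return sum(len(groups) for groups in BUILD_MATRIX.values())


for group in ALL_GROUPS:
    print(group, "->", ", ".join(environments_for(group)))
print("marked cells:", job_count())
```

Keeping the matrix as data makes it easy to see, for example, that the
scheduler group only needs the Python versions and the three databases, which
is the observation the quoted message draws from the table.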


Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-27 Thread Dan Davydov
> >>> On 25 Feb 2017, at 09:07, Bolke de Bruin  wrote:
> >>>
> >>> Hi Dan,
> >>>
> >>> - Backfill indeed runs only one dagrun at a time, see line 1755 of
> >> jobs.py. I’ll think about how to fix this over the weekend (I think it
> was
> >> my change that introduced this). Suggestions always welcome. Depending
> the
> >> impact it is a blocker or not. We don’t often use backfills and
> definitely
> >> not at your size, so that is why it didn’t pop up with us. I’m assuming
> >> blocker for now, btw.
> >>> - Speculation on the High DB Load. I’m not sure what your benchmark is
> >> here (1.7.1 + multi processor dags?), but as you mentioned in the code
> >> dependencies are checked a couple of times for one run and even task
> >> instance. Dependency checking requires aggregation on the DB, which is a
> >> performance killer. Annoying but not a blocker.
> >>> - Skipped tasks potentially cause a dagrun to be marked failure/success
> >> prematurely. BranchOperators are widely used; if it affects these
> operators,
> >> then it is a blocker.
> >>>
> >>> - Bolke
> >>>
> >>>> On 25 Feb 2017, at 02:04, Dan Davydov
> >> wrote:
> >>>>
> >>>> Update on old pending issues:
> >>>> - Black Squares in UI: Fix merged
> >>>> - Double Trigger Issue That Alex G Mentioned: Alex has a PR in flight
> >>>>
> >>>> New Issues:
> >>>> - Backfill seems to be having issues (only running one dagrun at a
> >> time),
> >>>> we are still investigating - might be a blocker
> >>>> - High DB Load (~8x more than 1.7) - We are still investigating but
> it's
> >>>> probably not a blocker for the release
> >>>> - Skipped tasks potentially cause a dagrun to be marked as
> >> failure/success
> >>>> prematurely - not sure whether or not to classify this as a blocker
> >> (only
> >>>> really an issue for users who use the BranchPythonOperator, which
> >> AirBnB
> >>>> does)
> >>>>
> >>>> On Thu, Feb 23, 2017 at 5:59 PM, siddharth anand 
> >> wrote:
> >>>>
> >>>>> IMHO, a DAG run without a start date is nonsensical, but this is not
> >> enforced.
> >>>>> That said, our UI allows for the manual creation of DAG Runs without
> a
> >>>>> start date as shown in the images below:
> >>>>>
> >>>>>
> >>>>> - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%202017-02-22%2016.00.40.png?dl=0
> >>>>> - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%202017-02-22%2016.02.22.png?dl=0
> >>>>>
> >>>>>
> >>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
> >>>>> maximebeauche...@gmail.com> wrote:
> >>>>>
> >>>>>> Our database may have edge cases that could be associated with
> running
> >>>>> any
> >>>>>> previous version that may or may not have been part of an official
> >>>>> release.
> >>>>>>
> >>>>>> Let's see if anyone else reports the issue. If no one does, one
> >> option is
> >>>>>> to release 1.8.0 as is with a comment in the release notes, and
> have a
> >>>>>> future official minor apache release 1.8.1 that would fix these
> minor
> >>>>>> issues that are not deal breaker.
> >>>>>>
> >>>>>> @bolke, I'm curious, how long does it take you to go through one
> >> release
> >>>>>> cycle? Oh, and do you have a documented step by step process for
> >>>>> releasing?
> >>>>>> I'd like to add the Pypi part to this doc and add committers that
> are
> >>>>>> interested to have rights on the project on Pypi.
> >>>>>>
> >>>>>> Max
> >>>>>>
> >>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin 

Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-27 Thread Dan Davydov
Repro steps:
- Create a DAG with a dummy task
- Let this DAG run for one dagrun
- Add a new subdag operator that contains a dummy operator to this DAG that
has depends_on_past set to true
- click on the white square for the new subdag operator in the DAGs first
dagrun
- Click "Zoom into subdag" (takes you to the graph view for the subdag)
- Click the dummy task in the graph view and click "Mark Success"
- Observe that the list of tasks to mark as success is empty (it should
contain the dummy task)

On Mon, Feb 27, 2017 at 1:03 PM, Bolke de Bruin  wrote:

> Dan
>
> Can you elaborate on 2, cause I thought I specifically took care of that.
>
> Cheers
> Bolke
>
> Sent from my iPhone
>
> > On 27 Feb 2017, at 20:27, Dan Davydov 
> wrote:
> >
> > I created https://issues.apache.org/jira/browse/AIRFLOW-921 to track the
> > pending issues.
> >
> > There are two more issues we found which I included there:
> > 1. Task instances that have their state manually set to running make the
> UI
> > for their DAG unable to parse
> > 2. Mark success doesn't work for non existent task instances/dagruns
> which
> > breaks the subdag use case (setting tasks as successful via the graph
> view)
> >
> >> On Mon, Feb 27, 2017 at 11:06 AM, Bolke de Bruin 
> wrote:
> >>
> >> Hey Max
> >>
> >> It is massive for sure. Sorry about that ;-). However it is not as
> massive
> >> as you might deduce from a first view. 0) run tasks concurrently across
> dag
> >> runs 1) ordering of the tasks was added to the loop. 2) calculating of
> >> deadlocks, running tasks, tasks to run was corrected, 3) relying on the
> >> executor for status updates was replaced, 4) (tbd) executor failure
> check
> >> to protect against endless loops.
> >>
> >> 0+1 seem bigger than they are due to the amount of lines changed. 2 is a
> >> subtle change, that touches a couple of lines to pop/push properly. 3)
> is
> >> bigger, as I didn't like the reliance on the executor. 4) is old code
> that
> >> needs to be added again.
> >>
> >> I probably can leave out 3 which makes 4 moot. The change would be
> >> smaller. Maybe I could even completely remove 3 and just add 4. What are
> >> your thoughts?
> >>
> >> The random failures we were seeing were the "implicit" test of not
> >> executing in the right order and then deadlocking. But no explicit tests
> >> exist. Help would definitely be appreciated.
> >>
> >> Yes I thought about using the scheduler and/or reusing logic from the
> >> scheduler. I even experimented a little with it but it didn't allow me
> to
> >> pass the tests effectively.
> >>
> >> What I am planning to do is split the function and make it unit testable
> >> if you agree with the current approach.
> >>
> >> Bolke
> >>
> >> Sent from my iPhone
> >>
> >>> On 27 Feb 2017, at 18:35, Maxime Beauchemin <
> maximebeauche...@gmail.com>
> >> wrote:
> >>>
> >>> This PR is pretty massive and complex! It looks like solid work but
> let's
> >>> be really careful around testing and rolling this out.
> >>>
> >>> This may be out of scope for this PR, but wanted to discuss the idea of
> >>> using the scheduler's logic to perform backfills. It'd be nice to have
> >> that
> >>> logic in one place, though I lost grasp on the details around
> feasibility
> >>> around this approach. I'm sure you looked into this option before
> issuing
> >>> this PR and I'm curious to hear your thoughts on blockers/challenges
> >> around
> >>> this alternate approach.
> >>>
> >>> Also I'm wondering whether we have any sort of mechanisms in our
> >>> integration test to validate that task dependencies are respected and
> run
> >>> in the right order. If not I was thinking we could build some
> abstraction
> >>> to make it easy to write this type of tests in an expressive way.
> >>>
> >>> ```
> >>> #[some code to run a backfill, or a scheduler session]
> >>> it = IntegrationTestResults(dag_id='example1')
> >>> assert it.ran_before('task1', 'task_2')
> >>> assert it.overlapped('task1', 'task_3')  # confirms 2 tasks ran in parallel
> >>> assert it.none_failed()
> >>> assert it.ran_last('ro

Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-27 Thread Dan Davydov
rc + your patch (and a couple of our own custom ones)

On Mon, Feb 27, 2017 at 2:11 PM, Bolke de Bruin  wrote:

> Dan
>
> Btw are you running with my patch for this? Or still plain rc?
>
> Cheers
> Bolke
>
> Sent from my iPhone
>
> > On 27 Feb 2017, at 22:46, Bolke de Bruin  wrote:
> >
> > I'll have a look. I verified and the code is there to take care of this.
> >
> > B.
> >
> > Sent from my iPhone
> >
> >> On 27 Feb 2017, at 22:34, Dan Davydov 
> wrote:
> >>
> >> Repro steps:
> >> - Create a DAG with a dummy task
> >> - Let this DAG run for one dagrun
> >> - Add a new subdag operator that contains a dummy operator to this DAG
> that
> >> has depends_on_past set to true
> >> - click on the white square for the new subdag operator in the DAGs
> first
> >> dagrun
> >> - Click "Zoom into subdag" (takes you to the graph view for the subdag)
> >> - Click the dummy task in the graph view and click "Mark Success"
> >> - Observe that the list of tasks to mark as success is empty (it should
> >> contain the dummy task)
> >>
> >>> On Mon, Feb 27, 2017 at 1:03 PM, Bolke de Bruin 
> wrote:
> >>>
> >>> Dan
> >>>
> >>> Can you elaborate on 2, cause I thought I specifically took care of
> that.
> >>>
> >>> Cheers
> >>> Bolke
> >>>
> >>> Sent from my iPhone
> >>>
> >>>> On 27 Feb 2017, at 20:27, Dan Davydov
> >>> wrote:
> >>>>
> >>>> I created https://issues.apache.org/jira/browse/AIRFLOW-921 to track
> the
> >>>> pending issues.
> >>>>
> >>>> There are two more issues we found which I included there:
> >>>> 1. Task instances that have their state manually set to running make
> the
> >>> UI
> >>>> for their DAG unable to parse
> >>>> 2. Mark success doesn't work for non existent task instances/dagruns
> >>> which
> >>>> breaks the subdag use case (setting tasks as successful via the graph
> >>> view)
> >>>>
> >>>>> On Mon, Feb 27, 2017 at 11:06 AM, Bolke de Bruin 
> >>> wrote:
> >>>>>
> >>>>> Hey Max
> >>>>>
> >>>>> It is massive for sure. Sorry about that ;-). However it is not as
> >>> massive
> >>>>> as you might deduce from a first view. 0) run tasks concurrently
> across
> >>> dag
> >>>>> runs 1) ordering of the tasks was added to the loop. 2) calculating
> of
> >>>>> deadlocks, running tasks, tasks to run was corrected, 3) relying on
> the
> >>>>> executor for status updates was replaced, 4) (tbd) executor failure
> >>> check
> >>>>> to protect against endless loops.
> >>>>>
> >>>>> 0+1 seem bigger than they are due to the amount of lines changed. 2
> is a
> >>>>> subtle change, that touches a couple of lines to pop/push properly.
> 3)
> >>> is
> >>>>> bigger, as I didn't like the reliance on the executor. 4) is old code
> >>> that
> >>>>> needs to be added again.
> >>>>>
> >>>>> I probably can leave out 3 which makes 4 moot. The change would be
> >>>>> smaller. Maybe I could even completely remove 3 and just add 4. What
> are
> >>>>> your thoughts?
> >>>>>
> >>>>> The random failures we were seeing were the "implicit" test of not
> >>>>> executing in the right order and then deadlocking. But no explicit
> tests
> >>>>> exist. Help would definitely be appreciated.
> >>>>>
> >>>>> Yes I thought about using the scheduler and/or reusing logic from the
> >>>>> scheduler. I even experimented a little with it but it didn't allow
> me
> >>> to
> >>>>> pass the tests effectively.
> >>>>>
> >>>>> What I am planning to do is split the function and make it unit
> testable
> >>>>> if you agree with the current approach.
> >>>>>
> >>>>> Bolke
> >>>>>
> >>>>> Sent from my iPhone
> >>>>>
> >>>>>> On 27 Feb 2017, at 18:35, Maxime Beauchemin <
> 

Re: [RESULT] [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc4

2017-02-28 Thread Dan Davydov
I wasn't able to reproduce the described behavior but replied in the JIRA
(let's discuss there) https://issues.apache.org/jira/browse/AIRFLOW-920

On Tue, Feb 28, 2017 at 3:56 AM, Bolke de Bruin  wrote:

> Gotcha:
>
> It works, but then slightly different.
>
> If you added the subdag, do not zoom in, but click on the subdag in the
> main dag. Use mark success there. It will then allow you to mark all tasks
> successful that are part of the subdag.
>
> Do we still consider this a blocker? Imho, no as a workaround seems to
> exist.
>
> - Bolke
>
> > On 27 Feb 2017, at 23:19, Dan Davydov 
> wrote:
> >
> > rc + your patch (and a couple of our own custom ones)
> >
> > On Mon, Feb 27, 2017 at 2:11 PM, Bolke de Bruin 
> wrote:
> >
> >> Dan
> >>
> >> Btw are you running with my patch for this? Or still plain rc?
> >>
> >> Cheers
> >> Bolke
> >>
> >> Sent from my iPhone
> >>
> >>> On 27 Feb 2017, at 22:46, Bolke de Bruin  wrote:
> >>>
> >>> I'll have a look. I verified and the code is there to take care of this.
> >>>
> >>> B.
> >>>
> >>> Sent from my iPhone
> >>>
> >>>> On 27 Feb 2017, at 22:34, Dan Davydov
> >> wrote:
> >>>>
> >>>> Repro steps:
> >>>> - Create a DAG with a dummy task
> >>>> - Let this DAG run for one dagrun
> >>>> - Add a new subdag operator that contains a dummy operator to this DAG
> >> that
> >>>> has depends_on_past set to true
> >>>> - click on the white square for the new subdag operator in the DAGs
> >> first
> >>>> dagrun
> >>>> - Click "Zoom into subdag" (takes you to the graph view for the
> subdag)
> >>>> - Click the dummy task in the graph view and click "Mark Success"
> >>>> - Observe that the list of tasks to mark as success is empty (it
> should
> >>>> contain the dummy task)
> >>>>
> >>>>> On Mon, Feb 27, 2017 at 1:03 PM, Bolke de Bruin 
> >> wrote:
> >>>>>
> >>>>> Dan
> >>>>>
> >>>>> Can you elaborate on 2, cause I thought I specifically took care of
> >> that.
> >>>>>
> >>>>> Cheers
> >>>>> Bolke
> >>>>>
> >>>>> Sent from my iPhone
> >>>>>
> >>>>>> On 27 Feb 2017, at 20:27, Dan Davydov
> >>>>> wrote:
> >>>>>>
> >>>>>> I created https://issues.apache.org/jira/browse/AIRFLOW-921 to
> track
> >> the
> >>>>>> pending issues.
> >>>>>>
> >>>>>> There are two more issues we found which I included there:
> >>>>>> 1. Task instances that have their state manually set to running make
> >> the
> >>>>> UI
> >>>>>> for their DAG unable to parse
> >>>>>> 2. Mark success doesn't work for non existent task instances/dagruns
> >>>>> which
> >>>>>> breaks the subdag use case (setting tasks as successful via the
> graph
> >>>>> view)
> >>>>>>
> >>>>>>> On Mon, Feb 27, 2017 at 11:06 AM, Bolke de Bruin <
> bdbr...@gmail.com>
> >>>>> wrote:
> >>>>>>>
> >>>>>>> Hey Max
> >>>>>>>
> >>>>>>> It is massive for sure. Sorry about that ;-). However it is not as
> >>>>> massive
> >>>>>>> as you might deduce from a first view. 0) run tasks concurrently
> >> across
> >>>>> dag
> >>>>>>> runs 1) ordering of the tasks was added to the loop. 2) calculating
> >> of
> >>>>>>> deadlocks, running tasks, tasks to run was corrected, 3) relying on
> >> the
> >>>>>>> executor for status updates was replaced, 4) (tbd) executor failure
> >>>>> check
> >>>>>>> to protect against endless loops.
> >>>>>>>
> >>>>>>> 0+1 seem bigger than they are due to the amount of lines changed. 2
> >> is a
> >>>>>>> subtle change, that touches a couple of lines to pop/push properly.
> >> 3)
> >>>>> i

Re: Getting to RC5: Update

2017-03-01 Thread Dan Davydov
We are seeing another major issue with backfills where task instances are
being deleted and marked as "removed", I am still investigating. Let's keep
discussion about these in https://issues.apache.org/jira/browse/AIRFLOW-921
and the subtask comments to have it in one place. I will look at the other
points you cc'd me on too. Thanks for continuing to drive this forward!

On Wed, Mar 1, 2017 at 8:22 AM, Bolke de Bruin  wrote:

> Hi,
>
> Just wanted to give an update about the progress getting to RC5. As
> reported we have 6 blockers listed.
>
> 1. Double run job should not terminate the existing running job. -> Patch
> Available
> 2. Parallelize dag runs in backfills -> Patch Available, Tests need to be
> updated, see below
> 3. Setting a task to running manually breaks a DAGs UI -> Patch merged
> 4. Can't mark non-existent tasks as successful from graph view ->
> Workaround available (t.b.c.), Patch Available unit tests to be added
> 5. (Named)HivePartitionSensor broken if hook attr not set -> Patch merged
> 6. Skipped tasks potentially cause a dagrun to be marked as
> failure/success prematurely -> see below
>
> On 2 I would like some more discussion on whether this would be acceptable
> (https://github.com/apache/incubator-airflow/pull/2107). I have written
> the patch for this, however we are not large backfill users. So I need
> feedback specifically on ripping out the “executor” part: @dan, @max.
>
> On 6 Alex has reported this earlier and written a PR for this (
> https://github.com/apache/incubator-airflow/pull/1961). Maxime had some
> thoughts about this, which are currently blocking the integration. However,
> in testing it seems to solve the issue. Can we finalise the discussion
> please @max @dan @alex?
>
> Cheers
> Bolke


Re: Getting to RC5: Update

2017-03-01 Thread Dan Davydov
Agreed, I created a JIRA a couple of minutes ago (it's a subtask in the
JIRA I mentioned).

On Wed, Mar 1, 2017 at 10:58 AM, Bolke de Bruin  wrote:

> Please create a Jira and provide context when this happens. “REMOVED”
> marked means the taskinstance does not have a task equivalent anymore in
> the dag (or so it should :)).
>
> Bolke
>
> > On 1 Mar 2017, at 19:55, Dan Davydov 
> wrote:
> >
> > We are seeing another major issue with backfills where task instances are
> > being deleted and marked as "removed", I am still investigating. Let's
> keep
> > discussion about these in https://issues.apache.org/jira/browse/AIRFLOW-921
> > and the subtask comments to have it in one place. I will look at the other
> > points you cc'd me on too. Thanks for continuing to drive this forward!
> >
> > On Wed, Mar 1, 2017 at 8:22 AM, Bolke de Bruin 
> wrote:
> >
> >> Hi,
> >>
> >> Just wanted to give an update about the progress getting to RC5. As
> >> reported we have 6 blockers listed.
> >>
> >> 1. Double run job should not terminate the existing running job. ->
> Patch
> >> Available
> >> 2. Parallelize dag runs in backfills -> Patch Available, Tests need to
> be
> >> updated, see below
> >> 3. Setting a task to running manually breaks a DAGs UI -> Patch merged
> >> 4. Can't mark non-existent tasks as successful from graph view ->
> >> Workaround available (t.b.c.), Patch Available unit tests to be added
> >> 5. (Named)HivePartitionSensor broken if hook attr not set -> Patch
> merged
> >> 6. Skipped tasks potentially cause a dagrun to be marked as
> >> failure/success prematurely -> see below
> >>
> >> On 2 I would like some more discussion on whether this would be
> acceptable
> >> (https://github.com/apache/incubator-airflow/pull/2107). I have written
> >> the patch for this, however we are not large backfill users. So I need
> >> feedback specifically on ripping out the “executor” part: @dan, @max.
> >>
> >> On 6 Alex has reported this earlier and written a PR for this (
> >> https://github.com/apache/incubator-airflow/pull/1961). Maxime had some
> >> thoughts about this, which are currently blocking the integration.
> However,
> >> in testing it seems to solve the issue. Can we finalise the discussion
> >> please @max @dan @alex?
> >>
> >> Cheers
> >> Bolke
>
>


Re: Airflow running different with different user id ?

2017-03-03 Thread Dan Davydov
Yes, this is supported starting with 1.8.0, which will be released soon. You
can look in the documentation or grep for "run_as".

On Mar 3, 2017 8:50 AM, "Michael Gong"  wrote:

> Hi,
>
>
> Suppose I have 1 airflow instance running 2 different DAGs, is it possible
> to specify the 2 DAGs running under 2 different ids ?
>
>
> Any advises are welcomed.
>
>
> Thanks.
>
> Michael
>
>
>
>
>


Re: Airflow running different with different user id ?

2017-03-03 Thread Dan Davydov
Within a couple of weeks.

On Fri, Mar 3, 2017 at 12:34 PM, Michael Gong  wrote:

> When approximately will it be released?
>
> Sent from my PP•KING™ smartphone
>
> On Mar 3, 2017 1:42 PM, Dan Davydov 
> wrote:
> Yes, this is supported starting with 1.8.0, which will be released soon. You
> can look in the documentation or grep for "run_as".
>
> On Mar 3, 2017 8:50 AM, "Michael Gong"  wrote:
>
> > Hi,
> >
> >
> > Suppose I have 1 airflow instance running 2 different DAGs, is it
> possible
> > to specify the 2 DAGs running under 2 different ids ?
> >
> >
> > Any advises are welcomed.
> >
> >
> > Thanks.
> >
> > Michael
> >
> >
> >
> >
> >
>


Proposal to simplify start/end dates

2017-03-07 Thread Dan Davydov
A very common source of confusion for our users is when they specify
start_date in default_args but not in their DAG arguments and then try to
change this start_date to move the execution of their DAG forward (e.g.
from 2015 to 2016). This doesn't work because the logic that is used to
calculate the "initial" start date of a dag differs from the logic to
calculate subsequent dagrun start dates.

Current Airflow Logic:
DS to schedule initial dagrun: dag.start_date if it exists, else min(start
date of tasks_of_dag)
DS to schedule subsequent dagruns: last_dagrun + scheduled_interval
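
The two rules above can be sketched roughly as follows. This is a simplified
model for illustration, not the scheduler's actual code: the function names
initial_run_date and next_run_date are made up here, and the schedule
interval is assumed to be a plain timedelta.

```python
from datetime import datetime, timedelta


def initial_run_date(dag_start_date, task_start_dates):
    """First dagrun: use the DAG's own start_date when it is set,
    otherwise fall back to the earliest start date among its tasks."""
    if dag_start_date is not None:
        return dag_start_date
    return min(task_start_dates)


def next_run_date(last_dagrun, schedule_interval):
    """Later dagruns: derived only from the previous run, so the start
    date is not consulted again once a first run exists."""
    return last_dagrun + schedule_interval


# start_date supplied only via default_args, i.e. it lives on the tasks.
first = initial_run_date(None, [datetime(2015, 1, 1), datetime(2015, 6, 1)])
second = next_run_date(first, timedelta(days=1))
print(first, second)
```

Because next_run_date never looks at a start date, editing start_date after
the first run has no effect in this model, which is the confusion described
above.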

There are a couple ways of addressing this:
1. Change the definition of start date for subsequent dagruns to match the
"initial" dagrun start date (calculated from the minimum of task start
dates)
2. Force explicit dag start dates

I personally like 1.

I also propose that we throw errors for DAGs that have tasks that depend on
other tasks with start dates that occur after theirs (otherwise there could
be deadlocks).
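
A check like this could look roughly as follows. It is a standalone sketch
under simplifying assumptions: tasks are represented by plain names, start
dates and upstream relationships are passed in as dictionaries, and the
function names find_start_date_conflicts and validate_start_dates are
hypothetical rather than existing Airflow functions.

```python
from datetime import datetime


def find_start_date_conflicts(start_dates, upstream):
    """Return (task, upstream_task) pairs where the upstream task starts
    later than the task that depends on it."""
    conflicts = []
    for task in sorted(upstream):
        for parent in sorted(upstream[task]):
            if start_dates[parent] > start_dates[task]:
                conflicts.append((task, parent))
    return conflicts


def validate_start_dates(start_dates, upstream):
    """Raise an error if any task depends on a later-starting task."""
    conflicts = find_start_date_conflicts(start_dates, upstream)
    if conflicts:
        raise ValueError(
            "tasks depend on tasks with later start dates: %r" % conflicts
        )


# "load" starts in 2015 but depends on "extract", which only starts in 2016.
start_dates = {"extract": datetime(2016, 1, 1), "load": datetime(2015, 1, 1)}
upstream = {"load": ["extract"]}
print(find_start_date_conflicts(start_dates, upstream))
```

In this example a 2015 run of "load" would wait on an "extract" task instance
that is never scheduled for that date, which is the kind of deadlock the
error is meant to prevent.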

What do people think?


Re: Proposal to simplify start/end dates

2017-03-07 Thread Dan Davydov
Sure thing.

Current Behavior:
- User creates DAG with default_args start date to 2015
- dagrun gets kicked off for 2015
- User changes default_args start date to 2016
- dagruns continue running for 2015

New Behavior:
- User creates DAG with default_args start date to 2015
- dagrun gets kicked off for 2015
- User changes default_args start date to 2016
- *dagruns start running for the 2016 start date instead of 2015*
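
The difference can be shown with a small worked example. This is one possible
reading of option 1, not a settled design: the proposed rule is modelled here
as "never schedule before the current start date", and both function names
are invented for the illustration.

```python
from datetime import datetime, timedelta


def next_run_current(last_dagrun, interval, start_date):
    """Current behaviour: the start date is ignored after the first run."""
    return last_dagrun + interval


def next_run_proposed(last_dagrun, interval, start_date):
    """Assumed reading of option 1: take the start date into account for
    every run, so a start date moved forward takes effect."""
    return max(last_dagrun + interval, start_date)


last_dagrun = datetime(2015, 1, 1)    # a dagrun already ran for 2015
interval = timedelta(days=1)
new_start_date = datetime(2016, 1, 1)  # user edits default_args

current = next_run_current(last_dagrun, interval, new_start_date)
proposed = next_run_proposed(last_dagrun, interval, new_start_date)
print("current: ", current)
print("proposed:", proposed)
```

Under the current rule the next run stays in 2015, while under the assumed
proposed rule it moves to the new 2016 start date.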

On Tue, Mar 7, 2017 at 11:49 AM, Bolke de Bruin  wrote:

> Hey Dan,
>
> Im not sure if I am seeing a difference for #1 vs now, except you are
> excluding backfills now from the calculation? Can you provide an example?
>
> Bolke
>
> > On 7 Mar 2017, at 20:38, Dan Davydov 
> wrote:
> >
> > A very common source of confusion for our users is when they specify
> > start_date in default_args but not in their DAG arguments and then try to
> > change this start_date to move the execution of their DAG forward (e.g.
> > from 2015 to 2016). This doesn't work because the logic that is used to
> > calculate the "initial" start date of a dag differs from the logic to
> > calculate subsequent dagrun start dates.
> >
> > Current Airflow Logic:
> > DS to schedule initial dagrun: dag.start_date if it exists, else
> min(start
> > date of tasks_of_dag)
> > DS to schedule subsequent dagruns: last_dagrun + scheduled_interval
> >
> > There are a couple ways of addressing this:
> > 1. Change the definition of start date for subsequent dagruns to match
> the
> > "initial" dagrun start date (calculated from the minimum of task start
> > dates)
> > 2. Force explicit dag start dates
> >
> > I personally like 1.
> >
> > I also propose that we throw errors for DAGs that have tasks that depend
> on
> > other tasks with start dates that occur after theirs (otherwise there
> could
> > be deadlocks).
> >
> > What do people think?
>
>


Re: Proposal to simplify start/end dates

2017-03-07 Thread Dan Davydov
Those semantics should not change with my specific proposal, but I think
logically moving a DAG's start date back should backfill those old dagruns
(unlike the current behavior, which continues running from the current date).
Interval changes are a bit of a hairy topic. I think those kinds of changes,
along with mutations of DAG topology (new tasks, task dependency changes,
etc.), need to be thought out a little bit more and have a proposal drafted
(e.g. I think that Airflow should support dags with tasks with different
scheduling intervals). For the purposes of this proposal (and internally at
Airbnb) we do not support interval changes and recommend a new DAG be
created for these cases.

On Tue, Mar 7, 2017 at 1:52 PM, Bolke de Bruin  wrote:

> Ok sounds good. What do you do with a dag that gets predated and with an
> existing dag run? What happens if the interval changes, i.e. non cron
> syntax?
>
> (Just thinking out loud)
>
> B.
>
> Sent from my iPhone
>
> > On 7 Mar 2017, at 22:27, Dan Davydov 
> wrote:
> >
> > Sure thing.
> >
> > Current Behavior:
> > - User creates DAG with default_args start date to 2015
> > - dagrun gets kicked off for 2015
> > - User changes default_args start date to 2016
> > - dagruns continue running for 2015
> >
> > New Behavior:
> > - User creates DAG with default_args start date to 2015
> > - dagrun gets kicked off for 2015
> > - User changes default_args start date to 2016
> > - *dagruns start running for the 2016 start date instead of 2015*
> >
> >> On Tue, Mar 7, 2017 at 11:49 AM, Bolke de Bruin 
> wrote:
> >>
> >> Hey Dan,
> >>
> >> Im not sure if I am seeing a difference for #1 vs now, except you are
> >> excluding backfills now from the calculation? Can you provide an
> example?
> >>
> >> Bolke
> >>
> >>> On 7 Mar 2017, at 20:38, Dan Davydov 
> >> wrote:
> >>>
> >>> A very common source of confusion for our users is when they specify
> >>> start_date in default_args but not in their DAG arguments and then try
> to
> >>> change this start_date to move the execution of their DAG forward (e.g.
> >>> from 2015 to 2016). This doesn't work because the logic that is used to
> >>> calculate the "initial" start date of a dag differs from the logic to
> >>> calculate subsequent dagrun start dates.
> >>>
> >>> Current Airflow Logic:
> >>> DS to schedule initial dagrun: dag.start_date if it exists, else
> >> min(start
> >>> date of tasks_of_dag)
> >>> DS to schedule subsequent dagruns: last_dagrun + scheduled_interval
> >>>
> >>> There are a couple ways of addressing this:
> >>> 1. Change the definition of start date for subsequent dagruns to match
> >> the
> >>> "initial" dagrun start date (calculated from the minimum of task start
> >>> dates)
> >>> 2. Force explicit dag start dates
> >>>
> >>> I personally like 1.
> >>>
> >>> I also propose that we throw errors for DAGs that have tasks that
> depend
> >> on
> >>> other tasks with start dates that occur after theirs (otherwise there
> >> could
> >>> be deadlocks).
> >>>
> >>> What do people think?
> >>
> >>
>


Re: High load in CPU of MySQL when running airflow

2017-03-07 Thread Dan Davydov
FWIW we use the following DAG at Airbnb to reap the task instances table
(this is a stopgap):

# DAG to delete old TIs so that UI operations on the webserver are fast.
# This DAG is a stopgap; ideally we would make the UI not query all task
# instances and add indexes to the task_instance table where appropriate to
# speed up the remaining webserver table queries.
# Note that there is a slight risk that some of these deleted task instances
# may break the depends_on_past dependency for the following tasks, but this
# should rarely happen and is easy to diagnose and fix.

from datetime import datetime

from airflow import DAG
from airflow.operators import MySqlOperator

args = {
    'owner': 'xxx',
    'email': ['xxx'],
    'start_date': datetime(2017, 1, 30),
    'mysql_conn_id': 'airflow_db',
}

dag = DAG(
    'airflow_old_task_instance_pruning',
    default_args=args,
)

# TODO: TIs that are successful without a start date will never be reaped
# because they have been mark-success'd in the UI. One fix for this would be
# to make airflow set start_date when mark-success-ing.
sql = """\
DELETE ti FROM task_instance ti
LEFT OUTER JOIN dag_run dr
ON ti.execution_date = dr.execution_date AND
   ti.dag_id = dr.dag_id
WHERE ((ti.start_date <= DATE_SUB(NOW(), INTERVAL 30 DAY) AND
        ti.state != "running") OR
       (ISNULL(ti.start_date) AND
        ti.state = "failed")) AND
      (ISNULL(dr.id) OR dr.state != "running")
"""
MySqlOperator(
    task_id='delete_old_tis',
    sql=sql,
    dag=dag,
)



On Tue, Mar 7, 2017 at 5:39 PM, Jason Chen 
wrote:

> Hi Bolke,
>
>  Thanks, but it looks you are actually talking about Harish's use case.
>
>  My use case is about 50 Dags (each one with about 2-3 tasks). I feel our
> run interval setting for the dags are too low (~15 mins). It may result in
> high CPU of MySQL.
>
>  Meanwhile, I dig to MySQL and I noticed a frequently running SQL statement
> as below. It's without proper index on column task_instance.state.
>
> Shouldn't it index "state", given that there could be million of rows in
> task_instance?
>
> SQL Statement:
> "SELECT task_instance.task_id AS task_instance_task_id,
> task_instance.dag_id AS task_instance_dag_id, FROM task_instance WHERE
> task_instance.state = 'queued'"
>
> Also, is there a possibility to clean some "unneeded" entries in the tables
> (say, task_instance) ?  I mean, for example, removing task states older
> than 6 months?
>
> Feedback are welcome.
>
> Thanks.
>
> -Jason
>
>
>
> On Tue, Mar 7, 2017 at 11:45 AM, Bolke de Bruin  wrote:
>
> > Hi Jason
> >
> > I think you need to back it up with more numbers. You assume that a load
> > of 100% is bad and also that 16GB of mem is a lot.
> >
> > 30x25 = 750 tasks per hour = 12,5 tasks per minute. For every task we
> > launch a couple of processes (at least 2) that do not share memory, this
> is
> > to ensure tasks cannot hurt each other. Curl tasks are probably launched
> by
> > using a BashOperator, which means another process. Curl is itself another
> > process. So 4 processes per task, that cannot share memory. Curl can
> cache
> > memory itself as well. You probably have peak times and longer running
> > tasks so it is not evenly spread, then it starts adding up quickly?
> >
> > Bolke.
> >
> >
> > > On 7 Mar 2017, at 19:41, Jason Chen  wrote:
> > >
> > > Hi Harish,
> > > Thanks for the fast response and feedback.
> > > Yeah, I want to see the fix or more discussion !
> > >
> > > BTW, I assume that, given your 30 dags, airflow runs fine after your
> > > increase of heartbeat ?
> > > The default is 5 secs.
> > >
> > >
> > > Thanks.
> > > Jason
> > >
> > >
> > > On Tue, Mar 7, 2017 at 10:24 AM, harish singh <
> harish.sing...@gmail.com>
> > > wrote:
> > >
> > >> I had seen a similar behavior, a year ago, when we were are < 5 Dags.
> > Even
> > >> then the cpu utilization was reaching 100%.
> > > >> One way to deal with this is - You could play with "heartbeat" numbers
> > (i.e
> > >> increase heartbeat).
> > >> But then you are introducing more delay to start jobs that are ready
> to
> > run
> > >> (ready to be queued -> queued -> run)
> > >>
> > >> Right now, we have more than 30 dags (each with ~ 20-25 tasks) that
> runs
> > >> every hour.
> > >> We are giving airflow about 5-6 cores (which still seems less for
> > airflow).
> > >> Also, for so many tasks every hour,  our mem consumption is over 16G.
> > >> All our tasks are basically doing "curl". So 16G seems too high.
> > >>
> > >> Having said that, I remember reading somewhere that there was a fix
> > coming
> > >> for this.
> > >> If not, I would definitely want to see more discussion on this.
> > >>
> > >> Thanks for opening this. I would love to hear on how people are
> working
> > >> around this.
> > >>
> > >>
> > >>
> > >>
> > >>
> > > >> On Tue, Mar 7, 2017 at 9:42 AM, Jason Chen
> > >> wrote:
> > >>
> > >>> Hi  team,
> > >>>
> > >>> We are using a

Re: High load in CPU of MySQL when running airflow

2017-03-07 Thread Dan Davydov
We will need to come up with a plan soon (better DB indexes and/or the
ability to rotate out old task instances according to some policy). Nothing
concrete as of yet though.
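
As a rough illustration of the index half of that (nothing here has been
decided), the sketch below shows how an index on the state column changes
the plan for a query like the one Jason quoted. It uses an in-memory SQLite
database purely so the example is self-contained; the cut-down table schema
and the index name ti_state are assumptions, and MySQL's planner may choose
differently, especially for a low-cardinality column like state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for the task_instance table (assumed columns only).
conn.execute(
    "CREATE TABLE task_instance ("
    " task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT)"
)

query = "SELECT task_id, dag_id FROM task_instance WHERE state = 'queued'"


def plan(sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(query)   # no index yet: the table is scanned
conn.execute("CREATE INDEX ti_state ON task_instance (state)")
plan_after = plan(query)    # the lookup now goes through ti_state

print(plan_before)
print(plan_after)
```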

On Tue, Mar 7, 2017 at 6:18 PM, Jason Chen 
wrote:

> Hi Dan,
>
>  Thanks so much. This is exactly what I am looking for.
>
> Is there a plan on the future airflow road map to clean this up from
> Airflow system level? Say, in airflow.cfg, a setting to clean up data older
> than specified time.
>
> Your solution is to run an airflow job to clean up the data. That's great.
> In a short term for us, I will be just running the SQL command directly
> from MySQL CLI and then setup an airflow job to do that periodically.
>
> Thanks.
> -Jason
>
> On Tue, Mar 7, 2017 at 5:47 PM, Dan Davydov
> wrote:
>
> > FWIW we use the following DAG at Airbnb to reap the task instances table
> > (this is a stopgap):
> >
> > # DAG to delete old TIs so that UI operations on the webserver are fast.
> > This DAG is a
> > # stopgap, ideally we would make the UI not query all task instances and
> > add indexes to
> > # the task_instance table where appropriate to speed up the remaining
> > webserver table
> > # queries.
> > # Note that there is a slight risk that some of these deleted task
> > instances may break
> > # the depends_on_past dependency for the following tasks but this should
> > rarely happy
> > # and is easy to diagnose and fix.
> >
> > from datetime import datetime
> >
> > from airflow import DAG
> > from airflow.operators import MySqlOperator
> >
> > args = {
> > 'owner': 'xxx',
> > 'email': ['xxx'],
> > 'start_date': datetime(2017, 1, 30),
> > 'mysql_conn_id': 'airflow_db',
> > }
> >
> > dag = DAG(
> > 'airflow_old_task_instance_pruning',
> > default_args=args,
> > )
> >
> > # TODO: TIs that have are successful without a start date will never be
> > # reaped because they have been mark-success'd in the UI. One fix for
> this
> > would be to
> > # make airflow set start_date when mark-success-ing.
> > sql = """\
> > DELETE ti FROM task_instance ti
> > LEFT OUTER JOIN dag_run dr
> > ON ti.execution_date = dr.execution_date AND
> >ti.dag_id = dr.dag_id
> > WHERE ((ti.start_date <= DATE_SUB(NOW(), INTERVAL 30 DAY) AND
> > ti.state != "running") OR
> >(ISNULL(ti.start_date) AND
> > ti.state = "failed")) AND
> >   (ISNULL(dr.id) OR dr.state != "running")
> > """
> > MySqlOperator(
> > task_id='delete_old_tis',
> > sql=sql,
> > dag=dag,
> > )
> >
> >
> >
> > On Tue, Mar 7, 2017 at 5:39 PM, Jason Chen 
> > wrote:
> >
> > > Hi Bolke,
> > >
> > >  Thanks, but it looks you are actually talking about Harish's use case.
> > >
> > >  My use case is about 50 Dags (each one with about 2-3 tasks). I feel
> our
> > > run interval setting for the dags are too low (~15 mins). It may result
> > in
> > > high CPU of MySQL.
> > >
> > >  Meanwhile, I dig to MySQL and I noticed a frequently running SQL
> > statement
> > > as below. It's without proper index on column task_instance.state.
> > >
> > > Shouldn't it index "state", given that there could be million of rows
> in
> > > task_instance?
> > >
> > > SQL Statement:
> > > "SELECT task_instance.task_id AS task_instance_task_id,
> > > task_instance.dag_id AS task_instance_dag_id, FROM task_instance
> > WHERE
> > > task_instance.state = 'queued'"
> > >
> > > Also, is there a possibility to clean some "unneeded" entries in the
> > tables
> > > (say, task_instance) ?  I mean, for example, removing task states older
> > > than 6 months?
> > >
> > > Feedback are welcome.
> > >
> > > Thanks.
> > >
> > > -Jason
> > >
> > >
> > >
> > > On Tue, Mar 7, 2017 at 11:45 AM, Bolke de Bruin 
> > wrote:
> > >
> > > > Hi Jason
> > > >
> > > > I think you need to back it up with more numbers. You assume that a
> > load
> > > > of 100% is bad and also that 16GB 

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-13 Thread Dan Davydov
I'll test this on staging as soon as I get a chance (the testing is
non-blocking on the rc5). Bolke very much in particular :).

On Mon, Mar 13, 2017 at 10:46 AM, Jeremiah Lowin  wrote:

> +1 (binding) extremely impressed by the work and diligence all contributors
> have put in to getting these blockers fixed, Bolke in particular.
>
> On Mon, Mar 13, 2017 at 1:07 AM Arthur Wiedmer  wrote:
>
> > +1 (binding)
> >
> > Thanks again for steering us through Bolke.
> >
> > Best,
> > Arthur
> >
> > On Sun, Mar 12, 2017 at 9:59 PM, Bolke de Bruin 
> wrote:
> >
> > > Dear All,
> > >
> > > > Finally, I have been able to make the FIFTH RELEASE CANDIDATE of Airflow
> > > > 1.8.0 available at:
> > > > https://dist.apache.org/repos/dist/dev/incubator/airflow/ , public keys
> > > > are available at
> > > > https://dist.apache.org/repos/dist/release/incubator/airflow/ . It is
> > > > tagged with a local version “apache.incubating” so it allows upgrading
> > > > from earlier releases.
> > > Issues fixed since rc4:
> > >
> > > [AIRFLOW-900] Double trigger should not kill original task instance
> > > [AIRFLOW-900] Fixes bugs in LocalTaskJob for double run protection
> > > [AIRFLOW-932] Do not mark tasks removed when backfilling
> > > [AIRFLOW-961] run onkill when SIGTERMed
> > > [AIRFLOW-910] Use parallel task execution for backfills
> > > [AIRFLOW-967] Wrap strings in native for py2 ldap compatibility
> > > [AIRFLOW-941] Use defined parameters for psycopg2
> > > [AIRFLOW-719] Prevent DAGs from ending prematurely
> > > [AIRFLOW-938] Use test for True in task_stats queries
> > > [AIRFLOW-937] Improve performance of task_stats
> > > [AIRFLOW-933] use ast.literal_eval rather eval because ast.literal_eval
> > > does not execute input.
> > > [AIRFLOW-919] Running tasks with no start date shouldn't break a DAGs
> UI
> > > [AIRFLOW-897] Prevent dagruns from failing with unfinished tasks
> > > [AIRFLOW-861] make pickle_info endpoint be login_required
> > > [AIRFLOW-853] use utf8 encoding for stdout line decode
> > > [AIRFLOW-856] Make sure execution date is set for local client
> > > [AIRFLOW-830][AIRFLOW-829][AIRFLOW-88] Reduce Travis log verbosity
> > > [AIRFLOW-794] Access DAGS_FOLDER and SQL_ALCHEMY_CONN exclusively from
> > > settings
> > > [AIRFLOW-694] Fix config behaviour for empty envvar
> > > [AIRFLOW-365] Set dag.fileloc explicitly and use for Code view
> > > [AIRFLOW-931] Do not set QUEUED in TaskInstances
> > > [AIRFLOW-899] Tasks in SCHEDULED state should be white in the UI
> instead
> > > of black
> > > [AIRFLOW-895] Address Apache release incompliancies
> > > [AIRFLOW-893][AIRFLOW-510] Fix crashing webservers when a dagrun has no
> > > start date
> > > [AIRFLOW-793] Enable compressed loading in S3ToHiveTransfer
> > > [AIRFLOW-863] Example DAGs should have recent start dates
> > > [AIRFLOW-869] Refactor mark success functionality
> > > [AIRFLOW-856] Make sure execution date is set for local client
> > > [AIRFLOW-814] Fix Presto*CheckOperator.__init__
> > > [AIRFLOW-844] Fix cgroups directory creation
> > >
> > > No known issues anymore.
> > >
> > > I would also like to raise a VOTE for releasing 1.8.0 based on release
> > > candidate 5, i.e. just renaming release candidate 5 to 1.8.0 release.
> > >
> > > Please respond to this email by:
> > >
> > > +1,0,-1 with *binding* if you are a PMC member or *non-binding* if you
> > are
> > > not.
> > >
> > > Thanks!
> > > Bolke
> > >
> > > My VOTE: +1 (binding)
> >
>


Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-15 Thread Dan Davydov
 > >>>>>>> Ruslan Dautkhanov
> > >>>>>>>
> > >>>>>>>> On Tue, Mar 14, 2017 at 6:19 PM, siddharth anand <
> > san...@apache.org>
> > >>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>> FYI,
> > >>>>>>>> I've just hit a major bug in the release candidate related to
> > "clear
> > >>>>>> task"
> > >>>>>>>> behavior.
> > >>>>>>>>
> > >>>>>>>> I've been running airflow in both stage and prod since yesterday
> > on
> > >>>>>> rc5 and
> > >>>>>>>> have reproduced this in both environments. I will file a JIRA
> for
> > >>>> this
> > >>>>>>>> tonight, but wanted to send a note over email as well.
> > >>>>>>>>
> > >>>>>>>> In my example, I have a 2 task DAG. For a given DAG run that has
> > >>>>>> completed
> > >>>>>>>> successfully, if I
> > >>>>>>>> 1) clear task2 (leaf task in this case), the
> previously-successful
> > >>>> DAG
> > >>>>>> Run
> > >>>>>>>> goes back to Running, requeues, and executes the task
> > successfully.
> > >>>>>> The DAG
> > >>>>>>>> Run then returns from Running to Success.
> > >>>>>>>> 2) clear task1 (root task in this case), the
> previously-successful
> > >>>> DAG
> > >>>>>> Run
> > >>>>>>>> goes back to Running, DOES NOT requeue or execute the task at
> all.
> > >>>> The
> > >>>>>> DAG
> > >>>>>>>> Run then returns from Running to Success though it never ran the
> > task.
> > >>>>>>>>
> > >>>>>>>> 1) is expected and previous behavior. 2) is a regression.
> > >>>>>>>>
> > >>>>>>>> The only workaround is to use the CLI to run the task cleared.
> > Here
> > >>>> are
> > >>>>>>>> some images :
> > >>>>>>>> *After Clearing the Tasks*
> > >>>>>>>> https://www.dropbox.com/s/wmuxt0krwx6wurr/Screenshot%
> > >>>>>>>> 202017-03-14%2014.09.34.png?dl=0
> > >>>>>>>>
> > >>>>>>>> *After DAG Runs return to Success*
> > >>>>>>>> https://www.dropbox.com/s/qop933rzgdzchpd/Screenshot%
> > >>>>>>>> 202017-03-14%2014.09.49.png?dl=0
> > >>>>>>>>
> > >>>>>>>> This is a major regression because it will force everyone to use
> > the
> > >>>>>> CLI
> > >>>>>>>> for things that they would normally use the UI for.
> > >>>>>>>>
> > >>>>>>>> -s
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> -s
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> On Tue, Mar 14, 2017 at 1:32 PM, Daniel Huang <
> dxhu...@gmail.com
> > >
> > >>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>> +1 (non-binding)!
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Mar 14, 2017 at 11:35 AM, siddharth anand <
> > >>>> san...@apache.org>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> +1 (binding)
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Tue, Mar 14, 2017 at 8:42 AM, Maxime Beauchemin <
> > >>>>>>>>>> maximebeauche...@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> +1 (binding)
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Tue, Mar 14, 2017 at 3:59 AM, Alex Van Boxel <
> 

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-15 Thread Dan Davydov
Upon further investigation this was caused by a change in the semantics of
ALL_SUCCESS, which I have the following feelings about:
Intuitively you would expect, by default, to skip any task that has
dependencies that weren't run, i.e. the trigger rule is called ALL_SUCCESS
and skipped tasks are not successful ones, and that was also the old behavior
in 1.7.3.

This is going to break some use cases, which could be alright, but I feel
these new semantics make less sense than before, so there isn't a good reason
to break existing use cases.

I will get started on a PR for a new ALL_SUCCESS_NOT_SKIPPED trigger rule,
but again I feel this is hacky and really we should have the old
ALL_SUCCESS (default) and a new ALL_SUCCESS_OR_SKIPPED trigger rule if
desired.
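
To spell out the two readings of ALL_SUCCESS, here is a simplified sketch.
It only models upstream tasks that ended up successful or skipped, the
function names are hypothetical, and it is not the actual trigger rule code.
The rc5 variant encodes the behavior reported in AIRFLOW-992 (a mix of
skipped and successful upstream tasks runs the task); treating the
all-skipped case as "skipped" is an assumption.

```python
SUCCESS, SKIPPED = "success", "skipped"


def all_success_old(upstream_states):
    """Old reading: a skipped upstream is not a success, so the task is skipped."""
    if any(state == SKIPPED for state in upstream_states):
        return "skipped"
    return "run"


def all_success_rc5(upstream_states):
    """Reported rc5 reading: skip only when every upstream task was skipped (assumed)."""
    if upstream_states and all(state == SKIPPED for state in upstream_states):
        return "skipped"
    return "run"


mixed = [SUCCESS, SKIPPED]
print(all_success_old(mixed))   # skipped
print(all_success_rc5(mixed))   # run
```

Under the old reading a single skipped branch propagates the skip
downstream, which is what DAGs relying on branching tend to expect.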

On Wed, Mar 15, 2017 at 6:25 PM, Dan Davydov  wrote:

> Another issue we are seeing is
> https://issues.apache.org/jira/browse/AIRFLOW-992 - tasks that have both
> skipped children and successful children are run instead of skipped. Not
> blocking the release on this, just letting you guys know for the release bug
> notes. We will be cherrypicking a fix for this onto our production when we
> release 1.8 once we come up with one.
>
> It's possibly, though not necessarily, related to an incomplete/incorrect
> fix of https://issues.apache.org/jira/browse/AIRFLOW-719 .
>
> On Wed, Mar 15, 2017 at 4:53 PM, siddharth anand 
> wrote:
>
>> Confirmed that Bolke's PR above fixes the issue.
>>
>> Also, I agree this is not a blocker for the current airflow release, so my
>> +1 (binding) stands.
>> -s
>>
>> On Wed, Mar 15, 2017 at 3:11 PM, Bolke de Bruin 
>> wrote:
>>
>> > PR is available: https://github.com/apache/incubator-airflow/pull/2154
>> >
>> > But marked for 1.8.1.
>> >
>> > - Bolke
>> >
>> > > On 15 Mar 2017, at 14:37, Bolke de Bruin  wrote:
>> > >
>> > > On second thought I do consider it a bug and can have a fix out pretty
>> > quickly, but I don’t consider it a blocker.
>> > >
>> > > - B.
>> > >
>> > >> On 15 Mar 2017, at 14:21, Bolke de Bruin  wrote:
>> > >>
>> > >> Just to be clear: Also in 1.7.1 the DagRun was marked successful, but
>> > its tasks continued to be scheduled. So one could also consider 1.7.1
>> > behaviour a bug. I am not sure here, but I think it kind of makes sense
>> to
>> > consider the behaviour of 1.7.1 a bug. It has been present throughout
>> all
>> > the 1.8 rc/beta/alpha series.
>> > >>
>> > >> So yes it is a change in behaviour whether it is a regression or an
>> > integrity improvement is up for discussion. Either way I don’t consider
>> it
>> > a blocker.
>> > >>
>> > >> Bolke.
>> > >>
>> > >>> On 15 Mar 2017, at 14:06, siddharth anand 
>> wrote:
>> > >>>
>> > >>> Here's the JIRA :
>> > >>> https://issues.apache.org/jira/browse/AIRFLOW-989
>> > >>>
>> > >>> I confirmed it is a regression from 1.7.1.3, which I installed via
>> pip
>> > and
>> > >>> tested against the same DAG in the JIRA.
>> > >>>
>> > >>> The issue occurs if a leaf / last / terminal downstream task is not
>> > >>> cleared. You won't see this issue if you clear the entire DAG Run or
>> > clear
>> > >>> a task and all of its downstream tasks. If you truly want to only
>> > clear and
>> > >>> rerun a task, but not its downstream tasks, you can use the CLI to
>> > execute
>> > >>> a specific task (e.g. via airflow run).
>> > >>>
>> > >>> This is a change in behavior -- if we do go ahead with the release,
>> > then
>> > >>> this JIRA should be in a list of JIRAs of known issues related to
>> the
>> > new
>> > >>> version.
>> > >>> -s
>> > >>>
>> > >>> On Wed, Mar 15, 2017 at 9:17 AM, Chris Riccomini <
>> > criccom...@apache.org>
>> > >>> wrote:
>> > >>>
>> > >>>> @Sid, does this happen if you clear downstream as well?
>> > >>>>
>> > >>>> On Wed, Mar 15, 2017 at 9:04 AM, Chris Riccomini <
>> > criccom...@apache.org>
>> > >>>> wrote:
>> > >>>>
>> > >>>>> Has anyone been able to reproduce Sid's issue?
>> > >

Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-15 Thread Dan Davydov
The only thing is that this is a change in semantics and changing semantics
(breaking some DAGs) and then changing them back (and breaking things
again) isn't great.

On Wed, Mar 15, 2017 at 7:02 PM, Bolke de Bruin  wrote:

> Indeed that could be the case. Let's get 1.8.0 out the door so we can
> focus on these bug fixes for 1.8.1.
>
> Bolke
>
> Sent from my iPhone
>
> > On 15 Mar 2017, at 18:25, Dan Davydov 
> wrote:
> >
> > Another issue we are seeing is
> > https://issues.apache.org/jira/browse/AIRFLOW-992 - tasks that have both
> > skipped children and successful children are run instead of skipped. Not
> > blocking the release on this just letting you guys know for the release
> bug
> > notes. We will be cherrypicking a fix for this onto our production when
> we
> > release 1.8 once we come up with one.
> >
> > It's possibly, though not necessarily, related to an incomplete/incorrect
> > fix of https://issues.apache.org/jira/browse/AIRFLOW-719 .
> >
> >> On Wed, Mar 15, 2017 at 4:53 PM, siddharth anand 
> wrote:
> >>
> >> Confirmed that Bolke's PR above fixes the issue.
> >>
> >> Also, I agree this is not a blocker for the current airflow release, so
> my
> >> +1 (binding) stands.
> >> -s
> >>
> >>> On Wed, Mar 15, 2017 at 3:11 PM, Bolke de Bruin 
> wrote:
> >>>
> >>> PR is available: https://github.com/apache/incubator-airflow/pull/2154
> >>>
> >>> But marked for 1.8.1.
> >>>
> >>> - Bolke
> >>>
> >>>> On 15 Mar 2017, at 14:37, Bolke de Bruin  wrote:
> >>>>
> >>>> On second thought I do consider it a bug and can have a fix out pretty
> >>> quickly, but I don’t consider it a blocker.
> >>>>
> >>>> - B.
> >>>>
> >>>>> On 15 Mar 2017, at 14:21, Bolke de Bruin  wrote:
> >>>>>
> >>>>> Just to be clear: Also in 1.7.1 the DagRun was marked successful, but
> >>> its tasks continued to be scheduled. So one could also consider 1.7.1
> >>> behaviour a bug. I am not sure here, but I think it kind of makes sense
> >> to
> >>> consider the behaviour of 1.7.1 a bug. It has been present throughout
> all
> >>> the 1.8 rc/beta/alpha series.
> >>>>>
> >>>>> So yes it is a change in behaviour whether it is a regression or an
> >>> integrity improvement is up for discussion. Either way I don’t consider
> >> it
> >>> a blocker.
> >>>>>
> >>>>> Bolke.
> >>>>>
> >>>>>> On 15 Mar 2017, at 14:06, siddharth anand 
> wrote:
> >>>>>>
> >>>>>> Here's the JIRA :
> >>>>>> https://issues.apache.org/jira/browse/AIRFLOW-989
> >>>>>>
> >>>>>> I confirmed it is a regression from 1.7.1.3, which I installed via
> >> pip
> >>> and
> >>>>>> tested against the same DAG in the JIRA.
> >>>>>>
> >>>>>> The issue occurs if a leaf / last / terminal downstream task is not
> >>>>>> cleared. You won't see this issue if you clear the entire DAG Run or
> >>> clear
> >>>>>> a task and all of its downstream tasks. If you truly want to only
> >>> clear and
> >>>>>> rerun a task, but not its downstream tasks, you can use the CLI to
> >>> execute
> >>>>>> a specific task (e.g. via airflow run).
> >>>>>>
> >>>>>> This is a change in behavior -- if we do go ahead with the release,
> >>> then
> >>>>>> this JIRA should be in a list of JIRAs of known issues related to
> the
> >>> new
> >>>>>> version.
> >>>>>> -s
> >>>>>>
> >>>>>> On Wed, Mar 15, 2017 at 9:17 AM, Chris Riccomini <
> >>> criccom...@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> @Sid, does this happen if you clear downstream as well?
> >>>>>>>
> >>>>>>> On Wed, Mar 15, 2017 at 9:04 AM, Chris Riccomini <
> >>> criccom...@apache.org>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Has anyone been able to reproduce Sid's issue?
> >>>>

Re: Airflow Committers: Landscape checks doing more harm than good?

2017-03-16 Thread Dan Davydov
+1 as well, though I have found it useful on larger PRs to help me catch
some issues, so it probably makes sense to add the Travis linting at
the same time we remove it in Landscape. Not sure how much usability we
lose by losing the Landscape UI but I like that all of the errors would be
in one place.

On Thu, Mar 16, 2017 at 4:51 PM, Bolke de Bruin  wrote:

> We can do it in Travis’ afaik. We should replace it.
>
> So +1.
>
> B.
>
> > On 16 Mar 2017, at 16:48, Jeremiah Lowin  wrote:
> >
> > This may be an unpopular opinion, but most Airflow PRs have a little red
> > "x" next to them not because they have failing unit tests, but because
> the
> > Landscape check has decided they introduce bad code.
> >
> > Unfortunately Landscape is often wrong -- here it is telling me my latest
> > PR introduced no less than 30 errors... in files I didn't touch!
> > https://github.com/apache/incubator-airflow/pull/2157 (however, it
> gives me
> > credit for fixing 23 errors in those same files, so I've got that going
> for
> > me... which is nice.)
> >
> > The upshot is that Github's "health" indicator can be swayed by minor or
> > erroneous issues, and therefore it serves little purpose other than
> making
> > it look like every PR is bad. This creates committer fatigue, since every
> > PR needs to be parsed to see if it actually is OK or not.
> >
> > Don't get me wrong, I'm all for proper style and on occasion Landscape
> has
> > pointed out problems that I've gone and fixed. But on the whole, I
> believe
> > that having it as part of our red / green PR evaluation -- equal to and
> > often superseding unit tests -- is harmful. I'd much rather be able to
> scan
> > the PR list and know unequivocally that "green" indicates ready to merge.
> >
> > J
>
>


Re: Make Scheduler More Centralized

2017-03-17 Thread Dan Davydov
I'm not convinced that this would add *that* much more load, we could
probably change this functionality now if we wanted to. Just my two cents.

On Thu, Mar 16, 2017 at 4:06 PM, Rui Wang  wrote:

> Thanks all your comments!
>
> Then looks like we should focus on scalability of scheduler now rather
> than adding more load on it. I will give up this centralized idea now.
>
> On Tue, Mar 14, 2017 at 3:08 PM, Rui Wang  wrote:
>
>> Hi,
>> The design doc below I created is trying to make airflow scheduler more
>> centralized. Briefly speaking, I propose moving state change of
>> TaskInstance to scheduler. You can see the reasons for this change below.
>>
>>
>> Could you take a look and comment if you see anything does not make sense?
>>
>> -Rui
>>
>> 
>> --
>> Current The state of TaskInstance is changed by both scheduler and
>> worker. On worker side, worker monitors TaskInstance and changes the state
>> to RUNNING, SUCCESS, if task succeed, or to UP_FOR_RETRY, FAILED if task
>> fail. Worker also does failure email logic and failure callback logic.
>> Proposal The general idea is to make a centralized scheduler and make
>> workers dumb. Worker should not change state of TaskInstance, but just
>> executes what it is assigned and reports the result of the task. Instead,
>> the scheduler should make the decision on TaskInstance state change.
>> Ideally, workers should not even handle the failure emails and callbacks
>> unless the scheduler asks it to do so.
>> Why Worker does not have as much information as scheduler has. There
>> were bugs observed caused by worker when worker gets into trouble but
>> cannot make decision to change task state due to lack of information.
>> Although there is airflow metadata DB, it is still not easy to share all
>> information that scheduler has with workers.
>>
>> We can also ensure a consistent environment. There are slight differences
>> in the chef recipes for the different workers which can cause strange
>> issues when DAGs parse on one but not the other.
>>
>> In the meantime, moving state changes to the scheduler can reduce the
>> complexity of airflow. It especially helps when airflow needs to move to
>> distributed schedulers. In that case state change everywhere by both
>> schedulers and workers are harder to maintain.
>> How to change After lots of discussions, following step will be done:
>>
>> 1. Add a new column to TaskInstance table. Worker will fill this column
>> with the task process exit code.
>>
>> 2. Worker will only set TaskInstance state to RUNNING when it is ready to
>> run task. There was debate on moving RUNNING to scheduler as well. If
>> moving RUNNING to scheduler, either scheduler marks TaskInstance RUNNING
>> before it gets into queue, or scheduler checks the status code in column
>> above, which is updated by worker when worker is ready to run task. In
>> Former case, from user's perspective, it is bad to mark TaskInstance as
>> RUNNING when worker is not ready to run. User could be confused. In the
>> latter case, scheduler could mark task as RUNNING late due to schedule
>> interval. It is still not a good user experience. Since only worker knows
>> when is ready to run task, worker should still deliver this message to user
>> by setting RUNNING state.
>>
>> 3. In any other cases, worker should not change state of TaskInstance,
>> but save defined status code into column above.
>>
>> 4. Worker still handles failure emails and callbacks because there were
>> concern that scheduler could use too much resource to run failure callbacks
>> given unpredictable callback sizes. ( I think ideally scheduler should
>> treat failure callbacks and emails as tasks, and assign such tasks to
>> workers after TaskInstance state changes correspondingly). Eventually this
>> logic will be moved to the workers once there is support for multiple
>> distributed schedulers.
>>
>> 5. In scheduler's loop, scheduler should check TaskInstance status code,
>> then change state and retry/fail TaskInstance correspondingly.
>>
>
>
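
The division of labour in the five steps above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the
TaskInstance class, the exit_code field, the state names other than RUNNING
and the retry rule are all hypothetical stand-ins, not Airflow's actual model.

```python
from dataclasses import dataclass
from typing import Callable, Optional

EXIT_SUCCESS = 0  # assumed convention: 0 means the task process succeeded


@dataclass
class TaskInstance:
    # Hypothetical stand-in for a row in the TaskInstance table.
    state: str = "QUEUED"
    exit_code: Optional[int] = None  # the proposed new column (step 1)
    try_number: int = 0
    retries: int = 1


def worker_run(ti: TaskInstance, run_task: Callable[[], int]) -> None:
    """Worker side: set RUNNING (step 2), then only record the exit code (step 3)."""
    ti.state = "RUNNING"
    ti.try_number += 1
    ti.exit_code = run_task()


def scheduler_reconcile(ti: TaskInstance) -> None:
    """Scheduler loop (step 5): translate a recorded exit code into a state change."""
    if ti.exit_code is None:
        return  # the worker has not reported anything yet
    if ti.exit_code == EXIT_SUCCESS:
        ti.state = "SUCCESS"
    elif ti.try_number <= ti.retries:
        ti.state = "UP_FOR_RETRY"
    else:
        ti.state = "FAILED"
    ti.exit_code = None  # mark the code as consumed
```

The point of the split is that only scheduler_reconcile decides between
success, retry and failure, so that logic lives in one place.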


Re: [RESULT][VOTE]Release Airflow 1.8.0 based on Airflow 1.8.0rc5

2017-03-17 Thread Dan Davydov
That's reasonable (treating it as a bug instead of a change in behavior). Full
speed ahead!

On Thu, Mar 16, 2017 at 9:01 AM, Bolke de Bruin  wrote:

> Hello,
>
> Apache Airflow (incubating) 1.8.0 (RC5) has been accepted.
>
> 9 “+1” votes received:
>
> - Maxime Beauchemin (binding)
> - Chris Riccomini (binding)
> - Arthur Wiedmer (binding)
> - Jeremiah Lowin (binding)
> - Siddharth Anand (binding)
> - Alex van Boxel (binding)
> - Bolke de Bruin (binding)
>
> - Daniel Huang (non-binding)
>
> Vote thread (start):
> http://mail-archives.apache.org/mod_mbox/incubator-
> airflow-dev/201703.mbox/%3cB1833A3A-05FB-4112-B395-
> 135caf930...@gmail.com%3e
>
> Next steps:
> 1) will start the voting process at the IPMC mailinglist. I don’t expect
> changes.
> 2) Only after the positive voting on the IPMC and finalisation I will
> rebrand the RC to Release.
> 3) I will upload it to the incubator release page, then the tar ball needs
> to propagate to the mirrors.
> 4) Update the website (can someone volunteer please?)
> 5) Finally I will ask Maxime to upload it to pypi. It seems we can keep
> the apache branding as lib cloud is doing this as well (
> https://libcloud.apache.org/downloads.html#pypi-package).
>
> Cheers,
>
> Bolke


Re: 1.8.1 release

2017-03-21 Thread Dan Davydov
Here is my list for targeted 1.8.1 fixes:
https://issues.apache.org/jira/browse/AIRFLOW-982
https://issues.apache.org/jira/browse/AIRFLOW-983
https://issues.apache.org/jira/browse/AIRFLOW-1019 (and in general the slow
startup time from this new logic of orphaned/reset tasks)
https://issues.apache.org/jira/browse/AIRFLOW-1017 (which I will hopefully
have a fix out for soon; just finishing up tests)

We are also hitting a new issue with subdags with rc5 that we weren't
hitting with rc4, where subdags will occasionally just hang (we had to roll
back from rc5 to rc4). I'll try to spin up a JIRA for it soon, which should
be on the list too.


On Tue, Mar 21, 2017 at 1:54 PM, Chris Riccomini 
wrote:

> Agreed. I'm looking for a list of checksums/JIRAs that we want in the
> bugfix release.
>
> On Tue, Mar 21, 2017 at 12:54 PM, Bolke de Bruin 
> wrote:
>
> >
> >
> > > On 21 Mar 2017, at 12:51, Bolke de Bruin  wrote:
> > >
> > > My suggestion, as we are using semantic versioning is:
> > >
> > > 1) no new features in the 1.8 branch
> > > 2) only bug fixes in the 1.8 branch
> > > 3) new features to land in 1.9
> > >
> > > This allows companies to
> >
> > Have a "known" version and can move to the new branch when they want to
> > get new features. Obviously we only support N-1, so when 1.10 comes out
> we
> > stop supporting 1.8.X.
> >
> > >
> > > Sent from my iPhone
> > >
> > >> On 21 Mar 2017, at 11:22, Chris Riccomini 
> > wrote:
> > >>
> > >> Hey all,
> > >>
> > >> I suggest that we start a 1.8.1 Airflow release now. The goal would
> be:
> > >>
> > >> 1) get a second release under our belt
> > >> 2) patch known issues with the 1.8.0 release
> > >>
> > >> I'm happy to run it, but I saw Maxime mentioning that Airbnb might
> want
> > to.
> > >> @Max et al, can you comment?
> > >>
> > >> Also, can folks supply JIRAs for stuff that they think needs to be in the
> > 1.8.1
> > >> bugfix release?
> > >>
> > >> Cheers,
> > >> Chris
> >
>


Re: 1.8.1 release

2017-03-21 Thread Dan Davydov
It seemed to only affect some subdags (but I think even when restarted they
were still affected); it seemed like a race condition.

For the second question, we have not yet (but checking this is part of the
ticket).

On Tue, Mar 21, 2017 at 3:01 PM, Bolke de Bruin  wrote:

> @dan
>
> I'm obviously interested in the subdag issue as it is executed by the
> backfill logic. Do you have anything to reproduce it with? Can also talk
> about it tomorrow.
>
> Secondly, did you verify the all success / skipped 'fix' against 'wait for
> all tasks to finish'?
>
> @chris I also suggest using/embracing jira more (as you are doing), as it
> helps with cleaner changelogs, tracking and targeting releases.
>
> Also note that I already included some fixes in v1-8-test.
>
> Bolke
>
> Sent from my iPhone
>
> > On 21 Mar 2017, at 14:29, Ruslan Dautkhanov 
> wrote:
> >
> > Some of the issues I ran into while testing 1.8rc5 :
> >
> > https://issues.apache.org/jira/browse/AIRFLOW-1015
> >> https://issues.apache.org/jira/browse/AIRFLOW-1013
> >> https://issues.apache.org/jira/browse/AIRFLOW-1004
> >> https://issues.apache.org/jira/browse/AIRFLOW-1003
> >> https://issues.apache.org/jira/browse/AIRFLOW-1001
> >> https://issues.apache.org/jira/browse/AIRFLOW-1015
> >
> >
> > It would be great to have at least some of them fixed in 1.8.1.
> >
> > Thank you.
> >
> >
> >
> >
> > --
> > Ruslan Dautkhanov
> >
> > On Tue, Mar 21, 2017 at 3:02 PM, Dan Davydov  invalid
> >> wrote:
> >
> >> Here is my list for targeted 1.8.1 fixes:
> >> https://issues.apache.org/jira/browse/AIRFLOW-982
> >> https://issues.apache.org/jira/browse/AIRFLOW-983
> >> https://issues.apache.org/jira/browse/AIRFLOW-1019 (and in general the
> >> slow
> >> startup time from this new logic of orphaned/reset task)
> >> https://issues.apache.org/jira/browse/AIRFLOW-1017 (which I will
> hopefully
> >> have a fix out for soon just finishing up tests)
> >>
> >> We are also hitting a new issue with subdags with rc5 that we weren't
> >> hitting with rc4 where subdags will occasionally just hang (had to roll
> >> back from rc5 to rc4), I'll try to spin up a JIRA for it soon which
> should
> >> be on the list too.
> >>
> >>
> >> On Tue, Mar 21, 2017 at 1:54 PM, Chris Riccomini  >
> >> wrote:
> >>
> >>> Agreed. I'm looking for a list of checksums/JIRAs that we want in the
> >>> bugfix release.
> >>>
> >>> On Tue, Mar 21, 2017 at 12:54 PM, Bolke de Bruin 
> >>> wrote:
> >>>
> >>>>
> >>>>
> >>>>> On 21 Mar 2017, at 12:51, Bolke de Bruin  wrote:
> >>>>>
> >>>>> My suggestion, as we are using semantic versioning is:
> >>>>>
> >>>>> 1) no new features in the 1.8 branch
> >>>>> 2) only bug fixes in the 1.8 branch
> >>>>> 3) new features to land in 1.9
> >>>>>
> >>>>> This allows companies to
> >>>>
> >>>> Have a "known" version and can move to the new branch when they want
> to
> >>>> get new features. Obviously we only support N-1, so when 1.10 comes
> out
> >>> we
> >>>> stop supporting 1.8.X.
> >>>>
> >>>>>
> >>>>> Sent from my iPhone
> >>>>>
> >>>>>> On 21 Mar 2017, at 11:22, Chris Riccomini 
> >>>> wrote:
> >>>>>>
> >>>>>> Hey all,
> >>>>>>
> >>>>>> I suggest that we start a 1.8.1 Airflow release now. The goal would
> >>> be:
> >>>>>>
> >>>>>> 1) get a second release under our belt
> >>>>>> 2) patch known issues with the 1.8.0 release
> >>>>>>
> >>>>>> I'm happy to run it, but I saw Maxime mentioning that Airbnb might
> >>> want
> >>>> to.
> >>>>>> @Max et al, can you comment?
> >>>>>>
> >>>>>> Also, can folks supply JIRAs for stuff that they think needs to be in the
> >>>> 1.8.1
> >>>>>> bugfix release?
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Chris
> >>>>
> >>>
> >>
>


Welcome @saguziel as a committer and PMC member!

2017-04-13 Thread Dan Davydov
Alex (@saguziel - Airbnb) has been making contributions and reviews for
quite a long time now and I'm very happy to say he has just become an
official committer and PMC member.

He has ~13 commits, most of which are to the core of Airflow, has been
active in reviewing open source PRs and contributing to the recent release
(e.g. fixing blocking issues), and has a strong understanding of the core
Airflow logic (he has submitted a couple of patches to remove race
conditions, as well as security patches).

Congratulations and welcome Alex!
-Dan


Re: dag file processing times

2017-04-24 Thread Dan Davydov
One idea to solve this is to use a daemon that uses inotify to watch for
changes in files and then reprocesses just those files. The hard part is
that, without any kind of dependency/build system for DAGs, it can be hard
to tell which DAGs depend on which files.
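
To make the "reprocess just those files" part concrete, here is a rough
sketch. Note that it swaps inotify for plain mtime polling, since the Python
standard library has no inotify binding; the function name changed_files and
the last_seen dictionary are made up for illustration. It does nothing about
the dependency problem mentioned above.

```python
import os
from typing import Dict, List


def changed_files(paths: List[str], last_seen: Dict[str, float]) -> List[str]:
    """Return the paths whose mtime differs from the recorded one.

    last_seen maps path -> last observed mtime and is updated in place.
    Files that have disappeared are dropped from last_seen and not returned.
    """
    changed = []
    for path in paths:
        try:
            mtime = os.stat(path).st_mtime
        except FileNotFoundError:
            last_seen.pop(path, None)
            continue
        if last_seen.get(path) != mtime:
            last_seen[path] = mtime
            changed.append(path)
    return changed
```

A daemon would call this in a loop and hand only the returned paths to the
DAG file processor; an inotify-based version would replace the polling with
kernel notifications but could feed the same downstream step.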

On Mon, Apr 24, 2017 at 1:21 PM, Gerard Toonstra 
wrote:

> Hey,
>
> I've seen some people complain about DAG file processing times. An issue
> was raised about this today:
>
> https://issues.apache.org/jira/browse/AIRFLOW-1139
>
> I attempted to provide a good explanation what's going on. Feel free to
> validate and comment.
>
>
> I'm noticing that the file processor is a bit naive in the way it
> reprocesses DAGs. It doesn't look at the DAG interval for example, so it
> looks like it reprocesses all files continuously in one big batch, even if
> we can determine that the next "schedule"  for all its dags are in the
> future?
>
>
> Wondering if a change in the DagFileProcessingManager could optimize things
> a bit here.
>
> In the part where it gets the simple_dags from a file it's currently
> processing:
>
>     for simple_dag in processor.result:
>         simple_dags.append(simple_dag)
>
> the file_path is in the context and the simple_dags should be able to
> provide the next interval date for each dag in the file.
>
> The idea is to add files to a sorted deque by "next_schedule_datetime" (the
> minimum next interval date), so that when we build the list
> "files_paths_to_queue", it can remove files that have dags that we know
> won't have a new dagrun for a while.
>
> One gotcha to resolve after that is to deal with files getting updated with
> new dags or changed dag definitions and renames and different interval
> schedules.
>
> Worth a PR to glance over?
>
> Rgds,
>
> Gerard
>
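
The scheduling-aware queue proposed above could look roughly like this. It is
a sketch only: it uses a heap rather than the sorted deque mentioned in the
message, and due_files plus the (next_schedule_datetime, file_path) tuple
layout are assumptions for illustration, not the DagFileProcessingManager's
real structures.

```python
import heapq
from datetime import datetime
from typing import List, Tuple


def due_files(queue: List[Tuple[datetime, str]], now: datetime) -> List[str]:
    """Pop and return every file whose earliest next schedule is at or before now.

    queue is a heap of (next_schedule_datetime, file_path) entries, where the
    datetime is the minimum next interval date over all DAGs in that file.
    """
    due = []
    while queue and queue[0][0] <= now:
        _, file_path = heapq.heappop(queue)
        due.append(file_path)
    return due
```

Files left on the heap are the ones known not to need a new dagrun yet. The
gotcha raised in the message still applies: a file edited in the meantime
would have to be re-queued with a fresh timestamp.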


Re: dag file processing times

2017-04-24 Thread Dan Davydov
I was talking with Alex about the DB case offline; for those we could
support a force-refresh arg with an interval param.

Manifests would need to be hierarchical, but I feel like it would inevitably
spin out into a full-blown build system.

On Mon, Apr 24, 2017 at 3:02 PM, Arthur Wiedmer 
wrote:

> What if the DAG actually depends on configuration that only exists in a
> database and is retrieved by the Python code generating the DAG?
>
> Just asking because we have this case in production here. It is slowly
> changing, so still fits within the Airflow framework, but you cannot just
> watch a file...
>
> Best,
> Arthur
>
> On Mon, Apr 24, 2017 at 2:55 PM, Bolke de Bruin  wrote:
>
> > Inotify can work without a daemon. Just fire a call to the API when a
> file
> > changes. Just a few lines in bash.
> >
> > If you bundle your dependencies in a zip you should be fine with the
> above.
> > Or if we start using manifests that list the files that are needed in a
> > dag...
> >
> >
> > Sent from my iPhone
> >
> > > On 24 Apr 2017, at 22:46, Dan Davydov 
> > wrote:
> > >
> > > One idea to solve this is to use a daemon that uses inotify to watch
> for
> > > changes in files and then reprocesses just those files. The hard part
> is
> > > without any kind of dependency/build system for DAGs it can be hard to
> > tell
> > > which DAGs depend on which files.
> > >
> > > On Mon, Apr 24, 2017 at 1:21 PM, Gerard Toonstra 
> > > wrote:
> > >
> > >> Hey,
> > >>
> > >> I've seen some people complain about DAG file processing times. An
> issue
> > >> was raised about this today:
> > >>
> > >> https://issues.apache.org/jira/browse/AIRFLOW-1139
> > >>
> > >> I attempted to provide a good explanation what's going on. Feel free
> to
> > >> validate and comment.
> > >>
> > >>
> > >> I'm noticing that the file processor is a bit naive in the way it
> > >> reprocesses DAGs. It doesn't look at the DAG interval for example, so
> it
> > >> looks like it reprocesses all files continuously in one big batch,
> even
> > if
> > >> we can determine that the next "schedule"  for all its dags are in the
> > >> future?
> > >>
> > >>
> > >> Wondering if a change in the DagFileProcessingManager could optimize
> > things
> > >> a bit here.
> > >>
> > >> In the part where it gets the simple_dags from a file it's currently
> > >> processing:
> > >>
> > >>    for simple_dag in processor.result:
> > >>        simple_dags.append(simple_dag)
> > >>
> > >> the file_path is in the context and the simple_dags should be able to
> > >> provide the next interval date for each dag in the file.
> > >>
> > >> The idea is to add files to a sorted deque by "next_schedule_datetime"
> > (the
> > >> minimum next interval date), so that when we build the list
> > >> "files_paths_to_queue", it can remove files that have dags that we
> know
> > >> won't have a new dagrun for a while.
> > >>
> > >> One gotcha to resolve after that is to deal with files getting updated
> > with
> > >> new dags or changed dag definitions and renames and different interval
> > >> schedules.
> > >>
> > >> Worth a PR to glance over?
> > >>
> > >> Rgds,
> > >>
> > >> Gerard
> > >>
> >
>


Re: Discussion on Airflow 1.8.1 RC2

2017-05-03 Thread Dan Davydov
cc Alex and Rui, who were working on fixes. I'm not sure if their commits
got in before 1.8.1.

On Wed, May 3, 2017 at 1:09 PM, Bolke de Bruin  wrote:

> Hi Dan,
>
> (Thread renamed to make sure it does not clash, dev@ now added)
>
> It surprises me that you found regression from 1.8.0 to 1.8.1 as 1.8.1 is
> very much focused on bug fixes. Were the regressions shared yet?
>
> The whole 1.8.X release will be bug fix focused (per release management)
> and minor feature updates. The 1.9.0 release will be the first release with
> major feature updates. So what you want, more robustness and focus on
> stability, is now underway. I agree with beefing up tests and including the
> major operators in this. Executors should also be on this list btw. Turning
> on coverage reporting might be a first step in helping this (it isn’t the
> solution obviously).
>
> Cheers
> Bolke
>
>
> On 3 May 2017, at 20:28, Dan Davydov  wrote:
>
> We saw several regressions moving from 1.8.0 to 1.8.1 the first time we
> tried, and while I think we merged all our fixes to master (not sure if
> they all made it into 1.8.1 however), we have put releasing on hold due to
> stability issues from the last couple of releases. It's either the case
> that:
> A) Airbnb requires more robustness from new releases.
> or
> B) Most companies using Airflow require more robustness and we should halt
> on feature work until we are more confident in our testing
>
> I think the biggest problem currently is the lack of unit testing
> coverage, e.g. when the backfill framework was refactored (which was the
> right long-term fix), it caused a lot of breakages that weren't caught by
> tests. I think we need to audit the major operators/classes and beef up the
> unit testing coverage. The coverage metric does not necessarily cover these
> cases (e.g. cyclomatic complexity). Writing regression tests is good but we
> shouldn't have so many new blocker issues in our releases.
>
> We are fighting some fires internally at the moment (not Airflow related),
> but Alex and I have been working on some stuff that we will push to the
> community once we are done. Alex is working on a good solution for python
> package isolation, and I'm working on integration with Kubernetes at the
> executor level.
>
> Feel free to forward any of my messages to the dev mailing list.
>
> On Wed, May 3, 2017 at 11:18 AM, Bolke de Bruin  wrote:
>
>> Grrr, I seriously dislike the send button on the touch bar…here goes again.
>>
>> Hi Dan,
>>
>> (Please note I would like to forward the next message to dev@, but let
>> me know if you don’t find it comfortable)
>>
>> I understand your point. The gap between 1.7.1 was large in terms of
>> functionality changes etc. It was going to be a (bit?) rough and as you
>> guys are using many of the edge cases you probably found more issues than
>> any of us. Still, between 1.8.0 and 1.8.1 we have added many tests
>> (coverage increased from 67% to close to 69%, which is a lot as you know).
>> It would be nice if you can share where your areas of concern are so we can
>> address those and a suggestion on how to proceed with integration tests is
>> also welcome.
>>
>> You guys (=Airbnb) have been a bit quiet over the past couple of days, so
>> I am getting a bit worried in terms of engagement. Is that warranted?
>>
>> Cheers
>> Bolke
>>
>>
>> On 3 May 2017, at 20:13, Bolke de Bruin  wrote:
>>
>> Hi Dan,
>>
>> (Please note I would like to forward the next message to dev@, but let
>> me know if you don’t find it comfortable)
>>
>> I understand your point. The gap between 1.7.1 was large in terms of
>> functionality changes etc. It was going to be a (bit?) rough and as you
>> guys are using many of the edge cases you probably found more issues than
>> any of us. Still, between 1.8.0 and 1.8.1 we have added many tests
>> (coverage increased from 67
>>
>> On 3 May 2017, at 19:41, Arthur Wiedmer 
>> wrote:
>>
>> As a counterpoint,
>>
>> I am comfortable voting +1 on this release in the sense that it fixes
>> some of the issues with 1.8.0. It is unfortunate that we cannot test it on
>> the Airbnb production for now and we should definitely invest in increasing
>> testing coverage, but some of the fixes are needed for ease of use/adoption
>> (See for instance AIRFLOW-832), and this release is a step in the right
>> direction.
>>
>> Best,
>> Arthur
>>
>> On Wed, May 3, 2017 at 10:30 AM, Dan Davydov 
>> wrote:
>>
>>> I'm not comfortable voting without d

Re: Discussion on Airflow 1.8.1 RC2

2017-05-04 Thread Dan Davydov
Thinking back, it may have been 1.8.0rc5 -> 1.8.0 regressions. I am still
worried about the large number of PRs in 1.8.1, even if they are all bug
fixes (known issues that we already have patches for vs. unknown new
issues introduced with the 1.8.1 patches), but I agree with your sentiment
that these PRs should most likely make things more stable.

On Thu, May 4, 2017 at 10:55 AM, Alex Guziel  wrote:

> I don't think any of the fixes I did were regressions.
>
> On Thu, May 4, 2017 at 8:11 AM, Bolke de Bruin  wrote:
>
>> I know of one that Alex wanted to get in, but wasn’t targeted for 1.8.1
>> in Jira and thus didn’t make the cut at RC time. There is another one
>> out that seems to have stalled a bit
>> (https://github.com/apache/incubator-airflow/pull/2205).
>>
>> Reading the changelog of 1.8.1 I see bug fixes, apache requirements and
>> one “new” feature (UI lightning bolt). Regressions could have happened but
>> we have been quite vigilant on the fact that these bug fixes needed proper
>> tests, so I am very interested in 1.8.0 -> 1.8.1 regressions. If it is a
>> pre-backfill-change 1.8.0 to 1.8.1 regression then I would also like to
>> know, cause I made that change and feel responsible for it.
>>
>> Cheers
>> Bolke
>>
>>
>> On 3 May 2017, at 22:13, Dan Davydov  wrote:
>>
>> cc Alex and Rui who were working on fixes, I'm not sure if their commits
>> got in before 1.8.1.
>>
>> On Wed, May 3, 2017 at 1:09 PM, Bolke de Bruin  wrote:
>>
>>> Hi Dan,
>>>
>>> (Thread renamed to make sure it does not clash, dev@ now added)
>>>
>>> It surprises me that you found regression from 1.8.0 to 1.8.1 as 1.8.1
>>> is very much focused on bug fixes. Were the regressions shared yet?
>>>
>>> The whole 1.8.X release will be bug fix focused (per release management)
>>> and minor feature updates. The 1.9.0 release will be the first release with
>>> major feature updates. So what you want, more robustness and focus on
>>> stability, is now underway. I agree with beefing up tests and including the
>>> major operators in this. Executors should also be on this list btw. Turning
>>> on coverage reporting might be a first step in helping this (it isn’t the
>>> solution obviously).
>>>
>>> Cheers
>>> Bolke
>>>
>>>
>>> On 3 May 2017, at 20:28, Dan Davydov  wrote:
>>>
>>> We saw several regressions moving from 1.8.0 to 1.8.1 the first time we
>>> tried, and while I think we merged all our fixes to master (not sure if
>>> they all made it into 1.8.1 however), we have put releasing on hold due to
>>> stability issues from the last couple of releases. It's either the case
>>> that:
>>> A) Airbnb requires more robustness from new releases.
>>> or
>>> B) Most companies using Airflow require more robustness and we should
>>> halt on feature work until we are more confident in our testing
>>>
>>> I think the biggest problem currently is the lack of unit testing
>>> coverage, e.g. when the backfill framework was refactored (which was the
>>> right long-term fix), it caused a lot of breakages that weren't caught by
>>> tests. I think we need to audit the major operators/classes and beef up the
>>> unit testing coverage. The coverage metric does not necessarily cover these
>>> cases (e.g. cyclomatic complexity). Writing regression tests is good but we
>>> shouldn't have so many new blocker issues in our releases.
>>>
>>> We are fighting some fires internally at the moment (not Airflow
>>> related), but Alex and I have been working on some stuff that we will push
>>> to the community once we are done. Alex is working on a good solution for
>>> python package isolation, and I'm working on integration with Kubernetes at
>>> the executor level.
>>>
>>> Feel free to forward any of my messages to the dev mailing list.
>>>
>>> On Wed, May 3, 2017 at 11:18 AM, Bolke de Bruin 
>>> wrote:
>>>
>>>> Grrr, I seriously dislike the send button on the touch bar…here goes
>>>> again.
>>>>
>>>> Hi Dan,
>>>>
>>>> (Please note I would like to forward the next message to dev@, but let
>>>> me know if you don’t find it comfortable)
>>>>
>>>> I understand your point. The gap between 1.7.1 was large in terms of
>>>> functionality changes etc. It was going to be a (bit?) rough and as you
>>>> guys are usi

Re: Role Based Access Control for Airflow UI

2017-06-12 Thread Dan Davydov
Looks good to me in general, thanks for putting this together!

I think the ability to integrate with external RBAC systems like LDAP is
important (i.e. the Airflow DB should not be decoupled with the RBAC
database wherever possible).

I wouldn't be too worried about the permissions for refreshing DAGs. As
far as I know, this functionality is no longer required with the new
webservers, which reload state periodically, and it will certainly be removed
when we have a better DAG consistency story.

I think it would also be good to think about how this proposal/implementation
applies in the API-driven world (e.g. when the webserver hits APIs
like /clear on behalf of users instead of running commands against the
database directly).

On Mon, Jun 12, 2017 at 11:12 AM, Bolke de Bruin  wrote:

> Will respond but I'm traveling at the moment. Give me a few days.
>
> Sent from my iPhone
>
> > On 12 Jun 2017, at 13:39, Chris Riccomini  wrote:
> >
> > Hey all,
> >
> > Checking in on this. We spent a good chunk of time thinking about this,
> and
> > want to move forward with it, but want to make sure we're all on the same
> > page.
> >
> > Max? Bolke? Dan? Jeremiah?
> >
> > Cheers,
> > Chris
> >
> > On Thu, Jun 8, 2017 at 1:49 PM, kalpesh dharwadkar <
> > kalpeshdharwad...@gmail.com> wrote:
> >
> >> Hello everyone,
> >>
> >> As you all know, currently Airflow doesn’t have a built-in Role Based
> >> Access Control(RBAC) capability.  It does provide very limited
> >> authorization capability by providing admin, data_profiler, and user
> roles.
> >> However, associating these roles to authenticated identities is not a
> >> simple effort.
> >>
> >> To address this issue, I have created a design proposal for building
> RBAC
> >> into Airflow and simplifying user access management via the Airflow UI.
> >>
> >> The design proposal is located at
> >> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+RBAC+proposal
> >>
> >> Any comments/questions/feedback are much appreciated.
> >>
> >> Thanks
> >> Kalpesh
> >>
>


Re: Airflow Logging Improvements

2017-06-21 Thread Dan Davydov
Responding to some of Bolke's concerns in the github PR for this change:

> Mmm still not convinced. Especially on elastic search it is just easier
to use the start_date to shard on.
Sharding on start_date isn't great because there is still some risk of
collisions and it means that we are coupling the primary key with
start_date unnecessarily (e.g. hypothetically you could allow two tasks to
run at the same time in Airflow, and in this case start_date would no longer
be a valid primary key). Using monotonically increasing IDs for DB entries
like this is pretty standard practice.

> In addition I'm very against the managing of log files this way. Log
files are already a mess and should be refactored to be consistent and to
be managed from one place.

I agree about the logging mess, and there seem to have been efforts
attempting to fix this but they have all been abandoned so we decided to
move ahead with this change. I need to take a look at the PR first, but
this change should actually make logging less messy, since it should add an
abstraction for logging modules, and because you know exactly which try
numbers (and how many) ran on which workers from the file path. The log
folder structure already kind of mimicked the primary key of the
task_instance table (dag_id + task_id + execution_date), but really
try_number logically belongs in this key as well (at least for the key for
log files).


> The docker packagers can already not package airflow correctly without
jumping through hoops. Arbitrarily naming it certainly does not help here.

If this is referring to the // in the path, I don't think this
is arbitrarily naming it. A log "unit" really should be a single task run
(not an arbitrary grouping of a variable number of multiple runs), and each
unit should have a unique key or location. One of the reasons we are
working on this effort is to actually make Airflow play nicer with
Kubernetes/Docker (since airflow workers should ideally be ephemeral), and
allowing a separate service to read and ship the logs is necessary in this
case since the logs will be destroyed along with the worker instance. I
think in the future we should also allow custom logging modules (e.g.
directly writing logs to some service).
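
As a rough illustration of "one log unit per task run, each with a unique
location", a path builder might look like the sketch below. The directory
layout, the function name task_log_path and the ".log" suffix are assumptions
made for this example; they are not taken from the PR.

```python
import posixpath
from datetime import datetime


def task_log_path(base: str, dag_id: str, task_id: str,
                  execution_date: datetime, try_number: int) -> str:
    """Build a log location keyed by dag_id + task_id + execution_date + try_number.

    Every run of a task instance gets its own file, so a separate service can
    ship logs without guessing how many runs are mixed into one file.
    """
    return posixpath.join(
        base, dag_id, task_id, execution_date.isoformat(), f"{try_number}.log"
    )
```

For example, task_log_path("/logs", "etl", "load", datetime(2017, 6, 21), 2)
gives "/logs/etl/load/2017-06-21T00:00:00/2.log" under this assumed layout.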



On Wed, Jun 21, 2017 at 3:11 PM, Allison Wang 
wrote:

> Hi,
>
> I am in the process of making airflow logging backed by Elasticsearch
> (for more detail, please check AIRFLOW-1325). Here are several
> more logging improvements we are considering:
>
> *1. Log streaming.* Auto-refresh the logs if tasks are running.
>
> *2. Separate logs by attempts.*
> Instead of logging everything into one file, logs can be separated by
> attempt number and displayed using tabs. Attempt number here is a
> monotonically increasing number that represents each task instance run
> (unlike try_number, clear task instance won't reset attempt number).
> *try_number:* n^th retry by the task instance. try_number should not be
> greater than retries. Clear task will set try_number to 0.
> *attempt:* number of times current task instance got executed.
>
> *3. Collapsible logs.* Collapse logs that are mainly for debugging
> airflow internals and aren't really related to users' tasks (for example,
> logs shown before "starting attempt 1 of 1")
>
> All suggestions are welcome.
>
> Thanks,
> Allison
>


Re: Airflow Logging Improvements

2017-06-22 Thread Dan Davydov
I don't think Allison's PR fixes logging, but it's a step in the right
direction. The current approach creates an abstraction around reading logs,
whereas the final solution should define an interface for writing to logs
in addition to reading logs (which could indeed use something like
https://github.com/cmanaha/python-elasticsearch-logger for the writing
part). I agree we should move the logging towards something like log4j
(with context awareness of task id/dag id/execution date/attempt #). If
there are incompatibilities with this approach and the log4j solution (or
reasons why it would be difficult to port the PR over to the final model),
we should definitely address those concerns, but otherwise I still feel
this is a step in the right direction.
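
For what "context awareness" could mean in Python terms, here is a minimal
sketch using the standard library's logging.LoggerAdapter. The task_logger
helper, the logger name and the format string are hypothetical; this is not
how Airflow or the PR does it, just one way to attach task id/dag id/execution
date/attempt to every record.

```python
import logging


def task_logger(dag_id, task_id, execution_date, attempt, stream):
    """Return a logger that stamps every record with the task's context."""
    logger = logging.getLogger(f"example.task.{dag_id}.{task_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the example's output on our handler only
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(dag_id)s %(task_id)s %(execution_date)s "
        "attempt=%(attempt)s %(levelname)s %(message)s"
    ))
    logger.addHandler(handler)
    context = {
        "dag_id": dag_id,
        "task_id": task_id,
        "execution_date": execution_date,
        "attempt": attempt,
    }
    # LoggerAdapter passes `context` as `extra` on every logging call.
    return logging.LoggerAdapter(logger, context)
```

With the context carried on the record, where the line ends up (file, syslog,
Elasticsearch) becomes a handler choice rather than something baked into the
task code.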

The concept of "attempt" is needed regardless of logging; the way retries
are stored/handled right now is not very sane.
- Old TI state is permanently deleted
- In the task logs you get "Try 1/6"... 2/6... 3/6... 1/6... 2/6 in logs
which doesn't make sense (if it's the Nth time the task is running it
should be logged as the Nth time). I recall other strange behaviors in
these log lines too (maybe something like Try 4/3).
- The "primary key" for a task instance run is not complete (which is what
Allison's logging change needs). You could say that TaskInstance should
only keep track of the latest TaskInstance run, but we still want to store
all tries for a task instance somewhere in the database, and we still need
to key this off of "attempt".
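
A toy model of the difference between try_number and attempt, as described in
this thread (clearing resets try_number but not attempt). The class and its
fields are hypothetical and exist only to show why "attempt" works as a key
for per-run history.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TaskInstanceRuns:
    # Hypothetical model, not Airflow's TaskInstance.
    try_number: int = 0   # reset when the task instance is cleared
    attempt: int = 0      # monotonically increasing, never reset
    history: Dict[int, str] = field(default_factory=dict)  # attempt -> final state

    def run(self, final_state: str) -> None:
        """Record one execution of the task instance."""
        self.try_number += 1
        self.attempt += 1
        self.history[self.attempt] = final_state

    def clear(self) -> None:
        """Clearing resets try_number but keeps attempt and the history."""
        self.try_number = 0
```

After two failed runs, a clear and one successful run, try_number is back at
1 while attempt is 3, and the history still holds all three runs.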

On Thu, Jun 22, 2017 at 11:00 AM, Allison Wang 
wrote:

> Hi Bolke,
>
> I agree that we should make logging configurable, but I don't think
> using handlers like python-elasticsearch-logger is a good idea over
> flushing logs into files. Here are some reasons:
>
>    1. Such handlers do not have a built-in backpressure-sensitive
>    protocol that can prevent overwhelming ElasticSearch.
>    2. Logs will be lost if the ElasticSearch cluster is down for reasons
>    like upgrading.
>
> In general, it's not a good practice for a Python logger to talk directly
> to ElasticSearch. Flushing logs into files gives us more flexibility to use
> tools like Filebeat and Logstash to collect and index logs into
> ElasticSearch.
>
> Thanks,
> Allison
>
> On Thu, Jun 22, 2017 at 12:05 AM Bolke de Bruin  wrote:
>
>> In the light of fixing logging, I would definitely appreciate written
>> design. Especially, as there have been multiple attempts to fix some issues
>> but these have been more like stop gap fixes.
>>
>> In my opinion Airflow should not stipulate in a hard coded fashion where
>> and how logging takes place. It should behave more like ‘log4j’
>> configurations. So it should not just use “dag_id + task_id +
>> execution_date” and write this to an arbitrary location on the filesystem.
>> I could imagine a settings file “logging.conf” that sets up something like
>> this:
>>
>> [logger_scheduler]
>> level = INFO
>> handler = stderr
>> qualname = airflow.scheduler
>> formatter=scheduler_formatter
>>
>> In airflow.cfg it should allow setting something like this:
>>
>> [scheduler]
>> use_syslog = True
>> syslog_log_facility = LOG_LOCAL0
>>
>> To allow logging to syslog so it can be moved to a centralised location
>> if required (syslog being a special case afaik).
>>
>> Elasticsearch and any other backend can then just be a handler and we can
>> remove the custom stuff that is proposed in PR
>> https://github.com/apache/incubator-airflow/pull/2380 by
>> https://github.com/cmanaha/python-elasticsearch-logger for example.
>>
>> I then can be convinced to add something like “attempt”, but probably
>> there are more friendly ways to solve it at that time. In addition
>> ‘attempts' should then imho not be managed by the task or cli, but rather
>> by the executor as that is the process which “attempts” a task.
>>
>> Bolke.
>>
>>
>> > On 22 Jun 2017, at 01:21, Dan Davydov  wrote:
>> >
>> > Responding to some of Bolke's concerns in the github PR for this change:
>> >
>> > > Mmm still not convinced. Especially on elastic search it is just
>> easier to use the start_date to shard on.
>> > sharding on start_date isn't great because there is still some risk of
>> collisions and it means that we are coupling the primary key with
>> start_date unnecessarily (e.g. hypothetically you could allow two tasks to
>> run at the same in Airflow and

Re: airflow backfill seems to ignore -I

2017-07-12 Thread Dan Davydov
Airflow dependencies were simplified a bit: -i no longer ignores failed
state tasks. Check out the -A flag, which ignores pretty much all
dependencies (including the failed state tasks), though depending on the
version you are using there is a bug that is being fixed here:
https://github.com/apache/incubator-airflow/pull/2327

On Wed, Jul 5, 2017 at 8:45 AM, Weiwei Zhang  wrote:

> I am using airflow 1.8.1 as well. It is able to pick up the rest of the
> tasks when using backfill, with one exception: when a task has failed,
> I have to clear its status to allow the backfill to work.
> Any ideas why it is behaving like this? The previous version 1.6.2 didn't
> require clearing the failed task before doing backfill.
>
> Thx a lot,
> Viv
>
> > On Jul 5, 2017, at 7:38 AM, Tobias Feldhaus <
> tobias.feldh...@localsearch.ch> wrote:
> >
> > I’ve just pulled the newest master and built it; the behaviour is the
> same. How can it be that “-i” is not honoured and dependencies are checked?
> >
> >
> > On 05.07.2017, 15:49, "Tobias Feldhaus" 
> wrote:
> >
> >But nonetheless, is it not possible to backfill and ignore the
> upstream dependencies with “-i” ?
> >
> >On 05.07.2017, 14:34, "Tobias Feldhaus" wrote:
> >
> >I meant -i, but I just needed to manually set the upstream
> things to success and it worked. Never mind.
> >
> >Best,
> >Tobi
> >
> >On 05.07.2017, 14:28, "Tobias Feldhaus" <
> tobias.feldh...@localsearch.ch>
> wrote:
> >
> >Hi,
> >
> >When running airflow (1.8.1) backfill with -I and -t like:
> >
> >airflow backfill -t 'nonspider_sessions' -i -I -s 2017-05-30 -e
> 2017-05-31 google_pipelines
> >
> >I would expect it to rerun that specific task and ignore the
> dependencies. Instead I see this:
> >
> >[2017-07-05 12:23:30,419] {base_task_runner.py:95} INFO -
> Subtask: [2017-07-05 12:23:30,419] {models.py:1145} INFO - Dependencies not
> met for  05:30:00 [queued]>, dependency 'Trigger Rule' FAILED: Task's trigger rule
> 'all_success' requires all upstream tasks to have succeeded, but found 3
> non-success(es). upstream_tasks_state={'successes': 0L, 'failed': 0L,
> 'upstream_failed': 0L, 'skipped': 0L, 'done': 0L},
> upstream_task_ids=['frontend_sensor', 'log_sensor', 'tracker_pipeline']
> >
> >Am I doing it wrong?
> >
> >
> >
> >Best,
> >Tobi
> >
> >
> >
> >
>
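
The arithmetic behind the 'Trigger Rule' log line quoted above can be sketched as follows. This is a simplified illustration, not Airflow's implementation: the function name all_success_met is hypothetical, and only the state counts and task ids are copied from the log.

```python
def all_success_met(upstream_task_ids, upstream_tasks_state):
    """Simplified 'all_success' check.

    Returns (met, non_successes), where non_successes is the number of
    upstream tasks that have not been counted as successful.
    """
    successes = upstream_tasks_state.get("successes", 0)
    non_successes = len(upstream_task_ids) - successes
    return non_successes == 0, non_successes


if __name__ == "__main__":
    # Values taken from the log message in the thread.
    state = {"successes": 0, "failed": 0, "upstream_failed": 0,
             "skipped": 0, "done": 0}
    upstream = ["frontend_sensor", "log_sensor", "tracker_pipeline"]
    print(all_success_met(upstream, state))
```

Since none of the three upstream tasks had run in that date range, all three count as non-successes, which matches the "found 3 non-success(es)" message and explains why marking them as success manually unblocked the task.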


Re: What argument does -A / --ignore_all_dependencies expect when triggering via airflow run?

2017-07-12 Thread Dan Davydov
Probably caused by the issue being fixed here:
https://github.com/apache/incubator-airflow/pull/2327

On Wed, Jul 12, 2017 at 4:25 AM, Tobias Feldhaus <
tobias.feldh...@localsearch.ch> wrote:

> I am trying to force airflow to run a task so that the depends_on_past
> setting of the DAG is honoured and I can get it to run the rest of it:
>
> I have tried using the -A / --ignore_all_dependencies parameter of the
> airflow run command, but I don’t know what argument it expects:
>
>
>
> airflow run -l -A IGNORE_ALL_DEPENDENCIES -f google_pipelines
> search_log_sensor 2017-07-01
>
> airflow run -l --ignore_all_dependencies IGNORE_ALL_DEPENDENCIES -f
> google_pipelines search_log_sensor 2017-07-01
>
> both give me:
>
> airflow run: error: argument -A/--ignore_all_dependencies: expected one
> argument
>
>
> Am I using it wrong?
>
>
>
> Best,
> Tobias
>
>
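
For readers puzzled by the "expected one argument" message, here is a minimal argparse sketch of one way a flag that is meant to be boolean can produce it. It is an assumption that the issue referenced above has this shape; the parser below is a stand-in with a hypothetical program name, not Airflow's CLI code.

```python
import argparse


def build_parser(flag_is_boolean):
    """Build a tiny parser with an -A flag.

    When flag_is_boolean is False the flag falls back to argparse's
    default action ("store"), which insists on consuming one value.
    """
    parser = argparse.ArgumentParser(prog="demo run")
    if flag_is_boolean:
        parser.add_argument("-A", "--ignore_all_dependencies",
                            action="store_true")
    else:
        parser.add_argument("-A", "--ignore_all_dependencies")
    parser.add_argument("dag_id")
    return parser


def try_parse(parser, argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return parser.parse_args(argv)
    except SystemExit:
        # argparse prints the usage error to stderr and exits.
        return None


if __name__ == "__main__":
    print(try_parse(build_parser(True), ["my_dag", "-A"]))
    print(try_parse(build_parser(False), ["my_dag", "-A"]))
```

In the second call the bare -A has no value to consume, so argparse rejects the command line. A flag declared this way fails whenever it is passed on its own, no matter what the user intended.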


Re: Good practices around restarting scheduler and server of airflow

2017-07-13 Thread Dan Davydov
Airbnb switched to running the scheduler without restarts about a month
ago, and it is working well. The one issue you might hit is if you are
using a version of Celery with a bug (where workers will pull off tasks
even if they don't have space to run them), tasks can potentially be lost
once they are sent to your Celery backend. With restarts these tasks are
automatically resent.

On Thu, Jul 13, 2017 at 1:38 PM, manish ranjan 
wrote:

> Hi All,
>
> We have recently started to use airflow to make our data engineering more
> robust.
> 1. I wanted to know if there are good practices around scheduling a daily/
> bi-weekly/ weekly restart of the airflow server and scheduler? We are using
> just a single instance and not a cluster.
> 2.  Also wanted to know the troubles I may run into if someone is ready to
> share based on experience?
>
> ~Manish
>


Re: Airflow + Kubernetes discussion

2017-07-20 Thread Dan Davydov
I'm in.

On Thu, Jul 20, 2017 at 4:30 PM, Daniel Imberman 
wrote:

> Glad to hear that people are interested! I've created a google calendar
> event and messaged everyone in this thread, if anyone else would like to
> join please let me know!
>
> On Thu, Jul 20, 2017 at 1:04 PM Bolke de Bruin  wrote:
>
> > Invite would be nice, I will try to join!
> >
> > > On 20 Jul 2017, at 20:36, Gerard Toonstra  wrote:
> > >
> > > send me an invite too!
> > >
> > > On Thu, Jul 20, 2017 at 8:17 PM, Jeremiah Lowin 
> > wrote:
> > >
> > >> I'm interested as well.
> > >>
> > >> On Thu, Jul 20, 2017 at 1:51 PM Marc Bollinger 
> > wrote:
> > >>
> > >>> +1 We're in the middle of moving some services to k8s, and have had
> our
> > >>> eye on Airflow.
> > >>>
> >  On Jul 20, 2017, at 10:37 AM, Sumit Maheshwari <
> > sumeet.ma...@gmail.com
> > >>>
> > >>> wrote:
> > 
> >  I would join as well for sure.
> > 
> >  Thanks,
> >  Sumit Maheshwari
> >  cell. 9632202950
> > 
> > 
> >  On Thu, Jul 20, 2017 at 11:00 PM, Chris Riccomini <
> > >> criccom...@apache.org
> > 
> >  wrote:
> > 
> > > I would definitely be up to joining. We're interested in the K8s
> work
> > > that's going on. That time works for me.
> > >
> > > On Thu, Jul 20, 2017 at 9:54 AM, Daniel Imberman <
> > > daniel.imber...@gmail.com>
> > > wrote:
> > >
> > >> Hello everyone,
> > >>
> > >> Recently there's been a fair amount of discussion regarding the
> > > integration
> > >> of airflow with kubernetes. If there is interest I would love to
> > host
> > >>> an
> > >> e-meeting to discuss this integration. I can go over the
> > architecture
> > >>> as
> > > it
> > >> stands right now and would love feedback on
> > > improvements/features/design. I
> > >> could also attempt to get one or two members of google's
> kubernetes
> > >>> team
> > > to
> > >> join to discuss best practices.
> > >>
> > >> I'm currently thinking that next Thursday at 11AM PST over
> zoom.us,
> > > though
> > >> if there's strong opinions otherwise I'd be glad to propose other
> > >>> times.
> > >>
> > >> Cheers!
> > >>
> > >> Daniel
> > >>
> > >
> > >>>
> > >>
> >
> >
>


Re: Role Based Access Control for Airflow UI

2017-07-25 Thread Dan Davydov
> > > > > >>>>>>> On 22 Jun 2017, at 09:36, Bolke de Bruin <
> bdbr...@gmail.com>
> > > > > >>> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>> Hi Guys,
> > > > > >>>>>>>
> > > > > >>>>>>> Thanks for putting the thinking in! It is about time that
> we
> > > get
> > > > > >>> this
> > > > > >>>>>> moving.
> > > > > >>>>>>>
> > > > > >>>>>>> The design looks pretty sound. One can argue about the
> > > different
> > > > > >>>> roles
> > > > > >>>>>> that are required, but that will be situation dependent I
> > guess.
> > > > > >>>>>>>
> > > > > >>>>>>> Implementation wise I would argue together with Max that
> FAB
> > > is a
> > > > > >>>>> better
> > > > > >>>>>> or best fit. The ER model that is being described is pretty
> > > much a
> > > > > >>> copy
> > > > > >>>>> of
> > > > > >>>>>> a normal security model. So a reimplementation of that is 1)
> > > > > >>>> significant
> > > > > >>>>>> duplication of effort and 2) bound to have bugs that have
> been
> > > > > >> solved
> > > > > >>>> in
> > > > > >>>>>> the other framework. Moreover, FAB does have integration out
> > of
> > > > the
> > > > > >>> box
> > > > > >>>>>> with some enterprisey systems like IPA, ActiveDirectory, and
> > > LDAP.
> > > > > >>>>>>>
> > > > > >>>>>>> So while you argue that using FAB would increase the scope
> of
> > > the
> > > > > >>>>>> proposal significantly, but I think that is not true. Using
> > FAB
> > > > > >> would
> > > > > >>>>> allow
> > > > > >>>>>> you to focus on what kind of out-of-the-box permission sets
> > and
> > > > > >> roles
> > > > > >>>> we
> > > > > >>>>>> would need and maybe address some issues that FAB lacks
> (maybe
> > > how
> > > > > >> to
> > > > > >>>>> deal
> > > > > >>>>>> with non web access - ie. in DAGs, maybe Kerberos, probably
> > how
> > > to
> > > > > >>> deal
> > > > > >>>>>> with API calls that are not CRUD). Implementation wise it
> > > probably
> > > > > >>>>>> simplifies what we need to do. Maybe - using Max’s early POC
> > as
> > > an
> > > > > >>>>> example
> > > > > >>>>>> - we can slowly move over?
> > > > > >>>>>>>
> > > > > >>>>>>> On a side note: Im planning to hire 2-3 ppl to work on
> > Airflow
> > > > > >>> coming
> > > > > >>>>>> year. Improvement of Security, Enterprise Integration,
> Revamp
> > UI
> > > > > >> are
> > > > > >>> on
> > > > > >>>>> the
> > > > > >>>>>> todo list. However, this is not confirmed yet as business
> > > > > >> priorities
> > > > > >>>>> might
> > > > > >>>>>> change.
> > > > > >>>>>>>
> > > > > >>>>>>> Bolke.
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>> On 15 Jun 2017, at 21:45, kalpesh dharwadkar <
> > > > > >>>>>> kalpeshdharwad...@gmail.com> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>> @Dan:
> > > > > >>>>>>>>
> > > > > >>>>>>>> Thanks for your feedback. I will remove t

Re: Airflow + Kubernetes Talk video

2017-07-28 Thread Dan Davydov
Thanks for organizing, and leading this effort in general!

On Fri, Jul 28, 2017 at 2:40 PM, Daniel Imberman 
wrote:

> Hi guys!
>
> Thank you again to everyone who attended the talk yesterday. I've posted
> the video of the conversation to youtube, and will soon add the video and
> slides to the airflow Wiki
>
> Cheers,
> Daniel
>
> https://www.youtube.com/watch?v=5BU3YPYYRno
>


Re: Airflow + Kubernetes update meeting

2017-09-05 Thread Dan Davydov
Works for me as well!

On Tue, Sep 5, 2017 at 10:43 AM Daniel Imberman 
wrote:

> @Marc we will make sure to record the meeting/supply notes. This should be
> a pretty straightforward update/overview meeting.
> @ChrisB this meeting will be a virtual meeting, though Bloomberg is
> definitely interested in hosting an airflow meetup at our SF location if
> there is sufficient interest :).
> @ChrisR Great to hear :). We've been working with members of the openshift
> community so we can definitely speak to those requirem
>
>
> On Tue, Sep 5, 2017 at 10:24 AM Feng Lu  wrote:
>
>> +1, either way works for me.
>>
>> On Tue, Sep 5, 2017 at 10:10 AM, Chris Riccomini 
>> wrote:
>>
>> > Works for me.
>> >
>> > On Tue, Sep 5, 2017 at 7:44 AM, Grant Nicholas wrote:
>> >
>> >> +1 for me if it works with others.
>> >>
>> >> On Mon, Sep 4, 2017 at 11:02 PM, Anirudh Ramanathan <
>> >> ramanath...@google.com> wrote:
>> >>
>> >>> Date/time work for me if we get quorum from this group.
>> >>>
>> >>> On Thu, Aug 31, 2017 at 7:54 PM, Christopher Bockman <
>> >>> ch...@fathomhealth.co> wrote:
>> >>>
>>  Hi Daniel, would this be remote or in person?
>> 
>> 
>>  On Aug 31, 2017 4:16 PM, "Daniel Imberman" <
>> daniel.imber...@gmail.com>
>>  wrote:
>> 
>>  Hey guys!
>> 
>>  So I wanted to set up a meeting to discuss some of the
>> updates/current
>>  work
>>  that is going on with both the kubernetes operator and kubernetes
>>  executor
>>  efforts. There have been some really cool updates/proposals on the
>>  design of
>>  these two features and I would love to get some community feedback to
>>  make
>>  sure that we are taking this in a direction that benefits everyone.
>> 
>>  I am thinking of having this meeting at 10:00AM on Thursday,
>> September
>>  7th
>>  PST. Would this time/place work?
>> 
>>  Thanks!
>> 
>>  Daniel
>> 
>> 
>> 
>> >>>
>> >>>
>> >>> --
>> >>> Anirudh Ramanathan
>> >>>
>> >>
>> >>
>> >
>>
>


Re: Airflow 1.8.2 released

2017-09-05 Thread Dan Davydov
+1, this was a lot more work than anticipated.

On Tue, Sep 5, 2017 at 10:10 AM Chris Riccomini 
wrote:

> Thanks so much for slogging through this!
>
> On Mon, Sep 4, 2017 at 10:26 AM, Sumit Maheshwari 
> wrote:
>
> > Awesome!!
> >
> > Thanks a lot Max for being the RM for this release.
> >
> >
> >
> > On Mon, Sep 4, 2017 at 10:51 PM, Maxime Beauchemin <
> > maximebeauche...@gmail.com> wrote:
> >
> > > Dear Airflow community,
> > >
> > > Airflow 1.8.2 was just released.
> > >
> > > The source release as well as the binary "sdist" release are available
> > > here:
> > > https://dist.apache.org/repos/dist/release/incubator/
> > > airflow/1.8.2-incubating/
> > >
> > > We also made this version available on Pypi for convenience (`pip
> install
> > > apache-airflow`):
> > > https://pypi.python.org/pypi/apache-airflow
> > >
> > > Note that 1.8.2 is a minor release that is several months behind the
> > > current `master` branch. We're trying to increase our release cadence
> as
> > we
> > > iron out the process and Apache requirements. The process requires a
> fair
> > > amount of back and forth with the community and Apache, and I have to
> > admit
> > > that I wasn't exactly on top of it as the person in charge of this
> > release.
> > > Some of the work I've done around the LICENSE files should make future
> > > releases easier though, so that's a positive thing.
> > >
> > > Find the CHANGELOG here for more details:
> > > https://github.com/apache/incubator-airflow/pull/2562
> > >
> > > Also note that 1.9.0rc1 will be cut off of master shortly and should
> > > include all of the latest development.
> > >
> > > Enjoy!
> > >
> > > Max
> > >
> >
>


Re: Some random fun

2017-09-25 Thread Dan Davydov
Haha, this is great.

On Mon, Sep 25, 2017 at 11:37 AM Shah Altaf  wrote:

> **CupcakeSensor activated**
>
>
>
> On Mon, Sep 25, 2017 at 7:31 PM Laura Lorenz 
> wrote:
>
> > Just thought everyone here would appreciate the nerdy party our data team
> > threw ourselves for completing a milestone on a difficult DAG recently.
> We
> > played Pin the Task on the DAG and ate Task State cupcakes: see pics at
> > https://twitter.com/lalorenz6/status/912383049354096641
> >
>


Re: Apache Airflow welcome new committer/PMC member : Fokko Driespong (a.k.a. fokko)

2017-10-04 Thread Dan Davydov
Welcome!

On Wed, Oct 4, 2017 at 2:31 PM Maxime Beauchemin 
wrote:

> Welcome on board Fokko!
>
> Max
>
> On Wed, Oct 4, 2017 at 2:18 PM, Chris Riccomini 
> wrote:
>
> > Welcome!!
> >
> > On Wed, Oct 4, 2017 at 12:51 PM, Sid Anand  wrote:
> >
> > > Folks,
> > > Please join the Apache Airflow PMC in welcoming its newest member and
> > > co-committer, Fokko Driespong (a.k.a. fokko).
> > >
> > > https://cwiki.apache.org/confluence/display/AIRFLOW/
> > > Announcements#Announcements-Oct1,2017
> > >
> > >
> > > -s
> > >
> >
>


Meetup Interest?

2017-10-13 Thread Dan Davydov
Is there interest in doing an Airflow meet-up? Airbnb can host one in San
Francisco.

Some talk ideas can include the progress on Kubernetes integration and
Scaling & Operations with Airflow. If you want to see other topics covered,
feel free to suggest them!


Re: Meetup Interest?

2017-10-16 Thread Dan Davydov
Glad to see there is interest! I'll work on setting this up.

On Sun, Oct 15, 2017 at 10:47 AM Cade Markegard 
wrote:

> +1 at SF meetup
>
> Would be interested in seeing that progress of airflow + k8s and any other
> advancements the community has made.
> On Sun, Oct 15, 2017 at 9:24 AM Feng Lu  wrote:
>
> > +1
> >
> > We can give an update on task secret management in K8SExecutor and also
> > want to share our thoughts and get feedback on Airflow CI/CD with the set
> > of GCP operators/hooks as an example.
> >
> > On Sat, Oct 14, 2017 at 7:06 PM, Marc Bollinger 
> > wrote:
> >
> > > +1
> > >
> > > We'd definitely be in. Would love to chat more about K8s/Airflow--Data
> > Eng
> > > has been a little twitchy about being the guinea pigs in our org, but
> the
> > > production app is now serving all traffic from it, so we're planning
> out
> > > our strategy.
> > >
> > > On Fri, Oct 13, 2017 at 1:29 PM, Daniel Imberman (BLOOMBERG/ SAN FRAN)
> <
> > > dimber...@bloomberg.net> wrote:
> > >
> > > > +1
> > > >
> > > > We're getting really close on the Kubernetes Executor PR. Would love
> to
> > > > discuss final features/architecture to make sure we cover our bases
> > > before
> > > > we try to roll out alpha.
> > > >
> > > >
> > > > From: mw...@newrelic.com
> > > > Subject: Re: Meetup Interest?
> > > >
> > > > +1 for this meetup idea! We don't use Kube+Airflow, but I'd love to
> see
> > > > talks on scaling it out team-wise and some design patterns people
> have
> > > come
> > > > up with.
> > > >
> > > > --
> > > > Marc Weil | Lead Engineer | Growth Automation, Marketing, and
> > Engagement
> > > |
> > > > New Relic
> > > > On Fri, Oct 13, 2017 at 1:03 PM, Christopher Bockman <
> > > > ch...@fathomhealth.co> wrote:
> > > >
> > > > +1 as a vote.
> > > >
> > > > We're very actively working on Kube+Airflow, so would be particularly
> > > > interested on discussions there.
> > > >
> > > > On Fri, Oct 13, 2017 at 12:59 PM, Joy Gao  wrote:
> > > >
> > > > > Hi Dan,
> > > > >
> > > > > I'd be happy to give an update on progress of the new RBAC UI we've
> > > been
> > > > > working on here at WePay.
> > > > >
> > > > > Cheers,
> > > > > Joy
> > > > >
> > > > > On Fri, Oct 13, 2017 at 12:10 PM, Dan Davydov <
> > > > > dan.davy...@airbnb.com.invalid> wrote:
> > > > >
> > > > > > Is there interest in doing an Airflow meet-up? Airbnb can host
> one
> > in
> > > > San
> > > > > > Francisco.
> > > > > >
> > > > > > Some talk ideas can include the progress on Kubernetes
> integration
> > > and
> > > > > > Scaling & Operations with Airflow. If you want to see other
> topics
> > > > > covered,
> > > > > > feel free to suggest them!
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
>


Airflow Meetup @ Airbnb - Mon Dec 4th

2017-10-25 Thread Dan Davydov
Hey guys, we are doing an Airflow meetup at Airbnb on December 4th!

All are welcome, and food will be provided.

Please RSVP and see the details/agenda at
https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/events/244525050/

Sincerely,
Dan


Re: Experimental API

2017-10-30 Thread Dan Davydov
FWIW I am hoping we can change this insecure-by-default behavior for 2.0, and
there is already some stuff in the Airflow config that lets you do this out of
the box if you tweak a couple of config values (e.g. check out secure_mode,
which we can hopefully build upon).

On Mon, Oct 30, 2017 at 3:22 PM Bolke de Bruin  wrote:

> Hi All,
>
> Airflow out of the box comes without security configured. This goes for
> both the API and the UI. Currently, the API and the UI make use of
> different authentication backends due to the way authentication needed to
> be implemented. This should be better documented.
>
> So while “the web ui is protected, thus automatically the API as well” is
> the ideal situation, it is not an oversight and “not something has gone
> wrong”.
>
> Some part of this is technical debt, which we probably won’t solve until
> the move towards Flask-AppBuilder, hopefully not too far out. That
> being said we might choose to have a REST API as a separate service from
> the WebUI.
>
> Cheers
> Bolke
>
>
>
> > On 30 Oct 2017, at 16:42, Ash Berlin-Taylor <
> ash_airflowl...@firemirror.com> wrote:
> >
> > Oh gods.
> >
> > Something has gone wrong - the methods are decorated with
> `@requires_authentication` but they... don't. Oh, because the default
> backend doesn't do any authentication or protection at all.
> >
> > I think this is CVE-worthy - using the User+Password auth for the web
> front end/using default config should not leave the API unprotected. I
> think the default API auth backend should deny all rather than allow all?
> >
> > -ash
> >
> >> On 30 Oct 2017, at 08:51, Niels Zeilemaker <
> nielszeilema...@godatadriven.com> wrote:
> >>
> >> Hi All,
> >>
> >> I've implemented HTTP Basic Authentication for the experimental API, see
> https://github.com/apache/incubator-airflow/pull/2730. This seems to work
> fine.
> >> However, while implementing this, I noticed, to my surprise, that the
> experimental API was open even though we enabled Password authentication
> for the web-interface.
> >> This seems like a bug to me, as one would expect that the experimental
> API would use the same auth backend as the web-interface.
> >>
> >> Why did Airflow choose to split the authentication for the
> web-interface  and experimental API?
> >> And if it's not possible to combine those, is it possible to lock down
> the experimental API if one chooses a non-default web-interface auth
> backend?
> >>
> >> Niels
> >> Ps with an unsecured experimental api it is possible to trigger dags,
> list pools, delete pools, etc.
> >
>
>
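
The difference between an allow-all and a deny-all default discussed in this thread can be sketched with two decorators. This is a framework-free illustration under stated assumptions: allow_all, deny_all and list_pools are hypothetical names, and the (body, status) tuples stand in for real HTTP responses. It is not the code of Airflow's auth backends.

```python
from functools import wraps


def allow_all(function):
    """Decorator that performs no check at all (the open default)."""
    @wraps(function)
    def decorated(*args, **kwargs):
        return function(*args, **kwargs)
    return decorated


def deny_all(function):
    """Decorator that rejects every request without calling the view."""
    @wraps(function)
    def decorated(*args, **kwargs):
        return ("Forbidden", 403)
    return decorated


def list_pools():
    """Stand-in for an API endpoint."""
    return ("pools", 200)


if __name__ == "__main__":
    print(allow_all(list_pools)())   # endpoint runs
    print(deny_all(list_pools)())    # endpoint never runs
```

Both wrappers "decorate" the endpoint, which is why a method can carry an authentication decorator and still be open: what matters is which backend supplies the decorator.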


4/17 Airflow Meetup Slides

2017-12-07 Thread Dan Davydov
Here <https://drive.google.com/open?id=154jnUADKfrHXLUDvJBQOl8aiBwpaw9-w>
are the presentations from the 4/17 Airflow meetup at Airbnb.

Unfortunately we weren't able to get the recording software working for the
talk.

Thank you all for coming!


Re: Switching minimum support version from python 3.4 -> 3.5

2017-12-19 Thread Dan Davydov
Sounds good to me

On Tue, Dec 19, 2017 at 10:18 AM Chris Riccomini 
wrote:

> :thumbsup:
>
> On Tue, Dec 19, 2017 at 5:19 AM, Bolke de Bruin  wrote:
>
> > Hi All,
> >
> > We have some issues on Travis with issues around distributed and task,
> > these are related to Python 3.4. As Python 3.4 is not very popular (
> > https://user-images.githubusercontent.com/306380/
> > 29750903-f891cb2a-8b15-11e7-84cc-e26ce5b1e095.png) I am switching the
> > builds to Python 3.5 as a minimum.
> >
> > Please let me know if you think that is a bad idea.
> >
> > Cheers
> > Bolke
> >
> >
> >
>


Re: 4/17 Airflow Meetup Slides

2017-12-27 Thread Dan Davydov
Thanks Sid!

On Wed, Dec 27, 2017 at 3:13 PM Sid Anand  wrote:

> Thx for posting the slides : I've added them to the Announcements page :
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements#Announcements-Nov1,2017
>
> On Thu, Dec 7, 2017 at 12:09 PM, Dan Davydov wrote:
>
> > Here <https://drive.google.com/open?id=154jnUADKfrHXLUDvJBQOl8aiBwpaw9-w>
> > are the presentations from the 4/17 Airflow meetup at Airbnb.
> >
> > Unfortunately we weren't able to get the recording software working for
> the
> > talk.
> >
> > Thank you all for coming!
> >
>


Re: Airflow Scalability with Local Executor

2018-03-28 Thread Dan Davydov
The LocalExecutor is great for running small numbers of DAGs/tasks, but it
is more of a starter executor meant to make Airflow work out of the box. I
would recommend switching to a different executor like the CeleryExecutor.

You are certainly right that there is room for reducing the memory
footprint of each Airflow process (though I'm not too sure how much can be
done about the CPU usage, could be a function of how your DAGs are parsed).
Even if you fix the current bottlenecks you will likely run into more.

On Wed, Mar 28, 2018 at 7:13 AM ramandu...@gmail.com 
wrote:

> Hi All,
> We have a use case to support 1000 concurrent DAGs. These DAGs would
> have a couple of HTTP tasks which would be submitting jobs to external
> services. Each DAG could run for a couple of hours.
> HTTP tasks are periodically checking(with sleep 20) the job status.
> We tried running 1000 such dags(Parallelism set to 1000) with Airflow's
> LocalExecutor Mode but after 100 concurrent runs, tasks started failing due
> to
> --> OOM error
> --> Scheduler marked them failed because of lack of heartbeat.
> We are using 4 cores and 16 GB RAM. Each airflow worker is taking ~250 MB
> of Virtual memory and ~60 MB of RES memory which seems to be on higher
> side. CPU utilisation is also ~98%.
> Is there anything that can be done to optimise Memory/CPU for airflow
> worker.
> Any pointer to airflow benchmarking with LocalExecutor would also be
> helpful
>

