[jira] [Commented] (AIRFLOW-2272) Travis CI is failing builds due to oracle-java8-installer failing to install via apt-get

2018-04-02 Thread David Klosowski (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422974#comment-16422974
 ] 

David Klosowski commented on AIRFLOW-2272:
--

[~jinhyukch...@gmail.com] It looks like an issue is already open: 
https://github.com/travis-ci/travis-ci/issues/9419

> Travis CI is failing builds due to oracle-java8-installer failing to install 
> via apt-get
> 
>
>                 Key: AIRFLOW-2272
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2272
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: ci
>            Reporter: David Klosowski
>            Priority: Critical
>              Labels: build-failure
>
> All the PR builds in TravisCI are failing with the following apt-get error:
>  
> {code:java}
> --2018-03-30 17:56:23-- (try: 5) 
> http://download.oracle.com/otn-pub/java/jdk/8u161-b12/2f38c3b165be4555a1fa6e98c45e0808/jdk-8u161-linux-x64.tar.gz
> Connecting to download.oracle.com (download.oracle.com)|23.45.144.164|:80... 
> failed: Connection timed out.
> Giving up.
> apt-get install failed
> $ cat ~/apt-get-update.log
> Ign:1 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty InRelease
> Hit:2 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty-updates 
> InRelease
> Hit:3 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty-backports 
> InRelease
> Hit:4 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty Release
> Ign:5 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 InRelease
> Ign:6 http://dl.google.com/linux/chrome/deb stable InRelease
> Hit:7 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 Release
> Hit:8 http://security.ubuntu.com/ubuntu trusty-security InRelease
> Ign:9 http://toolbelt.heroku.com/ubuntu ./ InRelease
> Get:10 http://dl.bintray.com/apache/cassandra 39x InRelease [3,168 B]
> Hit:11 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty InRelease
> Hit:13 https://download.docker.com/linux/ubuntu trusty InRelease
> Hit:12 http://toolbelt.heroku.com/ubuntu ./ Release
> Hit:15 http://dl.google.com/linux/chrome/deb stable Release
> Hit:16 http://apt.postgresql.org/pub/repos/apt trusty-pgdg InRelease
> Hit:18 https://dl.hhvm.com/ubuntu trusty InRelease
> Ign:19 http://ppa.launchpad.net/couchdb/stable/ubuntu trusty InRelease
> Hit:22 https://packagecloud.io/computology/apt-backport/ubuntu trusty 
> InRelease
> Hit:23 https://packagecloud.io/github/git-lfs/ubuntu trusty InRelease
> Hit:24 https://packagecloud.io/rabbitmq/rabbitmq-server/ubuntu trusty 
> InRelease
> Hit:25 http://ppa.launchpad.net/git-core/ppa/ubuntu trusty InRelease
> Hit:26 http://ppa.launchpad.net/openjdk-r/ppa/ubuntu trusty InRelease
> Hit:27 http://ppa.launchpad.net/pollinate/ppa/ubuntu trusty InRelease
> Hit:28 http://ppa.launchpad.net/webupd8team/java/ubuntu trusty InRelease
> Hit:29 http://ppa.launchpad.net/couchdb/stable/ubuntu trusty Release
> Fetched 3,168 B in 2s (1,102 B/s)
> Reading package lists...
> W: http://ppa.launchpad.net/couchdb/stable/ubuntu/dists/trusty/Release.gpg: 
> Signature by key 15866BAFD9BCC4F3C1E0DFC7D69548E1C17EAB57 uses weak digest 
> algorithm (SHA1)
>  
> The command "sudo -E apt-get -yq --no-install-suggests 
> --no-install-recommends --force-yes install slapd ldap-utils openssh-server 
> mysql-server-5.6 mysql-client-core-5.6 mysql-client-5.6 krb5-user krb5-kdc 
> krb5-admin-server oracle-java8-installer python-selinux" failed and exited 
> with 100 during .{code}
>  
>  It looks like this is due to the configuration in the .travis.yml installing 
> {{oracle-java8-installer}}:
> {code}
>   apt:
>     packages:
>       - slapd
>       - ldap-utils
>       - openssh-server
>       - mysql-server-5.6
>       - mysql-client-core-5.6
>       - mysql-client-5.6
>       - krb5-user
>       - krb5-kdc
>       - krb5-admin-server
>       - oracle-java8-installer
>       - python-selinux
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2272) Travis CI is failing builds due to oracle-java8-installer failing to install via apt-get

2018-04-02 Thread David Klosowski (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422970#comment-16422970
 ] 

David Klosowski commented on AIRFLOW-2272:
--

OK, everything succeeded except flake8.  I'm wondering whether this is still 
an issue, or whether jdk9 works and jdk8 does not.



[jira] [Commented] (AIRFLOW-2272) Travis CI is failing builds due to oracle-java8-installer failing to install via apt-get

2018-04-02 Thread David Klosowski (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422865#comment-16422865
 ] 

David Klosowski commented on AIRFLOW-2272:
--

Trying in this PR: https://github.com/apache/incubator-airflow/pull/3182



[jira] [Commented] (AIRFLOW-2272) Travis CI is failing builds due to oracle-java8-installer failing to install via apt-get

2018-04-02 Thread David Klosowski (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422853#comment-16422853
 ] 

David Klosowski commented on AIRFLOW-2272:
--

This other project seems to state that the repo was unreliable:

[https://github.com/rakudo/rakudo/commit/c6d18b073e941d0d5a402024a9f9f3cded0f880c|https://github.com/rakudo/rakudo/issues/1646]

I'm going to try the oracle-java9-installer package instead.



[jira] [Commented] (AIRFLOW-2272) Travis CI is failing builds due to oracle-java8-installer failing to install via apt-get

2018-04-02 Thread David Klosowski (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422841#comment-16422841
 ] 

David Klosowski commented on AIRFLOW-2272:
--

That's a good question; I don't have experience here either. 

I see references to an earlier issue on this from 2016:

[https://github.com/travis-ci/travis-ci/issues/6848]

It appears to have been an Oracle issue.



[jira] [Created] (AIRFLOW-2272) Travis CI is failing builds due to oraclejdk8 failing to install

2018-03-30 Thread David Klosowski (JIRA)
David Klosowski created AIRFLOW-2272:


             Summary: Travis CI is failing builds due to oraclejdk8 failing to install
                 Key: AIRFLOW-2272
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2272
             Project: Apache Airflow
          Issue Type: Bug
          Components: ci
            Reporter: David Klosowski


All the PR builds in TravisCI are failing with the following apt-get error:
 
{code:java}
--2018-03-30 17:56:23-- (try: 5) 
http://download.oracle.com/otn-pub/java/jdk/8u161-b12/2f38c3b165be4555a1fa6e98c45e0808/jdk-8u161-linux-x64.tar.gz
Connecting to download.oracle.com (download.oracle.com)|23.45.144.164|:80... 
failed: Connection timed out.
Giving up.
apt-get install failed
$ cat ~/apt-get-update.log
Ign:1 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty InRelease
Hit:2 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty-updates InRelease
Hit:3 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty-backports 
InRelease
Hit:4 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty Release
Ign:5 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 InRelease
Ign:6 http://dl.google.com/linux/chrome/deb stable InRelease
Hit:7 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 Release
Hit:8 http://security.ubuntu.com/ubuntu trusty-security InRelease
Ign:9 http://toolbelt.heroku.com/ubuntu ./ InRelease
Get:10 http://dl.bintray.com/apache/cassandra 39x InRelease [3,168 B]
Hit:11 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty InRelease
Hit:13 https://download.docker.com/linux/ubuntu trusty InRelease
Hit:12 http://toolbelt.heroku.com/ubuntu ./ Release
Hit:15 http://dl.google.com/linux/chrome/deb stable Release
Hit:16 http://apt.postgresql.org/pub/repos/apt trusty-pgdg InRelease
Hit:18 https://dl.hhvm.com/ubuntu trusty InRelease
Ign:19 http://ppa.launchpad.net/couchdb/stable/ubuntu trusty InRelease
Hit:22 https://packagecloud.io/computology/apt-backport/ubuntu trusty InRelease
Hit:23 https://packagecloud.io/github/git-lfs/ubuntu trusty InRelease
Hit:24 https://packagecloud.io/rabbitmq/rabbitmq-server/ubuntu trusty InRelease
Hit:25 http://ppa.launchpad.net/git-core/ppa/ubuntu trusty InRelease
Hit:26 http://ppa.launchpad.net/openjdk-r/ppa/ubuntu trusty InRelease
Hit:27 http://ppa.launchpad.net/pollinate/ppa/ubuntu trusty InRelease
Hit:28 http://ppa.launchpad.net/webupd8team/java/ubuntu trusty InRelease
Hit:29 http://ppa.launchpad.net/couchdb/stable/ubuntu trusty Release
Fetched 3,168 B in 2s (1,102 B/s)
Reading package lists...
W: http://ppa.launchpad.net/couchdb/stable/ubuntu/dists/trusty/Release.gpg: 
Signature by key 15866BAFD9BCC4F3C1E0DFC7D69548E1C17EAB57 uses weak digest 
algorithm (SHA1)
 
The command "sudo -E apt-get -yq --no-install-suggests --no-install-recommends 
--force-yes install slapd ldap-utils openssh-server mysql-server-5.6 
mysql-client-core-5.6 mysql-client-5.6 krb5-user krb5-kdc krb5-admin-server 
oracle-java8-installer python-selinux" failed and exited with 100 during .{code}
 
 It looks like this is due to the configuration in the .travis.yml installing 
{{oracle-java8-installer}}:

{code}
  apt:
    packages:
      - slapd
      - ldap-utils
      - openssh-server
      - mysql-server-5.6
      - mysql-client-core-5.6
      - mysql-client-5.6
      - krb5-user
      - krb5-kdc
      - krb5-admin-server
      - oracle-java8-installer
      - python-selinux
{code}
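
To confirm which download is actually failing, the build log can be scanned for failed connection attempts. The following is only a rough sketch in Python: the function name find_connect_failures and the regular expression are illustrative assumptions, written against the wget output format shown in the log above, not part of Travis CI or Airflow.

```python
import re

# Hypothetical pattern for wget's
# "Connecting to HOST (HOST)|IP|:PORT... failed: REASON." line,
# based only on the log excerpt quoted in this issue.
CONNECT_FAILURE = re.compile(
    r"Connecting to (?P<host>\S+) \(\S+\)\|(?P<ip>[\d.]+)\|:(?P<port>\d+)\.\.\.\s*"
    r"failed: (?P<reason>[^.\n]+)\."
)


def find_connect_failures(log_text):
    """Return (host, ip, port, reason) for each failed connection in the log."""
    return [
        (m.group("host"), m.group("ip"), int(m.group("port")), m.group("reason"))
        for m in CONNECT_FAILURE.finditer(log_text)
    ]


# Sample taken from the log in this issue (joined onto one line).
sample_log = (
    "Connecting to download.oracle.com (download.oracle.com)|23.45.144.164|:80... "
    "failed: Connection timed out.\n"
    "Giving up.\n"
    "apt-get install failed\n"
)

for failure in find_connect_failures(sample_log):
    print(failure)
```

Run against the excerpt above, this points at download.oracle.com rather than any of the apt repositories, which is consistent with oracle-java8-installer being the failing package.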


 





[jira] [Updated] (AIRFLOW-2272) Travis CI is failing builds due to oracle-java8-installer failing to install via apt-get

2018-03-30 Thread David Klosowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Klosowski updated AIRFLOW-2272:
-
Summary: Travis CI is failing builds due to oracle-java8-installer failing 
to install via apt-get  (was: Travis CI is failing builds due to oraclejdk8 
failing to install)



[jira] [Assigned] (AIRFLOW-2178) Scheduler can't get past SLA check if SMTP settings are incorrect

2018-03-28 Thread David Klosowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Klosowski reassigned AIRFLOW-2178:


Assignee: David Klosowski

> Scheduler can't get past SLA check if SMTP settings are incorrect
> -
>
>                 Key: AIRFLOW-2178
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2178
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.9.0
>         Environment: 16.04
>            Reporter: James Meickle
>            Assignee: David Klosowski
>            Priority: Major
>         Attachments: log.txt
>
>
> After testing Airflow for a while in staging, I provisioned our prod cluster 
> and enabled the first DAG on it. The "backfill" for this DAG performed just 
> fine, so I assumed everything was working and left it over the weekend.
> However, when the last "backfill" period completed and the scheduler 
> transitioned to the most recent execution date, it began failing in the 
> `manage_slas` method. Due to a configuration difference, SMTP was timing out 
> in production, preventing the SLA check from ever completing; this both 
> blocked SLA notifications and prevented further tasks in this DAG 
> from ever getting scheduled.
> As an operator, I would expect Airflow to treat scheduling tasks as a 
> higher-priority concern, and to do so even if the SLA feature fails to work. I 
> would also expect Airflow to notify me in the web UI that email sending is 
> not currently working.
>  
>  
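
The behaviour the reporter expects (scheduling continues even when the SLA email cannot be sent) could be sketched roughly as follows. This is not Airflow's `manage_slas` code: send_sla_email_safely, scheduler_pass, and the send_email / schedule_tasks callables are hypothetical names used only to illustrate isolating the notification failure.

```python
import logging

log = logging.getLogger("sla_sketch")


def send_sla_email_safely(send_email, subject, body):
    """Call the (hypothetical) send_email callable without letting it raise.

    Returns True if the email went out, False if sending failed.
    """
    try:
        send_email(subject, body)
        return True
    except OSError as exc:
        # On Python 3, smtplib.SMTPException and socket.timeout are both
        # OSError subclasses, so a timing-out SMTP server ends up here.
        log.warning("Could not send SLA email: %s", exc)
        return False


def scheduler_pass(send_email, schedule_tasks):
    """One simplified scheduler pass: SLA notification first, then scheduling."""
    email_ok = send_sla_email_safely(send_email, "SLA miss", "details")
    # Scheduling runs whether or not the notification succeeded.
    schedule_tasks()
    return email_ok
```

The returned flag is one place a UI warning about broken email sending could hook in; how that would be surfaced in the web UI is left open here.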





[jira] [Assigned] (AIRFLOW-2236) Airflow SLA is triggered for all backfilled tasks

2018-03-27 Thread David Klosowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Klosowski reassigned AIRFLOW-2236:


Assignee: (was: David Klosowski)

> Airflow SLA is triggered for all backfilled tasks
> -
>
>                 Key: AIRFLOW-2236
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2236
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: backfill
>    Affects Versions: Airflow 1.8, 1.8.1, 1.8.0, 1.9.0, 1.8.2
>            Reporter: barak schoster
>            Priority: Major
>
> While executing a task with a historical execution date, a schedule 
> interval of 1 hour, and an SLA of 1 hour, all backfill instances appear as 
> sla_miss, even though the duration of each task did not exceed the SLA.
>  
> I think that this is a result of comparing the task's end time to utc.now at 
> https://github.com/apache/incubator-airflow/blob/7cc6d8a5645b8974f07e132cc3c4820e880fd3ce/airflow/jobs.py#L624
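
A small sketch of why this happens, under the assumption that the check compares a deadline derived from the execution date against the current time. The function names below are hypothetical and the logic is deliberately simplified; it is not the code at the linked line of jobs.py.

```python
from datetime import datetime, timedelta


def sla_missed_vs_now(execution_date, schedule_interval, sla, now):
    """Deadline-based check: a miss whenever `now` is past the deadline."""
    deadline = execution_date + schedule_interval + sla
    return now > deadline


def sla_missed_vs_duration(start, end, sla):
    """Duration-based check: a miss only if the task itself ran too long."""
    return (end - start) > sla


# A backfilled run for a historical execution date, executed months later.
execution_date = datetime(2018, 1, 1, 0, 0)
now = datetime(2018, 3, 27, 12, 0)
one_hour = timedelta(hours=1)

# The task itself only took five minutes.
start = now - timedelta(minutes=5)
end = now

print(sla_missed_vs_now(execution_date, one_hour, one_hour, now))
print(sla_missed_vs_duration(start, end, one_hour))
```

With a historical execution date the deadline is always long past, so the first check reports a miss for every backfilled instance, while the duration-based check does not.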





[jira] [Commented] (AIRFLOW-2236) Airflow SLA is triggered for all backfilled tasks

2018-03-27 Thread David Klosowski (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416526#comment-16416526
 ] 

David Klosowski commented on AIRFLOW-2236:
--

Actually, this is not so straightforward.  SLAs would have to change 
significantly in order to support this.  SLAs are governed in a way that is 
disconnected from the actual task instances. 

> Airflow SLA is triggered for all backfilled tasks
> -
>
>                 Key: AIRFLOW-2236
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2236
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: backfill
>    Affects Versions: Airflow 1.8, 1.8.1, 1.8.0, 1.9.0, 1.8.2
>            Reporter: barak schoster
>            Assignee: David Klosowski
>            Priority: Major
>
> While executing a task with a historical execution date, a schedule 
> interval of 1 hour, and an SLA of 1 hour, all backfill instances appear as 
> sla_miss, even though the duration of each task did not exceed the SLA.
>  
> I think that this is a result of comparing the task's end time to utc.now at 
> https://github.com/apache/incubator-airflow/blob/7cc6d8a5645b8974f07e132cc3c4820e880fd3ce/airflow/jobs.py#L624





[jira] [Commented] (AIRFLOW-2236) Airflow SLA is triggered for all backfilled tasks

2018-03-27 Thread David Klosowski (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416267#comment-16416267
 ] 

David Klosowski commented on AIRFLOW-2236:
--

One idea I had was to apply the SLA only to the most recent task run.  I think 
that's doable and seems reasonable.
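
For illustration, the "most recent run only" idea could look roughly like the 
sketch below.  This is not Airflow's SLA code: the function names, the 
deadline rule (execution_date + schedule_interval + sla) and the sample dates 
are all assumptions made up for the example.

```python
from datetime import datetime, timedelta


def runs_subject_to_sla(execution_dates):
    """Keep only the latest execution date; older (backfilled) runs are skipped."""
    if not execution_dates:
        return []
    return [max(execution_dates)]


def sla_misses(execution_dates, schedule_interval, sla, now):
    """Flag a run when `now` is past its assumed deadline."""
    misses = []
    for execution_date in runs_subject_to_sla(execution_dates):
        deadline = execution_date + schedule_interval + sla
        if now > deadline:
            misses.append(execution_date)
    return misses


hour = timedelta(hours=1)
now = datetime(2018, 3, 27, 12, 30)
# Three backfilled runs from January plus the current hourly run (hypothetical data).
execution_dates = [
    datetime(2018, 1, 1, 0),
    datetime(2018, 1, 1, 1),
    datetime(2018, 1, 1, 2),
    datetime(2018, 3, 27, 11),
]

# Checking every run against "now" flags all three backfilled runs.
checked_all = [d for d in execution_dates if now > d + hour + hour]
# Checking only the most recent run flags nothing in this example.
checked_latest = sla_misses(execution_dates, hour, hour, now)
print(len(checked_all), len(checked_latest))  # prints: 3 0
```

Note that under this rule a backfill whose newest run is itself historical 
would still produce one miss, so it reduces the noise rather than removing it.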

> Airflow SLA is triggered for all backfilled tasks
> -
>
> Key: AIRFLOW-2236
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2236
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: backfill
>Affects Versions: Airflow 1.8, 1.8.1, 1.8.0, 1.9.0, 1.8.2
>Reporter: barak schoster
>Assignee: David Klosowski
>Priority: Major
>
> While executing a task with a historical execution date, a schedule 
> interval of 1 hour, and an SLA of 1 hour, all backfill instances appear as 
> sla_miss, even though no task ran longer than its SLA.
>  
> I think that this is a result of comparing the task's end time to utc.now at 
> https://github.com/apache/incubator-airflow/blob/7cc6d8a5645b8974f07e132cc3c4820e880fd3ce/airflow/jobs.py#L624





[jira] [Commented] (AIRFLOW-2236) Airflow SLA is triggered for all backfilled tasks

2018-03-26 Thread David Klosowski (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415045#comment-16415045
 ] 

David Klosowski commented on AIRFLOW-2236:
--

I agree, this is an issue that can be resolved.  The big question is whether to 
handle the backfill case the same way as the non-backfill (current) case, or 
whether it makes sense to treat them separately.  My feeling is that the rule 
should be SLA_MISS = (task_end_datetime - task_start_datetime) > sla_interval.  
For the non-backfill case there is an argument for basing this on 
execution_date + schedule_interval (the intended start date) in place of 
task_start_datetime.  I'll ponder this more, but I'm not a fan of logical 
bifurcation, as it creates more complexity and potential confusion.
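
The two candidate rules above can be put side by side in a small sketch.  It 
is only an illustration of the arithmetic, not the scheduler's implementation; 
the function names and the sample timestamps are assumptions invented for the 
example.

```python
from datetime import datetime, timedelta


def sla_missed_by_duration(task_start, task_end, sla_interval):
    """Duration-based rule: the task itself ran longer than the SLA."""
    return (task_end - task_start) > sla_interval


def sla_missed_by_schedule(execution_date, schedule_interval, task_end, sla_interval):
    """Schedule-based rule: measure from the intended start
    (execution_date + schedule_interval) instead of the actual start."""
    intended_start = execution_date + schedule_interval
    return (task_end - intended_start) > sla_interval


# A backfilled hourly task: logical date in January, actually run in March.
hour = timedelta(hours=1)
execution_date = datetime(2018, 1, 1, 0, 0)
task_start = datetime(2018, 3, 26, 12, 0)
task_end = datetime(2018, 3, 26, 12, 10)  # the task ran for 10 minutes

backfill_by_duration = sla_missed_by_duration(task_start, task_end, hour)
backfill_by_schedule = sla_missed_by_schedule(execution_date, hour, task_end, hour)
print(backfill_by_duration, backfill_by_schedule)  # prints: False True
```

With the duration-based rule the backfilled run is not a miss; with the 
schedule-based rule every historical run is, which matches the behaviour 
described in the issue.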

> Airflow SLA is triggered for all backfilled tasks
> -
>
> Key: AIRFLOW-2236
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2236
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: backfill
>Affects Versions: Airflow 1.8, 1.8.1, 1.8.0, 1.9.0, 1.8.2
>Reporter: barak schoster
>Assignee: David Klosowski
>Priority: Major
>
> While executing a task with a historical execution date, a schedule 
> interval of 1 hour, and an SLA of 1 hour, all backfill instances appear as 
> sla_miss, even though no task ran longer than its SLA.
>  
> I think that this is a result of comparing the task's end time to utc.now at 
> https://github.com/apache/incubator-airflow/blob/7cc6d8a5645b8974f07e132cc3c4820e880fd3ce/airflow/jobs.py#L624





[jira] [Assigned] (AIRFLOW-2236) Airflow SLA is triggered for all backfilled tasks

2018-03-26 Thread David Klosowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Klosowski reassigned AIRFLOW-2236:


Assignee: David Klosowski

> Airflow SLA is triggered for all backfilled tasks
> -
>
> Key: AIRFLOW-2236
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2236
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: backfill
>Affects Versions: Airflow 1.8, 1.8.1, 1.8.0, 1.9.0, 1.8.2
>Reporter: barak schoster
>Assignee: David Klosowski
>Priority: Major
>
> While executing a task with a historical execution date, a schedule 
> interval of 1 hour, and an SLA of 1 hour, all backfill instances appear as 
> sla_miss, even though no task ran longer than its SLA.
>  
> I think that this is a result of comparing the task's end time to utc.now at 
> https://github.com/apache/incubator-airflow/blob/7cc6d8a5645b8974f07e132cc3c4820e880fd3ce/airflow/jobs.py#L624





[jira] [Resolved] (AIRFLOW-2251) Add Thinknear as an Airflow user

2018-03-26 Thread David Klosowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Klosowski resolved AIRFLOW-2251.
--
Resolution: Done

> Add Thinknear as an Airflow user
> 
>
> Key: AIRFLOW-2251
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2251
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: project-management
>Reporter: David Klosowski
>Priority: Minor
>
> Add OfferUp as an Airflow user





[jira] [Assigned] (AIRFLOW-2251) Add Thinknear as an Airflow user

2018-03-26 Thread David Klosowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Klosowski reassigned AIRFLOW-2251:


Assignee: David Klosowski

> Add Thinknear as an Airflow user
> 
>
> Key: AIRFLOW-2251
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2251
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: project-management
>Reporter: David Klosowski
>Assignee: David Klosowski
>Priority: Minor
>
> Add OfferUp as an Airflow user





[jira] [Assigned] (AIRFLOW-2251) Add Thinknear as an Airflow user

2018-03-23 Thread David Klosowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Klosowski reassigned AIRFLOW-2251:


Assignee: (was: Jakob Homan)

> Add Thinknear as an Airflow user
> 
>
> Key: AIRFLOW-2251
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2251
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: project-management
>Reporter: David Klosowski
>Priority: Minor
>
> Add OfferUp as an Airflow user





[jira] [Updated] (AIRFLOW-2251) Add Thinknear as an Airflow user

2018-03-23 Thread David Klosowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Klosowski updated AIRFLOW-2251:
-
External issue URL: https://github.com/apache/incubator-airflow/pull/3155  
(was: https://github.com/apache/incubator-airflow/pull/1814)

> Add Thinknear as an Airflow user
> 
>
> Key: AIRFLOW-2251
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2251
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: project-management
>Reporter: David Klosowski
>Priority: Minor
>
> Add OfferUp as an Airflow user





[jira] [Created] (AIRFLOW-2251) Add Thinknear as an Airflow user

2018-03-23 Thread David Klosowski (JIRA)
David Klosowski created AIRFLOW-2251:


 Summary: Add Thinknear as an Airflow user
 Key: AIRFLOW-2251
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2251
 Project: Apache Airflow
  Issue Type: Improvement
  Components: project-management
Reporter: David Klosowski
Assignee: Jakob Homan


Add OfferUp as an Airflow user





[jira] [Assigned] (AIRFLOW-1157) Assigning a task to a pool that doesn't exist crashes the scheduler

2018-03-19 Thread David Klosowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Klosowski reassigned AIRFLOW-1157:


Assignee: Fokko Driesprong  (was: David Klosowski)

> Assigning a task to a pool that doesn't exist crashes the scheduler
> ---
>
> Key: AIRFLOW-1157
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1157
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.8
>Reporter: John Culver
>Assignee: Fokko Driesprong
>Priority: Critical
> Fix For: 2.0.0
>
>
> If a dag is run that contains a task using a pool that doesn't exist, the 
> scheduler will crash.
> Manually triggering the run of this dag on an environment without a pool 
> named 'a_non_existent_pool' will crash the scheduler:
> {code}
> from datetime import datetime
> from airflow.models import DAG
> from airflow.operators.dummy_operator import DummyOperator
> dag = DAG(dag_id='crash_scheduler',
>           start_date=datetime(2017, 1, 1),
>           schedule_interval=None)
> t1 = DummyOperator(task_id='crash',
>                    pool='a_non_existent_pool',
>                    dag=dag)
> {code}
> Here is the relevant log output on the scheduler:
> {noformat}
> [2017-04-27 19:31:24,816] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/test-3.py finished
> [2017-04-27 19:31:24,817] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/test_s3_file_move.py finished
> [2017-04-27 19:31:24,819] {dag_processing.py:627} INFO - Started a process 
> (PID: 124) to generate tasks for /opt/airflow/dags/crash_scheduler.py - 
> logging into /tmp/airflow/scheduler/logs/2017-04-27/crash_scheduler.py.log
> [2017-04-27 19:31:24,822] {dag_processing.py:627} INFO - Started a process 
> (PID: 125) to generate tasks for /opt/airflow/dags/configuration/constants.py 
> - logging into 
> /tmp/airflow/scheduler/logs/2017-04-27/configuration/constants.py.log
> [2017-04-27 19:31:24,847] {jobs.py:1007} INFO - Tasks up for execution:
>  19:31:22.298893 [scheduled]>
> [2017-04-27 19:31:24,849] {jobs.py:1030} INFO - Figuring out tasks to run in 
> Pool(name=None) with 128 open slots and 1 task instances in queue
> [2017-04-27 19:31:24,856] {jobs.py:1078} INFO - DAG move_s3_file_test has 
> 0/16 running tasks
> [2017-04-27 19:31:24,856] {jobs.py:1105} INFO - Sending to executor 
> (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, 
> 22, 298893)) with priority 1 and queue MVSANDBOX-airflow-DEV-dev
> [2017-04-27 19:31:24,859] {jobs.py:1116} INFO - Setting state of 
> (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, 
> 22, 298893)) to queued
> [2017-04-27 19:31:24,867] {base_executor.py:50} INFO - Adding to queue: 
> airflow run move_s3_file_test move_files 2017-04-27T19:31:22.298893 --local 
> -sd /opt/airflow/dags/test_s3_file_move.py
> [2017-04-27 19:31:24,867] {jobs.py:1440} INFO - Heartbeating the executor
> [2017-04-27 19:31:24,872] {celery_executor.py:78} INFO - [celery] queuing 
> (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, 
> 22, 298893)) through celery, queue=MVSANDBOX-airflow-DEV-dev
> [2017-04-27 19:31:25,974] {jobs.py:1404} INFO - Heartbeating the process 
> manager
> [2017-04-27 19:31:25,975] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/crash_scheduler.py finished
> [2017-04-27 19:31:25,975] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/configuration/constants.py finished
> [2017-04-27 19:31:25,977] {dag_processing.py:627} INFO - Started a process 
> (PID: 128) to generate tasks for /opt/airflow/dags/example_s3_sensor.py - 
> logging into /tmp/airflow/scheduler/logs/2017-04-27/example_s3_sensor.py.log
> [2017-04-27 19:31:25,980] {dag_processing.py:627} INFO - Started a process 
> (PID: 129) to generate tasks for /opt/airflow/dags/test-4.py - logging into 
> /tmp/airflow/scheduler/logs/2017-04-27/test-4.py.log
> [2017-04-27 19:31:26,004] {jobs.py:1007} INFO - Tasks up for execution:
>  [scheduled]>
> [2017-04-27 19:31:26,006] {jobs.py:1311} INFO - Exited execute loop
> [2017-04-27 19:31:26,008] {jobs.py:1325} INFO - Terminating child PID: 128
> [2017-04-27 19:31:26,008] {jobs.py:1325} INFO - Terminating child PID: 129
> [2017-04-27 19:31:26,008] {jobs.py:1329} INFO - Waiting up to 5s for 
> processes to exit...
> Traceback (most recent call last):
>   File "/usr/bin/airflow", line 28, in <module>
>     args.func(args)
>   File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 839, in scheduler
>     job.run()
>   File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 200, in run
>     self._execute()
>   File 

[jira] [Assigned] (AIRFLOW-1157) Assigning a task to a pool that doesn't exist crashes the scheduler

2018-02-06 Thread David Klosowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Klosowski reassigned AIRFLOW-1157:


Assignee: David Klosowski

> Assigning a task to a pool that doesn't exist crashes the scheduler
> ---
>
> Key: AIRFLOW-1157
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1157
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.8
>Reporter: John Culver
>Assignee: David Klosowski
>Priority: Critical
>
> If a dag is run that contains a task using a pool that doesn't exist, the 
> scheduler will crash.
> Manually triggering the run of this dag on an environment without a pool 
> named 'a_non_existent_pool' will crash the scheduler:
> {code}
> from datetime import datetime
> from airflow.models import DAG
> from airflow.operators.dummy_operator import DummyOperator
> dag = DAG(dag_id='crash_scheduler',
>           start_date=datetime(2017, 1, 1),
>           schedule_interval=None)
> t1 = DummyOperator(task_id='crash',
>                    pool='a_non_existent_pool',
>                    dag=dag)
> {code}
> Here is the relevant log output on the scheduler:
> {noformat}
> [2017-04-27 19:31:24,816] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/test-3.py finished
> [2017-04-27 19:31:24,817] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/test_s3_file_move.py finished
> [2017-04-27 19:31:24,819] {dag_processing.py:627} INFO - Started a process 
> (PID: 124) to generate tasks for /opt/airflow/dags/crash_scheduler.py - 
> logging into /tmp/airflow/scheduler/logs/2017-04-27/crash_scheduler.py.log
> [2017-04-27 19:31:24,822] {dag_processing.py:627} INFO - Started a process 
> (PID: 125) to generate tasks for /opt/airflow/dags/configuration/constants.py 
> - logging into 
> /tmp/airflow/scheduler/logs/2017-04-27/configuration/constants.py.log
> [2017-04-27 19:31:24,847] {jobs.py:1007} INFO - Tasks up for execution:
>  19:31:22.298893 [scheduled]>
> [2017-04-27 19:31:24,849] {jobs.py:1030} INFO - Figuring out tasks to run in 
> Pool(name=None) with 128 open slots and 1 task instances in queue
> [2017-04-27 19:31:24,856] {jobs.py:1078} INFO - DAG move_s3_file_test has 
> 0/16 running tasks
> [2017-04-27 19:31:24,856] {jobs.py:1105} INFO - Sending to executor 
> (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, 
> 22, 298893)) with priority 1 and queue MVSANDBOX-airflow-DEV-dev
> [2017-04-27 19:31:24,859] {jobs.py:1116} INFO - Setting state of 
> (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, 
> 22, 298893)) to queued
> [2017-04-27 19:31:24,867] {base_executor.py:50} INFO - Adding to queue: 
> airflow run move_s3_file_test move_files 2017-04-27T19:31:22.298893 --local 
> -sd /opt/airflow/dags/test_s3_file_move.py
> [2017-04-27 19:31:24,867] {jobs.py:1440} INFO - Heartbeating the executor
> [2017-04-27 19:31:24,872] {celery_executor.py:78} INFO - [celery] queuing 
> (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, 
> 22, 298893)) through celery, queue=MVSANDBOX-airflow-DEV-dev
> [2017-04-27 19:31:25,974] {jobs.py:1404} INFO - Heartbeating the process 
> manager
> [2017-04-27 19:31:25,975] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/crash_scheduler.py finished
> [2017-04-27 19:31:25,975] {dag_processing.py:559} INFO - Processor for 
> /opt/airflow/dags/configuration/constants.py finished
> [2017-04-27 19:31:25,977] {dag_processing.py:627} INFO - Started a process 
> (PID: 128) to generate tasks for /opt/airflow/dags/example_s3_sensor.py - 
> logging into /tmp/airflow/scheduler/logs/2017-04-27/example_s3_sensor.py.log
> [2017-04-27 19:31:25,980] {dag_processing.py:627} INFO - Started a process 
> (PID: 129) to generate tasks for /opt/airflow/dags/test-4.py - logging into 
> /tmp/airflow/scheduler/logs/2017-04-27/test-4.py.log
> [2017-04-27 19:31:26,004] {jobs.py:1007} INFO - Tasks up for execution:
>  [scheduled]>
> [2017-04-27 19:31:26,006] {jobs.py:1311} INFO - Exited execute loop
> [2017-04-27 19:31:26,008] {jobs.py:1325} INFO - Terminating child PID: 128
> [2017-04-27 19:31:26,008] {jobs.py:1325} INFO - Terminating child PID: 129
> [2017-04-27 19:31:26,008] {jobs.py:1329} INFO - Waiting up to 5s for 
> processes to exit...
> Traceback (most recent call last):
>   File "/usr/bin/airflow", line 28, in <module>
>     args.func(args)
>   File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 839, in scheduler
>     job.run()
>   File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 200, in run
>     self._execute()
>   File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 1309, in _execute
> 

[jira] [Updated] (AIRFLOW-1767) Airflow Scheduler no longer schedules DAGs

2017-10-31 Thread David Klosowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Klosowski updated AIRFLOW-1767:
-
Description: 
The Airflow Scheduler no longer schedules DAGs after this commit on master:

https://github.com/apache/incubator-airflow/commit/73549763eac74142b7c4018422bb2f8c897b45a8

Workers never receive any tasks and the scheduler never adjusts DAG state.




  was:
The Airflow Scheduler no longer schedules DAGs after this commit on master:

https://github.com/apache/incubator-airflow/commit/73549763eac74142b7c4018422bb2f8c897b45a8






> Airflow Scheduler no longer schedules DAGs
> --
>
> Key: AIRFLOW-1767
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1767
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.9.0, 1.10.0
> Environment: CeleryExecutor, Docker, 3 Workers
>Reporter: David Klosowski
>Priority: Blocker
>
> The Airflow Scheduler no longer schedules DAGs after this commit on master:
> https://github.com/apache/incubator-airflow/commit/73549763eac74142b7c4018422bb2f8c897b45a8
> Workers never receive any tasks and the scheduler never adjusts DAG state.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AIRFLOW-1767) Airflow Scheduler no longer schedules DAGs

2017-10-31 Thread David Klosowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Klosowski updated AIRFLOW-1767:
-
Summary: Airflow Scheduler no longer schedules DAGs  (was: Airflow 
Scheduler no longer works)

> Airflow Scheduler no longer schedules DAGs
> --
>
> Key: AIRFLOW-1767
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1767
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.9.0, 1.10.0
> Environment: CeleryExecutor, Docker, 3 Workers
>Reporter: David Klosowski
>Priority: Blocker
>
> The Airflow Scheduler no longer schedules DAGs after this commit on master:
> https://github.com/apache/incubator-airflow/commit/73549763eac74142b7c4018422bb2f8c897b45a8





[jira] [Created] (AIRFLOW-1767) Airflow Scheduler no longer works

2017-10-31 Thread David Klosowski (JIRA)
David Klosowski created AIRFLOW-1767:


 Summary: Airflow Scheduler no longer works
 Key: AIRFLOW-1767
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1767
 Project: Apache Airflow
  Issue Type: Bug
  Components: scheduler
Affects Versions: 1.9.0, 1.10.0
 Environment: CeleryExecutor, Docker, 3 Workers
Reporter: David Klosowski
Priority: Blocker


The Airflow Scheduler no longer schedules DAGs after this commit on master:

https://github.com/apache/incubator-airflow/commit/73549763eac74142b7c4018422bb2f8c897b45a8









[jira] [Commented] (AIRFLOW-288) Make system timezone configurable

2017-03-06 Thread David Klosowski (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898590#comment-15898590
 ] 

David Klosowski commented on AIRFLOW-288:
-

Thoughts on support for this?  It would be nice to see this at the DAG level so 
individual DAGs could be triggered in a timezone-aware way by the 
scheduler.

> Make system timezone configurable
> -
>
> Key: AIRFLOW-288
> URL: https://issues.apache.org/jira/browse/AIRFLOW-288
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Vineet Goel
>Assignee: Vineet Goel
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-392) DAG runs on strange schedule in the past when deployed

2016-08-04 Thread David Klosowski (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408416#comment-15408416
 ] 

David Klosowski commented on AIRFLOW-392:
-

I'm not sure what to say here.  There definitely was some strangeness in the 
scheduler (I'm not sure whether any remains), and there are a good number of 
fixes (including the one mentioned above).  Perhaps this can be closed?

> DAG runs on strange schedule in the past when deployed
> --
>
> Key: AIRFLOW-392
> URL: https://issues.apache.org/jira/browse/AIRFLOW-392
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.3
> Environment: AWS ElasticBeanstalk as a Docker application running in 
> an Ubuntu-based container
>Reporter: David Klosowski
>Assignee: Norman Mu
>
> Just deployed a new DAG ('weekly-no-track') that depends on 7 DAG task runs 
> of another DAG ('daily-no-track').  When the DAG is deployed the scheduler 
> schedules and runs multiple runs in the past (yesterday it ran for 6/12/2016 
> and 6/05/2016), despite the start date set to the deployment date.  
> It would be a bit difficult to include all the code being used in the DAG 
> since we have multiple libraries we've built in Python that are being 
> referenced here that we want to eventually open source.  I've included some 
> of the code here.  Let me know if this is all clear and what I can do to help 
> or if any insight can be provided as to what it is occurring and how we might 
> fix this.
> {code}
> from __future__ import division, print_function
> from airflow.models import DAG
> from airflow.operators import DummyOperator, ExternalTaskSensor, TimeDeltaSensor
> from tn_etl_tools.aws.emr import EmrConfig, HiveConfig, read_cluster_templates
> from tn_etl_tools.aws.emr import EmrService, EmrServiceWrapper, HiveStepBuilder
> from tn_etl_tools.datesupport import ts_add
> from tn_etl_tools.hive import HivePartitions
> from tn_etl_tools.yaml import YamlLoader
> from datetime import datetime, timedelta
> from dateutil.relativedelta import relativedelta, SU, MO, TU, WE, TH, FR, SA
> from common_args import merge_dicts, CommonHiveParams
> from operator_builders import add_tasks, emr_hive_operator
> import os
> # === configs
> config_dir = os.getenv('DAG_CONFIG_DIR', '/usr/local/airflow/config')
> alert_email = os.getenv('AIRFLOW_TO_EMAIL')
> app_properties = YamlLoader.load_yaml(config_dir + '/app.yml')
> emr_cluster_properties = YamlLoader.load_yaml(config_dir + '/emr_clusters.yml')
> emr_config = EmrConfig.load(STAGE=app_properties['STAGE'], **app_properties['EMR'])
> hive_config = HiveConfig.load(STAGE=app_properties['STAGE'], **app_properties['HIVE'])
> emr_cluster_templates = read_cluster_templates(emr_cluster_properties)
> # === /configs
> # TODO: force execution_date = sunday?
> run_for_date = datetime(2016, 8, 8)
> emr_service = EmrService()
> emr_service_wrapper = EmrServiceWrapper(emr_service=emr_service,
>                                         emr_config=emr_config,
>                                         cluster_templates=emr_cluster_templates)
> hive_step_builder = HiveStepBuilder(hive_config=hive_config)
> hive_params = CommonHiveParams(app_properties_hive=app_properties['HIVE'])
> args = {'owner': 'airflow',
>         'depends_on_past': False,
>         'start_date': run_for_date,
>         'email': [alert_email],
>         'email_on_failure': True,
>         'email_on_retry': False,
>         'retries': 1,
>         'trigger_rule': 'all_success',
>         'emr_service_wrapper': emr_service_wrapper,
>         'hive_step_builder': hive_step_builder}
> user_defined_macros = {'hive_partitions': HivePartitions,
>                        'ts_add': ts_add}
> params = {'stage': app_properties['STAGE']}
> dag = DAG(dag_id='weekly_no_track', default_args=args,
>           user_defined_macros=user_defined_macros, params=params,
>           schedule_interval=timedelta(days=7),
>           max_active_runs=1)
> # === task definitions
> task_definitions = {
>     'wait-for-dailies': {
>         'operator_type': 'dummy_operator',  # hub for custom defined dependencies
>         'operator_args': {},
>         'depends_on': []
>     },
>     'weekly-no-track': {
>         'operator_type': 'emr_hive_operator',
>         'operator_args': {
>             'hive_step': {
>                 'script': 'weekly-no-track-airflow',  # temporary modified script with separate output path
>                 'cluster_name': 'geoprofile',
>                 'script_vars': merge_dicts(hive_params.default_params(), hive_params.rundate_params(), {
>                     'PARTITIONS': '{{hive_partitions.by_day(ts_add(ts, days=-6), ts_add(ts, days=1))}}',
>                 }),
>             }
>         },
>         'depends_on': 

[jira] [Comment Edited] (AIRFLOW-392) DAG runs on strange schedule in the past when deployed

2016-08-04 Thread David Klosowski (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408413#comment-15408413
 ] 

David Klosowski edited comment on AIRFLOW-392 at 8/4/16 8:09 PM:
-

OK, we fixed the issue.  To be honest, I'm not 100% sure what fixed the 
problem; our best guess is an inconsistency in the dag_run state in the 
database, which was resolved by clearing the db and re-deploying the dag 
(with a changed start_date).  We also adjusted the above creation of 
{{ExternalTaskSensor}} by changing delta from -1 to {{delta = 
relativedelta(weekday=weekday(1))}} so it looks forward instead of 
backward.  The interesting thing is that the behavior we were seeing is rather 
consistent with the existing code in version 1.7.1.3 (which we are running) in 
this block of {{jobs.py}}'s {{schedule_dag}} method:

{code}
if dag.schedule_interval == '@once' and not last_scheduled_run:
    next_run_date = datetime.now()
elif not last_scheduled_run:
    # First run
    TI = models.TaskInstance
    latest_run = (
        session.query(func.max(TI.execution_date))
        .filter_by(dag_id=dag.dag_id)
        .scalar()
    )
    if latest_run:
        # Migrating from previous version
        # make the past 5 runs active
        next_run_date = dag.date_range(latest_run, -5)[0]
    else:
        task_start_dates = [t.start_date for t in dag.tasks]
        if task_start_dates:
            next_run_date = min(task_start_dates)
        else:
            next_run_date = None
{code}

Notice the block:

{code}
# Migrating from previous version
# make the past 5 runs active
next_run_date = dag.date_range(latest_run, -5)[0]
{code}

We were getting dag runs in the past.  My guess is that the 
{{latest_run}} condition would be false if there are no {{TaskInstance}}s and 
thus no {{execution_date}}s for them on the first run of this (new Dag -> 
First DagRun); however, on subsequent invocations it would create a 
{{DagRun}} from 5 prior schedule intervals.  Not sure why this didn't happen 
again though.

However, that logic has since changed in AIRFLOW-168 (on master):

{code}
if not last_scheduled_run:
    # First run
    task_start_dates = [t.start_date for t in dag.tasks]
    if task_start_dates:
        next_run_date = min(task_start_dates)
else:
    next_run_date = dag.following_schedule(last_scheduled_run)
{code}



was (Author: d3cay):
OK, we fixed the issue.  To be honest, I'm not 100% sure what fixed the 
problem; our best guess is an inconsistency in the dag_run state in the 
database, which was resolved by clearing the db and re-deploying the dag 
(with a changed start_date).  We also adjusted the above creation of 
{{ExternalTaskSensor}} by changing delta from -1 to {{delta = 
relativedelta(weekday=weekday(1))}} so it looks forward instead of 
backward.  The interesting thing is that the behavior we were seeing is rather 
consistent with the existing code in version 1.7.1.3 (which we are running) in 
this block of {{jobs.py}}'s {{schedule_dag}} method:

{code}
if dag.schedule_interval == '@once' and not last_scheduled_run:
    next_run_date = datetime.now()
elif not last_scheduled_run:
    # First run
    TI = models.TaskInstance
    latest_run = (
        session.query(func.max(TI.execution_date))
        .filter_by(dag_id=dag.dag_id)
        .scalar()
    )
    if latest_run:
        # Migrating from previous version
        # make the past 5 runs active
        next_run_date = dag.date_range(latest_run, -5)[0]
    else:
        task_start_dates = [t.start_date for t in dag.tasks]
        if task_start_dates:
            next_run_date = min(task_start_dates)
        else:
            next_run_date = None
{code}

Notice the block:

{code}
# Migrating from previous version
# make the past 5 runs active
next_run_date = dag.date_range(latest_run, -5)[0]
{code}

We were getting dag runs in the past.  My guess is that the 
{{latest_run}} condition would be false if there are no {{TaskInstance}}s and 
thus no {{execution_date}}s for them on the first run of this (new Dag -> 
First DagRun); however, on subsequent invocations it would create a 
{{DagRun}} from 5 prior schedule intervals.  Not sure why this didn't happen 
again 

[jira] [Commented] (AIRFLOW-392) DAG runs on strange schedule in the past when deployed

2016-08-04 Thread David Klosowski (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408413#comment-15408413
 ] 

David Klosowski commented on AIRFLOW-392:
-

OK, we fixed the issue.  To be honest, I'm not 100% sure what fixed the 
problem; our best guess is an inconsistency in the dag_run state in the 
database, which was resolved by clearing the db and re-deploying the dag 
(with a changed start_date).  We also adjusted the above creation of 
{{ExternalTaskSensor}} by changing delta from -1 to {{delta = 
relativedelta(weekday=weekday(1))}} so it looks forward instead of 
backward.  The interesting thing is that the behavior we were seeing is rather 
consistent with the existing code in version 1.7.1.3 (which we are running) in 
this block of {{jobs.py}}'s {{schedule_dag}} method:

{code}
if dag.schedule_interval == '@once' and not last_scheduled_run:
    next_run_date = datetime.now()
elif not last_scheduled_run:
    # First run
    TI = models.TaskInstance
    latest_run = (
        session.query(func.max(TI.execution_date))
        .filter_by(dag_id=dag.dag_id)
        .scalar()
    )
    if latest_run:
        # Migrating from previous version
        # make the past 5 runs active
        next_run_date = dag.date_range(latest_run, -5)[0]
    else:
        task_start_dates = [t.start_date for t in dag.tasks]
        if task_start_dates:
            next_run_date = min(task_start_dates)
        else:
            next_run_date = None
{code}

Notice the block:

{code}
# Migrating from previous version
# make the past 5 runs active
next_run_date = dag.date_range(latest_run, -5)[0]
{code}

We were getting dag runs in the past.  My guess is that the 
{{latest_run}} condition would be false if there are no {{TaskInstance}}s and 
thus no {{execution_date}}s for them on the first run of this (new Dag -> First 
DagRun); however, on subsequent invocations it would create a {{DagRun}} 
from 5 prior schedule intervals.  Not sure why this didn't happen again though.

However, that logic has since changed in AIRFLOW-168 (on master):

{code}
if not last_scheduled_run:
    # First run
    task_start_dates = [t.start_date for t in dag.tasks]
    if task_start_dates:
        next_run_date = min(task_start_dates)
else:
    next_run_date = dag.following_schedule(last_scheduled_run)
{code}
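
To make the "past 5 runs" effect concrete, here is a rough sketch of what 
{{dag.date_range(latest_run, -5)[0]}} would evaluate to for a weekly DAG.  
The helper below is a simplified stand-in, not Airflow's function: it assumes 
that a negative num collects abs(num) dates stepping backward from (and 
including) the start date and returns them sorted ascending, and it ignores 
cron schedules.  The latest_run value is hypothetical.

```python
from datetime import datetime, timedelta


def date_range_back(start_date, num, delta):
    """Simplified stand-in (assumption): abs(num) dates stepping backward
    from start_date, start_date included, returned in ascending order."""
    dates = [start_date - i * delta for i in range(abs(num))]
    return sorted(dates)


weekly = timedelta(days=7)
latest_run = datetime(2016, 8, 1)  # hypothetical max TaskInstance execution_date

# Under the assumption above, the first element is 4 intervals before latest_run.
next_run_date = date_range_back(latest_run, -5, weekly)[0]
print(next_run_date)  # prints: 2016-07-04 00:00:00
```

If the real function behaves this way, any existing TaskInstance rows for the 
dag_id would push the first scheduled run roughly a month back for a weekly 
schedule, regardless of the start_date on the tasks.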


> DAG runs on strange schedule in the past when deployed
> --
>
> Key: AIRFLOW-392
> URL: https://issues.apache.org/jira/browse/AIRFLOW-392
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.3
> Environment: AWS ElasticBeanstalk as a Docker application running in 
> an Ubuntu-based container
>Reporter: David Klosowski
>Assignee: Norman Mu
>
> Just deployed a new DAG ('weekly-no-track') that depends on 7 DAG task runs 
> of another DAG ('daily-no-track').  When the DAG is deployed the scheduler 
> schedules and runs multiple runs in the past (yesterday it ran for 6/12/2016 
> and 6/05/2016), despite the start date being set to the deployment date.  
> It would be a bit difficult to include all the code being used in the DAG 
> since we have multiple libraries we've built in Python that are being 
> referenced here that we want to eventually open source.  I've included some 
> of the code here.  Let me know if this is all clear and what I can do to 
> help, or if any insight can be provided as to what is occurring and how we 
> might fix this.
> {code}
> from __future__ import division, print_function
> from airflow.models import DAG
> from airflow.operators import DummyOperator, ExternalTaskSensor, 
> TimeDeltaSensor
> from tn_etl_tools.aws.emr import EmrConfig, HiveConfig, read_cluster_templates
> from tn_etl_tools.aws.emr import EmrService, EmrServiceWrapper, 
> HiveStepBuilder
> from tn_etl_tools.datesupport import ts_add
> from tn_etl_tools.hive import HivePartitions
> from tn_etl_tools.yaml import YamlLoader
> from datetime import datetime, timedelta
> from dateutil.relativedelta import relativedelta, SU, MO , TU, WE, TH, FR, SA
> from common_args import merge_dicts, CommonHiveParams
> from operator_builders import add_tasks, emr_hive_operator
> import os
> # === configs
> config_dir = os.getenv('DAG_CONFIG_DIR', '/usr/local/airflow/config')
> alert_email = os.getenv('AIRFLOW_TO_EMAIL')
> app_properties = YamlLoader.load_yaml(config_dir + '/app.yml')
> 

[jira] [Created] (AIRFLOW-392) DAG runs on strange schedule in the past when deployed

2016-08-03 Thread David Klosowski (JIRA)
David Klosowski created AIRFLOW-392:
---

 Summary: DAG runs on strange schedule in the past when deployed
 Key: AIRFLOW-392
 URL: https://issues.apache.org/jira/browse/AIRFLOW-392
 Project: Apache Airflow
  Issue Type: Bug
  Components: scheduler
Affects Versions: Airflow 1.7.1.3
 Environment: AWS ElasticBeanstalk as a Docker application running in 
an Ubuntu-based container
Reporter: David Klosowski
Assignee: Siddharth Anand


Just deployed a new DAG ('weekly-no-track') that depends on 7 DAG task runs of 
another DAG ('daily-no-track').  When the DAG is deployed the scheduler 
schedules and runs multiple runs in the past (yesterday it ran for 6/12/2016 
and 6/05/2016), despite the start date being set to the deployment date.  

It would be a bit difficult to include all the code being used in the DAG since 
we have multiple libraries we've built in Python that are being referenced here 
that we want to eventually open source.  I've included some of the code here.  
Let me know if this is all clear and what I can do to help, or if any insight 
can be provided as to what is occurring and how we might fix this.

{code}
from __future__ import division, print_function
from airflow.models import DAG
from airflow.operators import DummyOperator, ExternalTaskSensor, TimeDeltaSensor
from tn_etl_tools.aws.emr import EmrConfig, HiveConfig, read_cluster_templates
from tn_etl_tools.aws.emr import EmrService, EmrServiceWrapper, HiveStepBuilder
from tn_etl_tools.datesupport import ts_add
from tn_etl_tools.hive import HivePartitions
from tn_etl_tools.yaml import YamlLoader
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta, SU, MO, TU, WE, TH, FR, SA

from common_args import merge_dicts, CommonHiveParams
from operator_builders import add_tasks, emr_hive_operator

import os


# === configs

config_dir = os.getenv('DAG_CONFIG_DIR', '/usr/local/airflow/config')
alert_email = os.getenv('AIRFLOW_TO_EMAIL')
app_properties = YamlLoader.load_yaml(config_dir + '/app.yml')
emr_cluster_properties = YamlLoader.load_yaml(config_dir + '/emr_clusters.yml')

emr_config = EmrConfig.load(STAGE=app_properties['STAGE'],
                            **app_properties['EMR'])
hive_config = HiveConfig.load(STAGE=app_properties['STAGE'],
                              **app_properties['HIVE'])
emr_cluster_templates = read_cluster_templates(emr_cluster_properties)

# === /configs

# TODO: force execution_date = sunday?
run_for_date = datetime(2016, 8, 8)

emr_service = EmrService()
emr_service_wrapper = EmrServiceWrapper(emr_service=emr_service,
                                        emr_config=emr_config,
                                        cluster_templates=emr_cluster_templates)
hive_step_builder = HiveStepBuilder(hive_config=hive_config)
hive_params = CommonHiveParams(app_properties_hive=app_properties['HIVE'])

args = {'owner': 'airflow',
        'depends_on_past': False,
        'start_date': run_for_date,
        'email': [alert_email],
        'email_on_failure': True,
        'email_on_retry': False,
        'retries': 1,
        'trigger_rule': 'all_success',
        'emr_service_wrapper': emr_service_wrapper,
        'hive_step_builder': hive_step_builder}
user_defined_macros = {'hive_partitions': HivePartitions,
                       'ts_add': ts_add}
params = {'stage': app_properties['STAGE']}

dag = DAG(dag_id='weekly_no_track', default_args=args,
          user_defined_macros=user_defined_macros, params=params,
          schedule_interval=timedelta(days=7),
          max_active_runs=1)


# === task definitions
task_definitions = {
    'wait-for-dailies': {
        'operator_type': 'dummy_operator',  # hub for custom defined dependencies
        'operator_args': {},
        'depends_on': []
    },
    'weekly-no-track': {
        'operator_type': 'emr_hive_operator',
        'operator_args': {
            'hive_step': {
                'script': 'weekly-no-track-airflow',  # temporary modified script with separate output path
                'cluster_name': 'geoprofile',
                'script_vars': merge_dicts(hive_params.default_params(),
                                           hive_params.rundate_params(), {
                    'PARTITIONS': '{{hive_partitions.by_day(ts_add(ts, days=-6), ts_add(ts, days=1))}}',
                }),
            }
        },
        'depends_on': ['wait-for-dailies']
    }
}
# === /task definitions

operator_builders = {'emr_hive_operator': emr_hive_operator,
                     'time_delta_sensor': TimeDeltaSensor,
                     'dummy_operator': DummyOperator}
add_tasks(task_definitions, dag=dag, operator_builders=operator_builders)


# === custom tasks

downstream_task = dag.get_task('wait-for-dailies')
for weekday in [MO, TU, WE, TH, FR, SA, SU]:
    task_id = 'wait-for-daily-{day}'.format(day=weekday)

    # weekday(-1) subtracts 1 relative week from the given weekday, however if the calculated date is already Monday,
    # for example, -1