[jira] [Commented] (AIRFLOW-2272) Travis CI is failing builds due to oracle-java8-installer failing to install via apt-get
[ https://issues.apache.org/jira/browse/AIRFLOW-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422974#comment-16422974 ]

David Klosowski commented on AIRFLOW-2272:
------------------------------------------

[~jinhyukch...@gmail.com] Looks like there is an issue already open: https://github.com/travis-ci/travis-ci/issues/9419

> Travis CI is failing builds due to oracle-java8-installer failing to install via apt-get
> ----------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-2272
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2272
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: ci
>            Reporter: David Klosowski
>            Priority: Critical
>              Labels: build-failure
>
> All the PR builds in Travis CI are failing with the following apt-get error:
>
> {code}
> --2018-03-30 17:56:23-- (try: 5) http://download.oracle.com/otn-pub/java/jdk/8u161-b12/2f38c3b165be4555a1fa6e98c45e0808/jdk-8u161-linux-x64.tar.gz
> Connecting to download.oracle.com (download.oracle.com)|23.45.144.164|:80... failed: Connection timed out.
> Giving up.
> apt-get install failed
> $ cat ~/apt-get-update.log
> Ign:1 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty InRelease
> Hit:2 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty-updates InRelease
> Hit:3 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty-backports InRelease
> Hit:4 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty Release
> Ign:5 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 InRelease
> Ign:6 http://dl.google.com/linux/chrome/deb stable InRelease
> Hit:7 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 Release
> Hit:8 http://security.ubuntu.com/ubuntu trusty-security InRelease
> Ign:9 http://toolbelt.heroku.com/ubuntu ./ InRelease
> Get:10 http://dl.bintray.com/apache/cassandra 39x InRelease [3,168 B]
> Hit:11 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty InRelease
> Hit:13 https://download.docker.com/linux/ubuntu trusty InRelease
> Hit:12 http://toolbelt.heroku.com/ubuntu ./ Release
> Hit:15 http://dl.google.com/linux/chrome/deb stable Release
> Hit:16 http://apt.postgresql.org/pub/repos/apt trusty-pgdg InRelease
> Hit:18 https://dl.hhvm.com/ubuntu trusty InRelease
> Ign:19 http://ppa.launchpad.net/couchdb/stable/ubuntu trusty InRelease
> Hit:22 https://packagecloud.io/computology/apt-backport/ubuntu trusty InRelease
> Hit:23 https://packagecloud.io/github/git-lfs/ubuntu trusty InRelease
> Hit:24 https://packagecloud.io/rabbitmq/rabbitmq-server/ubuntu trusty InRelease
> Hit:25 http://ppa.launchpad.net/git-core/ppa/ubuntu trusty InRelease
> Hit:26 http://ppa.launchpad.net/openjdk-r/ppa/ubuntu trusty InRelease
> Hit:27 http://ppa.launchpad.net/pollinate/ppa/ubuntu trusty InRelease
> Hit:28 http://ppa.launchpad.net/webupd8team/java/ubuntu trusty InRelease
> Hit:29 http://ppa.launchpad.net/couchdb/stable/ubuntu trusty Release
> Fetched 3,168 B in 2s (1,102 B/s)
> Reading package lists...
> W: http://ppa.launchpad.net/couchdb/stable/ubuntu/dists/trusty/Release.gpg: Signature by key 15866BAFD9BCC4F3C1E0DFC7D69548E1C17EAB57 uses weak digest algorithm (SHA1)
>
> The command "sudo -E apt-get -yq --no-install-suggests --no-install-recommends --force-yes install slapd ldap-utils openssh-server mysql-server-5.6 mysql-client-core-5.6 mysql-client-5.6 krb5-user krb5-kdc krb5-admin-server oracle-java8-installer python-selinux" failed and exited with 100 during .
> {code}
>
> It looks like this is due to the configuration in the .travis.yml installing {{oracle-java8-installer}}:
>
> {code}
> apt:
>   packages:
>     - slapd
>     - ldap-utils
>     - openssh-server
>     - mysql-server-5.6
>     - mysql-client-core-5.6
>     - mysql-client-5.6
>     - krb5-user
>     - krb5-kdc
>     - krb5-admin-server
>     - oracle-java8-installer
>     - python-selinux
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
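As a sketch of one possible workaround, and only assuming the build can run on OpenJDK rather than requiring Oracle's JDK specifically, the flaky package could be dropped from the apt addon and a JDK provisioned through Travis's own {{jdk}} key instead. This is an untested, illustrative change, not a change the thread confirms was made:

```yaml
# Hypothetical .travis.yml fragment (untested sketch).
# Assumption: the test suite works on OpenJDK 8 as well as Oracle JDK 8.
jdk: openjdk8

addons:
  apt:
    packages:
      - slapd
      - ldap-utils
      - openssh-server
      - mysql-server-5.6
      - mysql-client-core-5.6
      - mysql-client-5.6
      - krb5-user
      - krb5-kdc
      - krb5-admin-server
      # oracle-java8-installer removed: its install step downloads the JDK
      # tarball from download.oracle.com, which is what times out in the log.
      - python-selinux
```

The point of the change is that nothing in the apt phase would contact download.oracle.com any more, so the build no longer depends on that host's availability.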
[jira] [Commented] (AIRFLOW-2272) Travis CI is failing builds due to oracle-java8-installer failing to install via apt-get
[ https://issues.apache.org/jira/browse/AIRFLOW-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422970#comment-16422970 ]

David Klosowski commented on AIRFLOW-2272:
------------------------------------------

OK, everything succeeded besides flake8. Wondering if this is still an issue, or if jdk9 works and jdk8 does not.
[jira] [Commented] (AIRFLOW-2272) Travis CI is failing builds due to oracle-java8-installer failing to install via apt-get
[ https://issues.apache.org/jira/browse/AIRFLOW-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422865#comment-16422865 ]

David Klosowski commented on AIRFLOW-2272:
------------------------------------------

Trying in this PR: https://github.com/apache/incubator-airflow/pull/3182
[jira] [Commented] (AIRFLOW-2272) Travis CI is failing builds due to oracle-java8-installer failing to install via apt-get
[ https://issues.apache.org/jira/browse/AIRFLOW-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422853#comment-16422853 ]

David Klosowski commented on AIRFLOW-2272:
------------------------------------------

This other project seemed to state that the repo was unreliable: [https://github.com/rakudo/rakudo/commit/c6d18b073e941d0d5a402024a9f9f3cded0f880c|https://github.com/rakudo/rakudo/issues/1646]

I'm going to try the oracle-java9-installer.
[jira] [Commented] (AIRFLOW-2272) Travis CI is failing builds due to oracle-java8-installer failing to install via apt-get
[ https://issues.apache.org/jira/browse/AIRFLOW-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422841#comment-16422841 ]

David Klosowski commented on AIRFLOW-2272:
------------------------------------------

That's a good question; I don't have experience here either. I see references to an earlier occurrence of this in 2016: [https://github.com/travis-ci/travis-ci/issues/6848]

It appears to have been an Oracle issue.
[jira] [Created] (AIRFLOW-2272) Travis CI is failing builds due to oraclejdk8 failing to install
David Klosowski created AIRFLOW-2272:
-------------------------------------

             Summary: Travis CI is failing builds due to oraclejdk8 failing to install
                 Key: AIRFLOW-2272
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2272
             Project: Apache Airflow
          Issue Type: Bug
          Components: ci
            Reporter: David Klosowski
[jira] [Updated] (AIRFLOW-2272) Travis CI is failing builds due to oracle-java8-installer failing to install via apt-get
[ https://issues.apache.org/jira/browse/AIRFLOW-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Klosowski updated AIRFLOW-2272:
-------------------------------------

    Summary: Travis CI is failing builds due to oracle-java8-installer failing to install via apt-get  (was: Travis CI is failing builds due to oraclejdk8 failing to install)
[jira] [Assigned] (AIRFLOW-2178) Scheduler can't get past SLA check if SMTP settings are incorrect
[ https://issues.apache.org/jira/browse/AIRFLOW-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Klosowski reassigned AIRFLOW-2178:
----------------------------------------

    Assignee: David Klosowski

> Scheduler can't get past SLA check if SMTP settings are incorrect
> -----------------------------------------------------------------
>
>                 Key: AIRFLOW-2178
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2178
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.9.0
>         Environment: 16.04
>            Reporter: James Meickle
>            Assignee: David Klosowski
>            Priority: Major
>         Attachments: log.txt
>
> After testing Airflow for a while in staging, I provisioned our prod cluster and enabled the first DAG on it. The "backfill" for this DAG performed just fine, so I assumed everything was working and left it over the weekend.
>
> However, when the last "backfill" period completed and the scheduler transitioned to the most recent execution date, it began failing in the `manage_slas` method. Due to a configuration difference, SMTP was timing out in production, preventing the SLA check from ever completing; this both blocked SLA notifications and prevented further tasks in this DAG from ever getting scheduled.
>
> As an operator, I would expect Airflow to treat scheduling tasks as a higher-priority concern, and to do so even if the SLA feature fails to work. I would also expect Airflow to notify me in the web UI that email sending is not currently working.
[jira] [Assigned] (AIRFLOW-2236) Airflow SLA is triggered for all backfilled tasks
[ https://issues.apache.org/jira/browse/AIRFLOW-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Klosowski reassigned AIRFLOW-2236:
----------------------------------------

    Assignee: (was: David Klosowski)

> Airflow SLA is triggered for all backfilled tasks
> -------------------------------------------------
>
>                 Key: AIRFLOW-2236
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2236
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: backfill
>    Affects Versions: Airflow 1.8, 1.8.1, 1.8.0, 1.9.0, 1.8.2
>            Reporter: barak schoster
>            Priority: Major
>
> While executing a task with a historical execution date, a schedule interval of 1 hour, and an SLA of 1 hour, all backfill instances appear as sla_miss, even though the duration of each task did not exceed the SLA.
>
> I think that this is a result of comparing the task's end time to utcnow at https://github.com/apache/incubator-airflow/blob/7cc6d8a5645b8974f07e132cc3c4820e880fd3ce/airflow/jobs.py#L624
[jira] [Commented] (AIRFLOW-2236) Airflow SLA is triggered for all backfilled tasks
[ https://issues.apache.org/jira/browse/AIRFLOW-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416526#comment-16416526 ]

David Klosowski commented on AIRFLOW-2236:
------------------------------------------

Actually, this is not so straightforward. SLAs would have to change significantly in order to support this: SLA checks are governed in a way that is disconnected from the actual task instances.
[jira] [Commented] (AIRFLOW-2236) Airflow SLA is triggered for all backfilled tasks
[ https://issues.apache.org/jira/browse/AIRFLOW-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416267#comment-16416267 ]

David Klosowski commented on AIRFLOW-2236:
------------------------------------------

An idea I had was to apply the SLA check only to the most recent task run. I think that's doable and seems reasonable.
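That idea can be sketched as a small filter: of all logical run dates for a task, only the newest would be eligible for an SLA check, so backfilled (older) runs never trigger alerts. This is a hypothetical helper to illustrate the proposal, not Airflow's actual API:

```python
from datetime import datetime

def runs_eligible_for_sla(execution_dates):
    """Hypothetical helper illustrating the proposal: only the most
    recent execution date is SLA-checked; earlier (backfilled) runs
    are skipped entirely."""
    if not execution_dates:
        return []
    return [max(execution_dates)]

# A backfill over three hourly runs: only the newest would be SLA-checked.
runs = [
    datetime(2018, 3, 30, 10),
    datetime(2018, 3, 30, 11),
    datetime(2018, 3, 30, 12),
]
latest_only = runs_eligible_for_sla(runs)  # [datetime(2018, 3, 30, 12, 0)]
```

The trade-off is that a genuinely slow backfilled run would also go unreported, which is presumably acceptable if SLAs are meant to monitor current operations rather than historical reprocessing.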
[jira] [Commented] (AIRFLOW-2236) Airflow SLA is triggered for all backfilled tasks
[ https://issues.apache.org/jira/browse/AIRFLOW-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415045#comment-16415045 ] David Klosowski commented on AIRFLOW-2236: -- I agree, this is an issue that can be resolved. The big question is whether to handle the backfill case akin to the non-backfill (current) case or whether treating them separately makes sense. The feeling is that this should be SLA_MISS = if (task_end_datetime - task_start_datetime) > sla_interval. I'd say that the non-backfill case has an argument for this to be based on execution_date + schedule_interval (intended start_date, so just substitute for task_start_date). I'll ponder this more, but I'm not a fan of logical bifurcation as it creates more complexity and potential confusion. > Airflow SLA is triggered for all backfilled tasks > - > > Key: AIRFLOW-2236 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2236 > Project: Apache Airflow > Issue Type: Bug > Components: backfill >Affects Versions: Airflow 1.8, 1.8.1, 1.8.0, 1.9.0, 1.8.2 >Reporter: barak schoster >Assignee: David Klosowski >Priority: Major > > While executing a task with a historical execution date and a schedule > interval of 1 hour, and an SLA of 1 hour - all backfill instances appear as > sla_miss - though the duration of each task was not exceeded. > > I think that this is a result of comparing the task's end time to utc.now at > https://github.com/apache/incubator-airflow/blob/7cc6d8a5645b8974f07e132cc3c4820e880fd3ce/airflow/jobs.py#L624 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
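The duration-based rule proposed in the comment above can be sketched as follows. This is purely illustrative (the function and parameter names are hypothetical); Airflow's actual jobs.py at the time compared the task's end time against utcnow, which is exactly what flags backfilled runs spuriously.

```python
from datetime import datetime, timedelta

def is_sla_miss(task_start: datetime, task_end: datetime,
                sla_interval: timedelta) -> bool:
    """Hypothetical duration-based SLA check: a task misses its SLA only
    if its own runtime exceeds the interval, so a backfilled run of an
    old execution_date is judged by how long it actually took."""
    return (task_end - task_start) > sla_interval

# A backfilled task that ran for 10 minutes against a one-hour SLA:
start = datetime(2017, 1, 1, 0, 0)
end = start + timedelta(minutes=10)
print(is_sla_miss(start, end, timedelta(hours=1)))  # False
```

Under the utcnow-based check, the same backfilled task would appear as an sla_miss simply because its historical execution_date is far in the past.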
[jira] [Assigned] (AIRFLOW-2236) Airflow SLA is triggered for all backfilled tasks
[ https://issues.apache.org/jira/browse/AIRFLOW-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Klosowski reassigned AIRFLOW-2236: Assignee: David Klosowski > Airflow SLA is triggered for all backfilled tasks > - > > Key: AIRFLOW-2236 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2236 > Project: Apache Airflow > Issue Type: Bug > Components: backfill >Affects Versions: Airflow 1.8, 1.8.1, 1.8.0, 1.9.0, 1.8.2 >Reporter: barak schoster >Assignee: David Klosowski >Priority: Major > > While executing a task with a historical execution date and a schedule > interval of 1 hour, and SLA of 1 hours - all backfill instances appear as > sla_miss - though the duration of each task was not exceeded. > > I think that this is a result of comparing the tasks end time to utc.now at > https://github.com/apache/incubator-airflow/blob/7cc6d8a5645b8974f07e132cc3c4820e880fd3ce/airflow/jobs.py#L624 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2251) Add Thinknear as an Airflow user
[ https://issues.apache.org/jira/browse/AIRFLOW-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Klosowski resolved AIRFLOW-2251. -- Resolution: Done > Add Thinknear as an Airflow user > > > Key: AIRFLOW-2251 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2251 > Project: Apache Airflow > Issue Type: Improvement > Components: project-management >Reporter: David Klosowski >Priority: Minor > > Add OfferUp as an Airflow user -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2251) Add Thinknear as an Airflow user
[ https://issues.apache.org/jira/browse/AIRFLOW-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Klosowski reassigned AIRFLOW-2251: Assignee: David Klosowski > Add Thinknear as an Airflow user > > > Key: AIRFLOW-2251 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2251 > Project: Apache Airflow > Issue Type: Improvement > Components: project-management >Reporter: David Klosowski >Assignee: David Klosowski >Priority: Minor > > Add OfferUp as an Airflow user -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2251) Add Thinknear as an Airflow user
[ https://issues.apache.org/jira/browse/AIRFLOW-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Klosowski reassigned AIRFLOW-2251: Assignee: (was: Jakob Homan) > Add Thinknear as an Airflow user > > > Key: AIRFLOW-2251 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2251 > Project: Apache Airflow > Issue Type: Improvement > Components: project-management >Reporter: David Klosowski >Priority: Minor > > Add OfferUp as an Airflow user -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2251) Add Thinknear as an Airflow user
[ https://issues.apache.org/jira/browse/AIRFLOW-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Klosowski updated AIRFLOW-2251: - External issue URL: https://github.com/apache/incubator-airflow/pull/3155 (was: https://github.com/apache/incubator-airflow/pull/1814) > Add Thinknear as an Airflow user > > > Key: AIRFLOW-2251 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2251 > Project: Apache Airflow > Issue Type: Improvement > Components: project-management >Reporter: David Klosowski >Priority: Minor > > Add OfferUp as an Airflow user -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2251) Add Thinknear as an Airflow user
David Klosowski created AIRFLOW-2251: Summary: Add Thinknear as an Airflow user Key: AIRFLOW-2251 URL: https://issues.apache.org/jira/browse/AIRFLOW-2251 Project: Apache Airflow Issue Type: Improvement Components: project-management Reporter: David Klosowski Assignee: Jakob Homan Add OfferUp as an Airflow user -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1157) Assigning a task to a pool that doesn't exist crashes the scheduler
[ https://issues.apache.org/jira/browse/AIRFLOW-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Klosowski reassigned AIRFLOW-1157: Assignee: Fokko Driesprong (was: David Klosowski) > Assigning a task to a pool that doesn't exist crashes the scheduler > --- > > Key: AIRFLOW-1157 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1157 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.8 >Reporter: John Culver >Assignee: Fokko Driesprong >Priority: Critical > Fix For: 2.0.0 > > > If a dag is run that contains a task using a pool that doesn't exist, the > scheduler will crash. > Manually triggering the run of this dag on an environment without a pool > named 'a_non_existent_pool' will crash the scheduler: > {code} > from datetime import datetime > from airflow.models import DAG > from airflow.operators.dummy_operator import DummyOperator > dag = DAG(dag_id='crash_scheduler', > start_date=datetime(2017,1,1), > schedule_interval=None) > t1 = DummyOperator(task_id='crash', >pool='a_non_existent_pool', >dag=dag) > {code} > Here is the relevant log output on the scheduler: > {noformat} > [2017-04-27 19:31:24,816] {dag_processing.py:559} INFO - Processor for > /opt/airflow/dags/test-3.py finished > [2017-04-27 19:31:24,817] {dag_processing.py:559} INFO - Processor for > /opt/airflow/dags/test_s3_file_move.py finished > [2017-04-27 19:31:24,819] {dag_processing.py:627} INFO - Started a process > (PID: 124) to generate tasks for /opt/airflow/dags/crash_scheduler.py - > logging into /tmp/airflow/scheduler/logs/2017-04-27/crash_scheduler.py.log > [2017-04-27 19:31:24,822] {dag_processing.py:627} INFO - Started a process > (PID: 125) to generate tasks for /opt/airflow/dags/configuration/constants.py > - logging into > /tmp/airflow/scheduler/logs/2017-04-27/configuration/constants.py.log > [2017-04-27 19:31:24,847] {jobs.py:1007} INFO - Tasks up for execution: > 19:31:22.298893 [scheduled]> > 
[2017-04-27 19:31:24,849] {jobs.py:1030} INFO - Figuring out tasks to run in > Pool(name=None) with 128 open slots and 1 task instances in queue > [2017-04-27 19:31:24,856] {jobs.py:1078} INFO - DAG move_s3_file_test has > 0/16 running tasks > [2017-04-27 19:31:24,856] {jobs.py:1105} INFO - Sending to executor > (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, > 22, 298893)) with priority 1 and queue MVSANDBOX-airflow-DEV-dev > [2017-04-27 19:31:24,859] {jobs.py:1116} INFO - Setting state of > (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, > 22, 298893)) to queued > [2017-04-27 19:31:24,867] {base_executor.py:50} INFO - Adding to queue: > airflow run move_s3_file_test move_files 2017-04-27T19:31:22.298893 --local > -sd /opt/airflow/dags/test_s3_file_move.py > [2017-04-27 19:31:24,867] {jobs.py:1440} INFO - Heartbeating the executor > [2017-04-27 19:31:24,872] {celery_executor.py:78} INFO - [celery] queuing > (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, > 22, 298893)) through celery, queue=MVSANDBOX-airflow-DEV-dev > [2017-04-27 19:31:25,974] {jobs.py:1404} INFO - Heartbeating the process > manager > [2017-04-27 19:31:25,975] {dag_processing.py:559} INFO - Processor for > /opt/airflow/dags/crash_scheduler.py finished > [2017-04-27 19:31:25,975] {dag_processing.py:559} INFO - Processor for > /opt/airflow/dags/configuration/constants.py finished > [2017-04-27 19:31:25,977] {dag_processing.py:627} INFO - Started a process > (PID: 128) to generate tasks for /opt/airflow/dags/example_s3_sensor.py - > logging into /tmp/airflow/scheduler/logs/2017-04-27/example_s3_sensor.py.log > [2017-04-27 19:31:25,980] {dag_processing.py:627} INFO - Started a process > (PID: 129) to generate tasks for /opt/airflow/dags/test-4.py - logging into > /tmp/airflow/scheduler/logs/2017-04-27/test-4.py.log > [2017-04-27 19:31:26,004] {jobs.py:1007} INFO - Tasks up for execution: > [scheduled]> > [2017-04-27 
19:31:26,006] {jobs.py:1311} INFO - Exited execute loop > [2017-04-27 19:31:26,008] {jobs.py:1325} INFO - Terminating child PID: 128 > [2017-04-27 19:31:26,008] {jobs.py:1325} INFO - Terminating child PID: 129 > [2017-04-27 19:31:26,008] {jobs.py:1329} INFO - Waiting up to 5s for > processes to exit... > Traceback (most recent call last): > File "/usr/bin/airflow", line 28, in > args.func(args) > File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 839, in > scheduler > job.run() > File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 200, in run > self._execute() > File
[jira] [Assigned] (AIRFLOW-1157) Assigning a task to a pool that doesn't exist crashes the scheduler
[ https://issues.apache.org/jira/browse/AIRFLOW-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Klosowski reassigned AIRFLOW-1157: Assignee: David Klosowski > Assigning a task to a pool that doesn't exist crashes the scheduler > --- > > Key: AIRFLOW-1157 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1157 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.8 >Reporter: John Culver >Assignee: David Klosowski >Priority: Critical > > If a dag is run that contains a task using a pool that doesn't exist, the > scheduler will crash. > Manually triggering the run of this dag on an environment without a pool > named 'a_non_existent_pool' will crash the scheduler: > {code} > from datetime import datetime > from airflow.models import DAG > from airflow.operators.dummy_operator import DummyOperator > dag = DAG(dag_id='crash_scheduler', > start_date=datetime(2017,1,1), > schedule_interval=None) > t1 = DummyOperator(task_id='crash', >pool='a_non_existent_pool', >dag=dag) > {code} > Here is the relevant log output on the scheduler: > {noformat} > [2017-04-27 19:31:24,816] {dag_processing.py:559} INFO - Processor for > /opt/airflow/dags/test-3.py finished > [2017-04-27 19:31:24,817] {dag_processing.py:559} INFO - Processor for > /opt/airflow/dags/test_s3_file_move.py finished > [2017-04-27 19:31:24,819] {dag_processing.py:627} INFO - Started a process > (PID: 124) to generate tasks for /opt/airflow/dags/crash_scheduler.py - > logging into /tmp/airflow/scheduler/logs/2017-04-27/crash_scheduler.py.log > [2017-04-27 19:31:24,822] {dag_processing.py:627} INFO - Started a process > (PID: 125) to generate tasks for /opt/airflow/dags/configuration/constants.py > - logging into > /tmp/airflow/scheduler/logs/2017-04-27/configuration/constants.py.log > [2017-04-27 19:31:24,847] {jobs.py:1007} INFO - Tasks up for execution: > 19:31:22.298893 [scheduled]> > [2017-04-27 19:31:24,849] {jobs.py:1030} INFO - 
Figuring out tasks to run in > Pool(name=None) with 128 open slots and 1 task instances in queue > [2017-04-27 19:31:24,856] {jobs.py:1078} INFO - DAG move_s3_file_test has > 0/16 running tasks > [2017-04-27 19:31:24,856] {jobs.py:1105} INFO - Sending to executor > (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, > 22, 298893)) with priority 1 and queue MVSANDBOX-airflow-DEV-dev > [2017-04-27 19:31:24,859] {jobs.py:1116} INFO - Setting state of > (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, > 22, 298893)) to queued > [2017-04-27 19:31:24,867] {base_executor.py:50} INFO - Adding to queue: > airflow run move_s3_file_test move_files 2017-04-27T19:31:22.298893 --local > -sd /opt/airflow/dags/test_s3_file_move.py > [2017-04-27 19:31:24,867] {jobs.py:1440} INFO - Heartbeating the executor > [2017-04-27 19:31:24,872] {celery_executor.py:78} INFO - [celery] queuing > (u'move_s3_file_test', u'move_files', datetime.datetime(2017, 4, 27, 19, 31, > 22, 298893)) through celery, queue=MVSANDBOX-airflow-DEV-dev > [2017-04-27 19:31:25,974] {jobs.py:1404} INFO - Heartbeating the process > manager > [2017-04-27 19:31:25,975] {dag_processing.py:559} INFO - Processor for > /opt/airflow/dags/crash_scheduler.py finished > [2017-04-27 19:31:25,975] {dag_processing.py:559} INFO - Processor for > /opt/airflow/dags/configuration/constants.py finished > [2017-04-27 19:31:25,977] {dag_processing.py:627} INFO - Started a process > (PID: 128) to generate tasks for /opt/airflow/dags/example_s3_sensor.py - > logging into /tmp/airflow/scheduler/logs/2017-04-27/example_s3_sensor.py.log > [2017-04-27 19:31:25,980] {dag_processing.py:627} INFO - Started a process > (PID: 129) to generate tasks for /opt/airflow/dags/test-4.py - logging into > /tmp/airflow/scheduler/logs/2017-04-27/test-4.py.log > [2017-04-27 19:31:26,004] {jobs.py:1007} INFO - Tasks up for execution: > [scheduled]> > [2017-04-27 19:31:26,006] {jobs.py:1311} INFO - Exited 
execute loop > [2017-04-27 19:31:26,008] {jobs.py:1325} INFO - Terminating child PID: 128 > [2017-04-27 19:31:26,008] {jobs.py:1325} INFO - Terminating child PID: 129 > [2017-04-27 19:31:26,008] {jobs.py:1329} INFO - Waiting up to 5s for > processes to exit... > Traceback (most recent call last): > File "/usr/bin/airflow", line 28, in > args.func(args) > File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 839, in > scheduler > job.run() > File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 200, in run > self._execute() > File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 1309, in > _execute >
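One defensive direction for the crash described above (illustrative only; the names are hypothetical and this does not reflect the patch that actually landed for AIRFLOW-1157) is to partition queued task instances by pool validity instead of letting a missing-pool lookup raise inside the scheduling loop:

```python
def tasks_to_schedule(task_instances, known_pools):
    """Split task instances into runnable vs. orphaned (hypothetical
    sketch): a task referencing a nonexistent pool is set aside and
    logged rather than crashing the whole scheduler loop."""
    runnable, orphaned = [], []
    for ti in task_instances:
        pool = ti.get('pool')
        if pool is not None and pool not in known_pools:
            # Skip instead of raising, so one misconfigured DAG
            # cannot take down scheduling for every other DAG.
            orphaned.append(ti)
        else:
            runnable.append(ti)
    return runnable, orphaned

runnable, orphaned = tasks_to_schedule(
    [{'task_id': 'ok', 'pool': None},
     {'task_id': 'crash', 'pool': 'a_non_existent_pool'}],
    known_pools={'default_pool'})
print([t['task_id'] for t in orphaned])  # ['crash']
```

The design point is isolation: a per-task configuration error should degrade only that task, never the scheduler process itself.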
[jira] [Updated] (AIRFLOW-1767) Airflow Scheduler no longer schedules DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Klosowski updated AIRFLOW-1767: - Description: The Airflow Scheduler no longer schedules DAGs after this commit on master: https://github.com/apache/incubator-airflow/commit/73549763eac74142b7c4018422bb2f8c897b45a8 Workers never receive any tasks and the scheduler never adjusts DAG state. was: The Airflow Scheduler no longer schedules DAGs after this commit on master: https://github.com/apache/incubator-airflow/commit/73549763eac74142b7c4018422bb2f8c897b45a8 > Airflow Scheduler no longer schedules DAGs > -- > > Key: AIRFLOW-1767 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1767 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.9.0, 1.10.0 > Environment: CeleryExecutor, Docker, 3 Workers >Reporter: David Klosowski >Priority: Blocker > > The Airflow Scheduler no longer schedules DAGs after this commit on master: > https://github.com/apache/incubator-airflow/commit/73549763eac74142b7c4018422bb2f8c897b45a8 > Workers never receive any tasks and the scheduler never adjusts DAG state. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AIRFLOW-1767) Airflow Scheduler no longer schedules DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Klosowski updated AIRFLOW-1767: - Summary: Airflow Scheduler no longer schedules DAGs (was: Airflow Scheduler no longer works) > Airflow Scheduler no longer schedules DAGs > -- > > Key: AIRFLOW-1767 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1767 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.9.0, 1.10.0 > Environment: CeleryExecutor, Docker, 3 Workers >Reporter: David Klosowski >Priority: Blocker > > The Airflow Scheduler no longer schedules DAGs after this commit on master: > https://github.com/apache/incubator-airflow/commit/73549763eac74142b7c4018422bb2f8c897b45a8 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1767) Airflow Scheduler no longer works
David Klosowski created AIRFLOW-1767: Summary: Airflow Scheduler no longer works Key: AIRFLOW-1767 URL: https://issues.apache.org/jira/browse/AIRFLOW-1767 Project: Apache Airflow Issue Type: Bug Components: scheduler Affects Versions: 1.9.0, 1.10.0 Environment: CeleryExecutor, Docker, 3 Workers Reporter: David Klosowski Priority: Blocker The Airflow Scheduler no longer schedules DAGs after this commit on master: https://github.com/apache/incubator-airflow/commit/73549763eac74142b7c4018422bb2f8c897b45a8 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (AIRFLOW-288) Make system timezone configurable
[ https://issues.apache.org/jira/browse/AIRFLOW-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898590#comment-15898590 ] David Klosowski commented on AIRFLOW-288: - Thoughts on support for this? It would be nice to see this at the DAG level so individual DAGs could be triggered in a timezone-aware way by the scheduler. > Make system timezone configurable > - > > Key: AIRFLOW-288 > URL: https://issues.apache.org/jira/browse/AIRFLOW-288 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Vineet Goel >Assignee: Vineet Goel > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
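The DAG-level timezone idea floated in the comment above could look like the following sketch. Everything here is hypothetical (there is no such scheduler hook in the Airflow version under discussion): the scheduler keeps all dates in UTC internally but converts through the DAG's own zone to decide trigger times.

```python
from datetime import datetime, timezone, timedelta

# Fixed offset only for the sketch; real code would use a tz-database
# zone so DST transitions are handled correctly.
pacific = timezone(timedelta(hours=-8))

def next_local_midnight(now_utc: datetime, dag_tz: timezone) -> datetime:
    """Next midnight in the DAG's (hypothetical) timezone, in UTC."""
    local = now_utc.astimezone(dag_tz)
    next_day = (local + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return next_day.astimezone(timezone.utc)

now = datetime(2017, 3, 6, 20, 0, tzinfo=timezone.utc)
print(next_local_midnight(now, pacific))  # 2017-03-07 08:00:00+00:00
```

Keeping storage in UTC and converting only at the scheduling boundary is what makes per-DAG zones composable: two DAGs in different zones still compare cleanly in the metadata database.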
[jira] [Commented] (AIRFLOW-392) DAG runs on strange schedule in the past when deployed
[ https://issues.apache.org/jira/browse/AIRFLOW-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408416#comment-15408416 ] David Klosowski commented on AIRFLOW-392: - So not sure what to say here. There "was" (not sure of any additional strangeness) definitely some strangeness in the scheduler and there are a good number of fixes (including the one mentioned above). Perhaps this can be closed? > DAG runs on strange schedule in the past when deployed > -- > > Key: AIRFLOW-392 > URL: https://issues.apache.org/jira/browse/AIRFLOW-392 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.7.1.3 > Environment: AWS ElasticBeanstalk as a Docker application running in > an Ubuntu-based container >Reporter: David Klosowski >Assignee: Norman Mu > > Just deployed a new DAG ('weekly-no-track') that depends on 7 DAG task runs > of another DAG ('daily-no-track'). When the DAG is deployed the scheduler > schedules and runs multiple runs in the past (yesterday it ran for 6/12/2016 > and 6/05/2016), despite the start date set to the deployment date. > It would be a bit difficult to include all the code being used in the DAG > since we have multiple libraries we've built in Python that are being > referenced here that we want to eventually open source. I've included some > of the code here. Let me know if this is all clear and what I can do to help > or if any insight can be provided as to what it is occurring and how we might > fix this. 
> {code} > from __future__ import division, print_function > from airflow.models import DAG > from airflow.operators import DummyOperator, ExternalTaskSensor, > TimeDeltaSensor > from tn_etl_tools.aws.emr import EmrConfig, HiveConfig, read_cluster_templates > from tn_etl_tools.aws.emr import EmrService, EmrServiceWrapper, > HiveStepBuilder > from tn_etl_tools.datesupport import ts_add > from tn_etl_tools.hive import HivePartitions > from tn_etl_tools.yaml import YamlLoader > from datetime import datetime, timedelta > from dateutil.relativedelta import relativedelta, SU, MO , TU, WE, TH, FR, SA > from common_args import merge_dicts, CommonHiveParams > from operator_builders import add_tasks, emr_hive_operator > import os > # === configs > config_dir = os.getenv('DAG_CONFIG_DIR', '/usr/local/airflow/config') > alert_email = os.getenv('AIRFLOW_TO_EMAIL') > app_properties = YamlLoader.load_yaml(config_dir + '/app.yml') > emr_cluster_properties = YamlLoader.load_yaml(config_dir + > '/emr_clusters.yml') > emr_config = EmrConfig.load(STAGE=app_properties['STAGE'], > **app_properties['EMR']) > hive_config = HiveConfig.load(STAGE=app_properties['STAGE'], > **app_properties['HIVE']) > emr_cluster_templates = read_cluster_templates(emr_cluster_properties) > # === /configs > # TODO: force execution_date = sunday? 
> run_for_date = datetime(2016, 8, 8) > emr_service = EmrService() > emr_service_wrapper = EmrServiceWrapper(emr_service=emr_service, > emr_config=emr_config, > cluster_templates=emr_cluster_templates) > hive_step_builder = HiveStepBuilder(hive_config=hive_config) > hive_params = CommonHiveParams(app_properties_hive=app_properties['HIVE']) > args = {'owner': 'airflow', > 'depends_on_past': False, > 'start_date': run_for_date, > 'email': [alert_email], > 'email_on_failure': True, > 'email_on_retry': False, > 'retries': 1, > 'trigger_rule' : 'all_success', > 'emr_service_wrapper': emr_service_wrapper, > 'hive_step_builder': hive_step_builder} > user_defined_macros = {'hive_partitions': HivePartitions, >'ts_add': ts_add} > params = {'stage': app_properties['STAGE']} > dag = DAG(dag_id='weekly_no_track', default_args=args, > user_defined_macros=user_defined_macros, params=params, > schedule_interval=timedelta(days=7), > max_active_runs=1) > # === task definitions > task_definitions = { > 'wait-for-dailies': { > 'operator_type': 'dummy_operator', # hub for custom defined > dependencies > 'operator_args': {}, > 'depends_on': [] > }, > 'weekly-no-track': { > 'operator_type': 'emr_hive_operator', > 'operator_args': { > 'hive_step': { > 'script': 'weekly-no-track-airflow', # temporary modified > script with separate output path > 'cluster_name': 'geoprofile', > 'script_vars': merge_dicts(hive_params.default_params(), > hive_params.rundate_params(), { > 'PARTITIONS': '{{hive_partitions.by_day(ts_add(ts, > days=-6), ts_add(ts, days=1))}}', > }), > } > }, > 'depends_on':
[jira] [Comment Edited] (AIRFLOW-392) DAG runs on strange schedule in the past when deployed
[ https://issues.apache.org/jira/browse/AIRFLOW-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408413#comment-15408413 ] David Klosowski edited comment on AIRFLOW-392 at 8/4/16 8:09 PM: - OK, we fixed the issue. I'll be quite honest, I'm not 100% sure what fixed the problem; however, the thought is inconsistency in dag_run state in the database; which was resolved by clearing the db and re-deploying the dag (changed the start_date). We also adjusted the above creation of {{ExternalTaskSensor}} by changing delta to {{delta = relativedelta(weekday=weekday(1))} from -1 so it looks forward instead of backward. The interesting thing is that behavior we were seeing is rather consistent with the existing code in version 1.7.1.3 (that we are running) in this block of {{jobs.py}}'s {{schedule_dag}} method: {code} if dag.schedule_interval == '@once' and not last_scheduled_run: next_run_date = datetime.now() elif not last_scheduled_run: # First run TI = models.TaskInstance latest_run = ( session.query(func.max(TI.execution_date)) .filter_by(dag_id=dag.dag_id) .scalar() ) if latest_run: # Migrating from previous version # make the past 5 runs active next_run_date = dag.date_range(latest_run, -5)[0] else: task_start_dates = [t.start_date for t in dag.tasks] if task_start_dates: next_run_date = min(task_start_dates) else: next_run_date = None {code} Notice the block: {code} # Migrating from previous version # make the past 5 runs active next_run_date = dag.date_range(latest_run, -5)[0] {code} We were getting dag runs in the past. My imagination is that the {{latest_run}} condition would be false if there are no {{TaskInstance}} s and thus no {{execution_date}} s for them on the first run of this (new Dag -> First DagRun); however, on subsequent invocations it would create the a {{DagRun}} from 5 prior schedule intervals. Not sure why this didn't happen again though. 
However, that logic has since changed in from AIRFLOW-168 (on master): {code} if not last_scheduled_run: # First run task_start_dates = [t.start_date for t in dag.tasks] if task_start_dates: next_run_date = min(task_start_dates) else: next_run_date = dag.following_schedule(last_scheduled_run) {code} was (Author: d3cay): OK, we fixed the issue. I'll be quite honest, I'm not 100% sure what fixed the problem; however, the thought is inconsistency in dag_run state in the database; which was resolved by clearing the db and re-deploying the dag (changed the start_date). We also adjusted the above creation of {{ExternalTaskSensor}} by changing delta to {{delta = relativedelta(weekday=weekday(1))} from -1 so it looks forward instead of backward. The interesting thing is that behavior we were seeing is rather consistent with the existing code in version 1.7.1.3 (that we are running) in this block of {{jobs.py}}'s {{schedule_dag}} method: {code} if dag.schedule_interval == '@once' and not last_scheduled_run: next_run_date = datetime.now() elif not last_scheduled_run: # First run TI = models.TaskInstance latest_run = ( session.query(func.max(TI.execution_date)) .filter_by(dag_id=dag.dag_id) .scalar() ) if latest_run: # Migrating from previous version # make the past 5 runs active next_run_date = dag.date_range(latest_run, -5)[0] else: task_start_dates = [t.start_date for t in dag.tasks] if task_start_dates: next_run_date = min(task_start_dates) else: next_run_date = None {code} Notice the block: {code} # Migrating from previous version # make the past 5 runs active next_run_date = dag.date_range(latest_run, -5)[0] {code} We were getting dag runs in the past. My imagination is that the {{latest_run}} condition would be false if there are no {{TaskInstance}}s and thus no {{execution_date}} s for them on the first run of this (new Dag -> First DagRun); however, on subsequent invocations it would create the a {{DagRun}} from 5 prior schedule intervals. 
Not sure why this didn't happen again
[jira] [Commented] (AIRFLOW-392) DAG runs on strange schedule in the past when deployed
[ https://issues.apache.org/jira/browse/AIRFLOW-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408413#comment-15408413 ] David Klosowski commented on AIRFLOW-392: - OK, we fixed the issue. I'll be quite honest, I'm not 100% sure what fixed the problem; however, the thought is inconsistency in dag_run state in the database, which was resolved by clearing the db and re-deploying the dag (changed the start_date). We also adjusted the above creation of {{ExternalTaskSensor}} by changing delta to {{delta = relativedelta(weekday=weekday(1))}} from -1 so it looks forward instead of backward. The interesting thing is that the behavior we were seeing is rather consistent with the existing code in version 1.7.1.3 (that we are running) in this block of {{jobs.py}}'s {{schedule_dag}} method: {code} if dag.schedule_interval == '@once' and not last_scheduled_run: next_run_date = datetime.now() elif not last_scheduled_run: # First run TI = models.TaskInstance latest_run = ( session.query(func.max(TI.execution_date)) .filter_by(dag_id=dag.dag_id) .scalar() ) if latest_run: # Migrating from previous version # make the past 5 runs active next_run_date = dag.date_range(latest_run, -5)[0] else: task_start_dates = [t.start_date for t in dag.tasks] if task_start_dates: next_run_date = min(task_start_dates) else: next_run_date = None {code} Notice the block: {code} # Migrating from previous version # make the past 5 runs active next_run_date = dag.date_range(latest_run, -5)[0] {code} We were getting dag runs in the past. My guess is that the {{latest_run}} condition would be false if there are no {{TaskInstance}}s and thus no {{execution_date}}s for them on the first run of this (new Dag -> First DagRun); however, on subsequent invocations it would create a {{DagRun}} from 5 prior schedule intervals. Not sure why this didn't happen again though. 
However, that logic has since changed with AIRFLOW-168 (on master): {code} if not last_scheduled_run: # First run task_start_dates = [t.start_date for t in dag.tasks] if task_start_dates: next_run_date = min(task_start_dates) else: next_run_date = dag.following_schedule(last_scheduled_run) {code} > DAG runs on strange schedule in the past when deployed > -- > > Key: AIRFLOW-392 > URL: https://issues.apache.org/jira/browse/AIRFLOW-392 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.7.1.3 > Environment: AWS ElasticBeanstalk as a Docker application running in > an Ubuntu-based container >Reporter: David Klosowski >Assignee: Norman Mu > > Just deployed a new DAG ('weekly-no-track') that depends on 7 DAG task runs > of another DAG ('daily-no-track'). When the DAG is deployed the scheduler > schedules and runs multiple runs in the past (yesterday it ran for 6/12/2016 > and 6/05/2016), despite the start date set to the deployment date. > It would be a bit difficult to include all the code being used in the DAG > since we have multiple libraries we've built in Python that are being > referenced here that we want to eventually open source. I've included some > of the code here. Let me know if this is all clear and what I can do to help > or if any insight can be provided as to why it is occurring and how we might > fix this. 
> {code} > from __future__ import division, print_function > from airflow.models import DAG > from airflow.operators import DummyOperator, ExternalTaskSensor, > TimeDeltaSensor > from tn_etl_tools.aws.emr import EmrConfig, HiveConfig, read_cluster_templates > from tn_etl_tools.aws.emr import EmrService, EmrServiceWrapper, > HiveStepBuilder > from tn_etl_tools.datesupport import ts_add > from tn_etl_tools.hive import HivePartitions > from tn_etl_tools.yaml import YamlLoader > from datetime import datetime, timedelta > from dateutil.relativedelta import relativedelta, SU, MO , TU, WE, TH, FR, SA > from common_args import merge_dicts, CommonHiveParams > from operator_builders import add_tasks, emr_hive_operator > import os > # === configs > config_dir = os.getenv('DAG_CONFIG_DIR', '/usr/local/airflow/config') > alert_email = os.getenv('AIRFLOW_TO_EMAIL') > app_properties = YamlLoader.load_yaml(config_dir + '/app.yml') >
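The "-5 migration" branch quoted in the comments above can be illustrated with a toy stand-in for {{dag.date_range}}. This mirrors only the semantics described in the comment ("make the past 5 runs active"), not Airflow's actual implementation:

```python
from datetime import datetime, timedelta

def date_range(end: datetime, num: int, interval: timedelta):
    """Toy stand-in for dag.date_range: with a negative num, return the
    abs(num) schedule points ending at `end`, earliest first."""
    if num < 0:
        return [end + interval * i for i in range(num + 1, 1)]
    return [end + interval * i for i in range(num)]

latest_run = datetime(2016, 8, 7)
weekly = timedelta(days=7)
# jobs.py took dag.date_range(latest_run, -5)[0] as the next run date,
# i.e. a date four weekly intervals in the past -- which is why a
# freshly deployed DAG kicked off DagRuns for earlier weeks.
print(date_range(latest_run, -5, weekly)[0])  # 2016-07-10 00:00:00
```

This makes the reported symptom concrete: with any {{TaskInstance}} history present, the scheduler's "first run" date lands weeks before the intended start_date.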
[jira] [Created] (AIRFLOW-392) DAG runs on strange schedule in the past when deployed
David Klosowski created AIRFLOW-392: --- Summary: DAG runs on strange schedule in the past when deployed Key: AIRFLOW-392 URL: https://issues.apache.org/jira/browse/AIRFLOW-392 Project: Apache Airflow Issue Type: Bug Components: scheduler Affects Versions: Airflow 1.7.1.3 Environment: AWS ElasticBeanstalk as a Docker application running in an Ubuntu-based container Reporter: David Klosowski Assignee: Siddharth Anand Just deployed a new DAG ('weekly-no-track') that depends on 7 DAG task runs of another DAG ('daily-no-track'). When the DAG is deployed the scheduler schedules and runs multiple runs in the past (yesterday it ran for 6/12/2016 and 6/05/2016), despite the start date set to the deployment date. It would be a bit difficult to include all the code being used in the DAG since we have multiple libraries we've built in Python that are being referenced here that we want to eventually open source. I've included some of the code here. Let me know if this is all clear and what I can do to help or if any insight can be provided as to what it is occurring and how we might fix this. 
{code}
from __future__ import division, print_function

from airflow.models import DAG
from airflow.operators import DummyOperator, ExternalTaskSensor, TimeDeltaSensor
from tn_etl_tools.aws.emr import EmrConfig, HiveConfig, read_cluster_templates
from tn_etl_tools.aws.emr import EmrService, EmrServiceWrapper, HiveStepBuilder
from tn_etl_tools.datesupport import ts_add
from tn_etl_tools.hive import HivePartitions
from tn_etl_tools.yaml import YamlLoader
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta, SU, MO, TU, WE, TH, FR, SA
from common_args import merge_dicts, CommonHiveParams
from operator_builders import add_tasks, emr_hive_operator
import os

# === configs
config_dir = os.getenv('DAG_CONFIG_DIR', '/usr/local/airflow/config')
alert_email = os.getenv('AIRFLOW_TO_EMAIL')
app_properties = YamlLoader.load_yaml(config_dir + '/app.yml')
emr_cluster_properties = YamlLoader.load_yaml(config_dir + '/emr_clusters.yml')
emr_config = EmrConfig.load(STAGE=app_properties['STAGE'], **app_properties['EMR'])
hive_config = HiveConfig.load(STAGE=app_properties['STAGE'], **app_properties['HIVE'])
emr_cluster_templates = read_cluster_templates(emr_cluster_properties)
# === /configs

# TODO: force execution_date = sunday?
run_for_date = datetime(2016, 8, 8)

emr_service = EmrService()
emr_service_wrapper = EmrServiceWrapper(emr_service=emr_service,
                                        emr_config=emr_config,
                                        cluster_templates=emr_cluster_templates)
hive_step_builder = HiveStepBuilder(hive_config=hive_config)
hive_params = CommonHiveParams(app_properties_hive=app_properties['HIVE'])

args = {'owner': 'airflow',
        'depends_on_past': False,
        'start_date': run_for_date,
        'email': [alert_email],
        'email_on_failure': True,
        'email_on_retry': False,
        'retries': 1,
        'trigger_rule': 'all_success',
        'emr_service_wrapper': emr_service_wrapper,
        'hive_step_builder': hive_step_builder}

user_defined_macros = {'hive_partitions': HivePartitions, 'ts_add': ts_add}
params = {'stage': app_properties['STAGE']}

dag = DAG(dag_id='weekly_no_track',
          default_args=args,
          user_defined_macros=user_defined_macros,
          params=params,
          schedule_interval=timedelta(days=7),
          max_active_runs=1)

# === task definitions
task_definitions = {
    'wait-for-dailies': {
        'operator_type': 'dummy_operator',  # hub for custom defined dependencies
        'operator_args': {},
        'depends_on': []
    },
    'weekly-no-track': {
        'operator_type': 'emr_hive_operator',
        'operator_args': {
            'hive_step': {
                'script': 'weekly-no-track-airflow',  # temporary modified script with separate output path
                'cluster_name': 'geoprofile',
                'script_vars': merge_dicts(
                    hive_params.default_params(),
                    hive_params.rundate_params(),
                    {
                        'PARTITIONS': '{{hive_partitions.by_day(ts_add(ts, days=-6), ts_add(ts, days=1))}}',
                    }),
            }
        },
        'depends_on': ['wait-for-dailies']
    }
}
# === /task definitions

operator_builders = {'emr_hive_operator': emr_hive_operator,
                     'time_delta_sensor': TimeDeltaSensor,
                     'dummy_operator': DummyOperator}
add_tasks(task_definitions, dag=dag, operator_builders=operator_builders)

# === custom tasks
downstream_task = dag.get_task('wait-for-dailies')
for weekday in [MO, TU, WE, TH, FR, SA, SU]:
    task_id = 'wait-for-daily-{day}'.format(day=weekday)
    # weekday(-1) subtracts 1 relative week from the given weekday, however if the
    # calculated date is already Monday,
    # for example, -1
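The comment cut off above is describing a standard `dateutil` gotcha. A standalone demonstration (not taken from the original DAG) of how `weekday=MO(-1)` resolves:

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta, MO

# MO(-1) means "the most recent Monday, counting today if today is a Monday":
monday = datetime(2016, 8, 8)    # 2016-08-08 is a Monday
tuesday = datetime(2016, 8, 9)

# From a Tuesday, MO(-1) steps back one day to Monday 2016-08-08 ...
prev_from_tuesday = tuesday + relativedelta(weekday=MO(-1))  # 2016-08-08

# ... but from a Monday it does NOT go back a full week; the date is unchanged.
prev_from_monday = monday + relativedelta(weekday=MO(-1))    # 2016-08-08

# To land on the *previous* week's Monday when starting from a Monday, MO(-2)
# is needed instead:
prev_week_monday = monday + relativedelta(weekday=MO(-2))    # 2016-08-01
```

This is presumably why the loop above cannot use `weekday(-1)` unconditionally: on a run date that already falls on the target weekday, it resolves to that same date rather than one week earlier.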