[GitHub] ckljohn commented on issue #3227: [AIRFLOW-2299] Add S3 Select functionarity to S3FileTransformOperator
ckljohn commented on issue #3227: [AIRFLOW-2299] Add S3 Select functionarity to S3FileTransformOperator URL: https://github.com/apache/incubator-airflow/pull/3227#issuecomment-432093892 @sekikn if the file stores an encoded string, the `Payload` returned is bytes. At https://github.com/sekikn/incubator-airflow/blob/288fca445ffcad718d39f413eddd8712a18dbf85/airflow/hooks/S3_hook.py#L248, `''.join()` will raise an exception:

```
  File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 249, in select_key
    for event in response['Payload']
TypeError: sequence item 0: expected str instance, bytes found
```

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
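A minimal sketch of the fix being suggested (the helper name and event layout are illustrative, modeled on boto3's S3 Select event stream, not the actual S3_hook code): decode each bytes payload before joining.

```python
# Illustrative helper (hypothetical, not the S3_hook implementation):
# S3 Select's event stream yields `Records` events whose `Payload` is bytes,
# so decode before joining; a bare ''.join() over bytes raises
# TypeError: sequence item 0: expected str instance, bytes found
def join_select_payload(events, encoding='utf-8'):
    """Concatenate the Records payloads of an S3 Select event stream."""
    parts = []
    for event in events:
        if 'Records' in event:
            payload = event['Records']['Payload']
            if isinstance(payload, bytes):
                payload = payload.decode(encoding)
            parts.append(payload)
    return ''.join(parts)
```

With str payloads the decode branch is simply skipped, so the same helper handles both cases.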
[jira] [Created] (AIRFLOW-3243) UI task and dag clear feature cannot pick up dag parameters
chengningzhang created AIRFLOW-3243: --- Summary: UI task and dag clear feature cannot pick up dag parameters Key: AIRFLOW-3243 URL: https://issues.apache.org/jira/browse/AIRFLOW-3243 Project: Apache Airflow Issue Type: Improvement Reporter: chengningzhang Hi, I ran into an issue with the Airflow UI DAG and task "clear" feature. When I clear tasks from the UI, the DAG parameters are not picked up by the cleared tasks. For example, I have "max_active_runs=1" in my DAG parameters, but when I manually clear the tasks, this parameter is not picked up: the same cleared tasks with different schedule times will run in parallel. Is there a way we can improve this, as we may want to backfill some data by just clearing the past tasks from the Airflow UI. Thanks, Chengning -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] msumit closed pull request #4081: add Neoway to companies list
msumit closed pull request #4081: add Neoway to companies list URL: https://github.com/apache/incubator-airflow/pull/4081 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

```diff
diff --git a/README.md b/README.md
index 266e30f677..271d6a74f7 100644
--- a/README.md
+++ b/README.md
@@ -224,6 +224,7 @@ Currently **officially** using Airflow:
 1. [New Relic](https://www.newrelic.com) [[@marcweil](https://github.com/marcweil)]
 1. [Newzoo](https://www.newzoo.com) [[@newzoo-nexus](https://github.com/newzoo-nexus)]
 1. [Nextdoor](https://nextdoor.com) [[@SivaPandeti](https://github.com/SivaPandeti), [@zshapiro](https://github.com/zshapiro) & [@jthomas123](https://github.com/jthomas123)]
+1. [Neoway](https://www.neoway.com.br/) [[@neowaylabs](https://github.com/orgs/NeowayLabs/people)]
 1. [OdysseyPrime](https://www.goprime.io/) [[@davideberdin](https://github.com/davideberdin)]
 1. [OfferUp](https://offerupnow.com)
 1. [OneFineStay](https://www.onefinestay.com) [[@slangwald](https://github.com/slangwald)]
```
[GitHub] codecov-io edited a comment on issue #4083: Airflow 3211
codecov-io edited a comment on issue #4083: Airflow 3211 URL: https://github.com/apache/incubator-airflow/pull/4083#issuecomment-432075536 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4083?src=pr=h1) Report > Merging [#4083](https://codecov.io/gh/apache/incubator-airflow/pull/4083?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/829086c8718920b350728e2a126da5db08dea541?src=pr=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4083/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4083?src=pr=tree)

```diff
@@           Coverage Diff           @@
##           master    #4083   +/-   ##
=======================================
  Coverage   77.91%   77.91%
=======================================
  Files         199      199
  Lines       15958    15958
=======================================
  Hits        12433    12433
  Misses       3525     3525
```

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4083?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4083?src=pr=footer). Last update [829086c...0b34f56](https://codecov.io/gh/apache/incubator-airflow/pull/4083?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] jj-ian opened a new pull request #4083: Airflow 3211
jj-ian opened a new pull request #4083: Airflow 3211 URL: https://github.com/apache/incubator-airflow/pull/4083 This change allows Airflow to reattach to existing Dataproc jobs upon scheduler restart. Previously, if the Airflow scheduler restarts while it's running a job on GCP Dataproc, it'll lose track of that job, mark the task as failed, and eventually retry. However, the jobs may still be running on Dataproc and maybe even finish successfully. So when Airflow retries and reruns the job, the same job will run twice. This can result in issues like delayed workflows, increased costs, and duplicate data. Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. - https://issues.apache.org/jira/browse/AIRFLOW-3211 ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: My change has Airflow query the Dataproc API before submitting a job to see if the job is already running on the cluster. If a job with a matching task ID is already running on the cluster AND is in a recoverable state (like RUNNING or DONE), then Airflow will reattach itself to the existing job on Dataproc instead of resubmitting a new job to the cluster. If the job on the cluster is in an irrecoverable state like ERROR, Airflow will resubmit the job. To see this change in action: Setup: 1. Set up a GCP Project with the Dataproc API enabled 2. Install Airflow. 3. In the box that's running Airflow, `pip install google-api-python-client oauth2client` 4. Start the Airflow webserver. In the Airflow UI, Go to Admin->Connections, edit the `google_cloud_default` connection, and fill in the Project Id field with your project ID. To reproduce: 1. 
Install this DAG in the Airflow instance: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/b80895ed88ba86fce223df27a48bf481007ca708/composer/workflows/quickstart.py Set up the Airflow variables as instructed at the top of the file. 2. Start the Airflow scheduler and webserver if they're not running already. Kick off a run of the above DAG through the Airflow UI. Wait for the cluster to spin up and the job to start running on Dataproc. 3. While the job's running, kill the scheduler. Wait 5 seconds or so, and then start it back up. 4. Airflow will retry the task and reattach to the existing task already on Dataproc. Look at the Airflow logs to observe "Reattaching to previously-started DataProc job [JOB NAME HERE] (in state RUNNING)." Click on the cluster in Dataproc to observe that only the single job is running; a duplicate job has not been submitted. 5. Observe that, when the job finishes, Airflow detects the completion successfully and runs the downstream cluster delete operation. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Added the following tests to `tests/contrib/hooks/test_gcp_dataproc_hook.py`: When submitting a new job to Dataproc: - If a job with the same task ID is already running on the cluster, don't resubmit the job. - If the first matching job found on the cluster is in an irrecoverable state, keep looking for a job in a recoverable state to reattach to on the cluster. This ensures that Airflow will prioritize recoverable jobs when looking for jobs to reattach to on the cluster. - If there are jobs running on the cluster, but none of them have the same task ID as the job we're about to submit, then submit the new job. - If there are no other jobs already running on the cluster, then submit the job. - If a job with the same task ID finished with error on the cluster, then resubmit the job for retry. 
### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `flake8`
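The reattach decision described above can be sketched as pure logic (the names and the recoverable-state set below are assumptions for illustration; the real change lives in the Dataproc hook and queries the Dataproc API):

```python
# Hypothetical sketch of the reattach decision: prefer an existing job on
# the cluster with a matching task ID in a recoverable state; fall back to
# submitting a new job when only irrecoverable matches (e.g. ERROR) exist.
RECOVERABLE_STATES = {'PENDING', 'RUNNING', 'DONE'}  # assumed state set

def find_job_to_reattach(cluster_jobs, task_id):
    """Return the first recoverable job matching task_id, else None (resubmit)."""
    for job in cluster_jobs:
        if job['task_id'] == task_id and job['state'] in RECOVERABLE_STATES:
            return job
    return None
```

Note that the scan continues past irrecoverable matches, mirroring the "keep looking for a job in a recoverable state" behavior exercised by the tests above.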
[GitHub] codecov-io commented on issue #4082: [AIRFLOW-2865] Call success_callback before updating task state
codecov-io commented on issue #4082: [AIRFLOW-2865] Call success_callback before updating task state URL: https://github.com/apache/incubator-airflow/pull/4082#issuecomment-432029716 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4082?src=pr=h1) Report > Merging [#4082](https://codecov.io/gh/apache/incubator-airflow/pull/4082?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/829086c8718920b350728e2a126da5db08dea541?src=pr=desc) will **increase** coverage by `<.01%`. > The diff coverage is `100%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4082/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4082?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #4082      +/-   ##
==========================================
+ Coverage   77.91%   77.91%    +<.01%
==========================================
  Files         199      199
  Lines       15958    15957        -1
==========================================
  Hits        12433    12433
+ Misses       3525     3524        -1
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4082?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/4082/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `92.2% <100%> (+0.03%)` | :arrow_up: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4082?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4082?src=pr=footer). Last update [829086c...8a41998](https://codecov.io/gh/apache/incubator-airflow/pull/4082?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] codecov-io commented on issue #4081: add Neoway to companies list
codecov-io commented on issue #4081: add Neoway to companies list URL: https://github.com/apache/incubator-airflow/pull/4081#issuecomment-432028987 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4081?src=pr=h1) Report > Merging [#4081](https://codecov.io/gh/apache/incubator-airflow/pull/4081?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/829086c8718920b350728e2a126da5db08dea541?src=pr=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4081/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4081?src=pr=tree)

```diff
@@           Coverage Diff           @@
##           master    #4081   +/-   ##
=======================================
  Coverage   77.91%   77.91%
=======================================
  Files         199      199
  Lines       15958    15958
=======================================
  Hits        12433    12433
  Misses       3525     3525
```

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4081?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4081?src=pr=footer). Last update [829086c...00dafa7](https://codecov.io/gh/apache/incubator-airflow/pull/4081?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[jira] [Created] (AIRFLOW-3242) execution_date for TriggerDagRunOperator should be based from Triggering dag
Feng Zhou created AIRFLOW-3242: -- Summary: execution_date for TriggerDagRunOperator should be based from Triggering dag Key: AIRFLOW-3242 URL: https://issues.apache.org/jira/browse/AIRFLOW-3242 Project: Apache Airflow Issue Type: Bug Components: DagRun Affects Versions: 1.10.0, 1.9.0, 1.8.2 Environment: any linux / mac os Reporter: Feng Zhou TriggerDagRunOperator should pick up execution_date from the context instead of just defaulting to today. This breaks backfilling logic if TriggerDagRunOperator is used. Adding one line (around line 70) would address this issue:

```python
def execute(self, context):
    dr = trigger_dag.create_dagrun(
        run_id=dro.run_id,
        state=State.RUNNING,
        execution_date=context['execution_date'],  # the proposed one-line fix
        conf=dro.payload,
        external_trigger=True)
```
[GitHub] ultrabug commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ultrabug commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-431993101 Well @ron819 @ashb, thanks for your updates, but I've never heard back from upstream about any interest in this, so I will gladly rebase myself on the CLI implementation (which looks very close on a quick glance) if there's a chance for it to go in...
[jira] [Commented] (AIRFLOW-2865) Race condition between on_success_callback and LocalTaskJob's cleanup
[ https://issues.apache.org/jira/browse/AIRFLOW-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659699#comment-16659699 ] ASF GitHub Bot commented on AIRFLOW-2865: - evizitei opened a new pull request #4082: [AIRFLOW-2865] Call success_callback before updating task state URL: https://github.com/apache/incubator-airflow/pull/4082 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW-2865/) issues and references them in the PR title. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: In cases where the success callback takes variable time, it's possible for it to be interrupted by the heartbeat process. This is because the heartbeat process looks for tasks that are no longer in the "running" state but are still executing and reaps them. This commit reverses the order of callback invocation and state updating so that the "SUCCESS" state for the task isn't committed to the database until after the success callback has finished. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: TaskInstanceTest.test_success_callbak_no_race_condition ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it.
- When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `flake8`

> Race condition between on_success_callback and LocalTaskJob's cleanup
> Key: AIRFLOW-2865
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2865
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Marcin Mejran
> Priority: Minor
>
> The TaskInstance's run_raw_task method first records SUCCESS for the task instance and then runs the on_success_callback function.
> The LocalTaskJob's heartbeat_callback checks for any TI's with a SUCCESS state and terminates their processes.
> As such it's possible for the TI process to be terminated before the on_success_callback function finishes running.
[GitHub] evizitei opened a new pull request #4082: [AIRFLOW-2865] Call success_callback before updating task state
evizitei opened a new pull request #4082: [AIRFLOW-2865] Call success_callback before updating task state URL: https://github.com/apache/incubator-airflow/pull/4082 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW-2865/) issues and references them in the PR title. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: In cases where the success callback takes variable time, it's possible for it to be interrupted by the heartbeat process. This is because the heartbeat process looks for tasks that are no longer in the "running" state but are still executing and reaps them. This commit reverses the order of callback invocation and state updating so that the "SUCCESS" state for the task isn't committed to the database until after the success callback has finished. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: TaskInstanceTest.test_success_callbak_no_race_condition ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `flake8`
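The reordering can be illustrated with a toy task object (purely illustrative; the real change is inside TaskInstance's run_raw_task and the database session):

```python
# Toy model of the fix: run the user's success callback first, and only
# commit the terminal SUCCESS state afterwards, so the heartbeat reaper
# never sees SUCCESS while the callback is still executing.
calls = []

def run_raw_task(task):
    task['on_success_callback'](task)  # 1. user callback runs first
    task['state'] = 'SUCCESS'          # 2. only then is the final state set
    calls.append(task['state'])

task = {'state': 'RUNNING',
        'on_success_callback': lambda t: calls.append(t['state'])}
run_raw_task(task)
# calls == ['RUNNING', 'SUCCESS']: the state was still RUNNING inside the callback
```

In the buggy ordering the callback would observe (and be reaped under) a task already marked SUCCESS.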
[GitHub] gseva opened a new pull request #4080: [AIRFLOW-XXX] Make hmsclient import optional
gseva opened a new pull request #4080: [AIRFLOW-XXX] Make hmsclient import optional URL: https://github.com/apache/incubator-airflow/pull/4080 ### Jira - No Jira issue ### Description Currently, to use anything from hive_hooks.py you must have hmsclient installed, which is inconsistent: the thrift imports are made inside the `get_metastore_client` method, while hmsclient is imported at module level (and it performs thrift imports internally). ### Tests Not sure if tests are needed for this change. ### Documentation ### Code Quality
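The deferred-import pattern the PR applies can be sketched generically (the helper name is hypothetical; the actual change simply moves `import hmsclient` inside `get_metastore_client`):

```python
# Hypothetical helper showing the lazy-import pattern: the module itself
# imports cleanly even when an optional dependency is missing, and the
# ImportError only surfaces when the dependency is actually used.
def optional_import(name, install_hint):
    """Import a module by name, raising a helpful error if it is absent."""
    try:
        return __import__(name)
    except ImportError:
        raise ImportError(
            '%s is required for this feature; %s' % (name, install_hint))
```

Calling `optional_import('hmsclient', 'pip install hmsclient')` from inside the method keeps the top-level `import airflow.hooks.hive_hooks` working without the extra dependency.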
[GitHub] jghoman commented on issue #4079: [AIRFLOW-XXX] Add Surfline to companies list
jghoman commented on issue #4079: [AIRFLOW-XXX] Add Surfline to companies list URL: https://github.com/apache/incubator-airflow/pull/4079#issuecomment-431966893 +1. Looks good.
[GitHub] jghoman closed pull request #4079: [AIRFLOW-XXX] Add Surfline to companies list
jghoman closed pull request #4079: [AIRFLOW-XXX] Add Surfline to companies list URL: https://github.com/apache/incubator-airflow/pull/4079 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

```diff
diff --git a/README.md b/README.md
index 4dad5f8327..266e30f677 100644
--- a/README.md
+++ b/README.md
@@ -262,6 +262,7 @@ Currently **officially** using Airflow:
 1. [Stripe](https://stripe.com) [[@jbalogh](https://github.com/jbalogh)]
 1. [Strongmind](https://www.strongmind.com) [[@tomchapin](https://github.com/tomchapin) & [@wongstein](https://github.com/wongstein)]
 1. [Square](https://squareup.com/)
+1. [Surfline](https://www.surfline.com/) [[@jawang35](https://github.com/jawang35)]
 1. [Tails.com](https://tails.com/) [[@alanmcruickshank](https://github.com/alanmcruickshank)]
 1. [Tesla](https://www.tesla.com/) [[@thoralf-gutierrez](https://github.com/thoralf-gutierrez)]
 1. [The Home Depot](https://www.homedepot.com/)[[@apekshithr](https://github.com/apekshithr)]
```
[GitHub] jason-udacity commented on issue #4073: [AIRFLOW-3238] Fix models.DAG.deactivate_unknown_dags
jason-udacity commented on issue #4073: [AIRFLOW-3238] Fix models.DAG.deactivate_unknown_dags URL: https://github.com/apache/incubator-airflow/pull/4073#issuecomment-431965073 @ashb `upgradedb` does not invoke `deactivate_unknown_dags` as far as I'm aware.
[GitHub] jawang35 opened a new pull request #4079: [AIRFLOW-XXX] Add Surfline to companies list
jawang35 opened a new pull request #4079: [AIRFLOW-XXX] Add Surfline to companies list URL: https://github.com/apache/incubator-airflow/pull/4079 ### Jira - No Jira issue. Add Surfline to companies list. ### Description - This PR adds Surfline to the companies list in the `README.md`. ### Tests - No tests required. No code changes. ### Documentation - No code changes. ### Code Quality - No code changes.
[GitHub] BasPH commented on issue #4071: [AIRFLOW-3237] Refactor example DAGs
BasPH commented on issue #4071: [AIRFLOW-3237] Refactor example DAGs URL: https://github.com/apache/incubator-airflow/pull/4071#issuecomment-431914227 I processed your changes, except for @KimchaC's suggestion to use the DAG context manager, because I'm unsure what the Airflow team thinks about it.
[GitHub] Eronarn commented on issue #3584: [AIRFLOW-249] Refactor the SLA mechanism
Eronarn commented on issue #3584: [AIRFLOW-249] Refactor the SLA mechanism URL: https://github.com/apache/incubator-airflow/pull/3584#issuecomment-431896371 What's the recommended way to proceed at this point? I'm glad to write more test code, but I'd love to have an end goal for where this needs to be in order to be mergeable.
[GitHub] oelesinsc24 commented on a change in pull request #4068: [AIRFLOW-2310]: Add AWS Glue Job Compatibility to Airflow
oelesinsc24 commented on a change in pull request #4068: [AIRFLOW-2310]: Add AWS Glue Job Compatibility to Airflow URL: https://github.com/apache/incubator-airflow/pull/4068#discussion_r227038736

## File path: airflow/contrib/hooks/aws_glue_job_hook.py

```diff
@@ -0,0 +1,130 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+import time
+
+
+class AwsGlueJobHook(AwsHook):
+    """
+    Interact with AWS Glue - create job, trigger, crawler
+
+    :param job_name: unique job name per AWS account
+    :type str
+    :param desc: job description
+    :type str
+    :param region_name: aws region name (example: us-east-1)
+    :type region_name: str
```

Review comment: Can you explain what you mean, please? Other AWS service implementations have it like this already: airflow/contrib/operators/awsbatch_operator.py#L65-71
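For reference, the Sphinx `:type:` field being discussed conventionally names the parameter; a corrected sketch of the docstring under review (illustrative function only):

```python
# Illustrative docstring only: Sphinx ``:type:`` fields take the parameter
# name, e.g. ``:type job_name: str`` rather than ``:type str``.
def example_hook_init(job_name=None, desc=None, region_name=None):
    """
    :param job_name: unique job name per AWS account
    :type job_name: str
    :param desc: job description
    :type desc: str
    :param region_name: aws region name (example: us-east-1)
    :type region_name: str
    """
```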
[GitHub] oelesinsc24 commented on a change in pull request #4068: [AIRFLOW-2310]: Add AWS Glue Job Compatibility to Airflow
oelesinsc24 commented on a change in pull request #4068: [AIRFLOW-2310]: Add AWS Glue Job Compatibility to Airflow URL: https://github.com/apache/incubator-airflow/pull/4068#discussion_r227036055

## File path: airflow/contrib/hooks/aws_glue_job_hook.py

## @@ -0,0 +1,130 @@

```
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+from airflow.exceptions import AirflowException
+from airflow.contrib.hooks.aws_hook import AwsHook
+import time
+
+
+class AwsGlueJobHook(AwsHook):
+    """
+    Interact with AWS Glue - create job, trigger, crawler
+
+    :param job_name: unique job name per AWS account
+    :type str
+    :param desc: job description
+    :type str
+    :param region_name: aws region name (example: us-east-1)
+    :type region_name: str
+    """
+
+    def __init__(self,
+                 job_name=None,
+                 desc=None,
+                 aws_conn_id='aws_default',
+                 region_name=None, *args, **kwargs):
+        self.job_name = job_name
+        self.desc = desc
+        self.aws_conn_id = aws_conn_id
+        self.region_name = region_name
+        super(AwsGlueJobHook, self).__init__(*args, **kwargs)
+
+    def get_conn(self):
+        conn = self.get_client_type('glue', self.region_name)
+        return conn
+
+    def list_jobs(self):
+        conn = self.get_conn()
+        return conn.get_jobs()
+
+    def initialize_job(self, script_arguments=None):
+        """
+        Initializes connection with AWS Glue
+        to run job
+        :return:
+        """
+        glue_client = self.get_conn()
+
+        try:
+            job_response = self.get_glue_job()
+            job_name = job_response['Name']
+            job_run = glue_client.start_job_run(
+                JobName=job_name,
+                Arguments=script_arguments
+            )
+            return self.job_completion(job_name, job_run['JobRunId'])
+        except Exception as general_error:
+            raise AirflowException(
+                'Failed to run aws glue job, error: {error}'.format(
+                    error=str(general_error)
+                )
+            )
+
+    def job_completion(self, job_name=None, run_id=None):
+        """
+        :param job_name:
+        :param run_id:
+        :return:
+        """
+        glue_client = self.get_conn()
+        job_status = glue_client.get_job_run(
+            JobName=job_name,
+            RunId=run_id,
+            PredecessorsIncluded=True
+        )
+        job_run_state = job_status['JobRun']['JobRunState']
+        failed = job_run_state == 'FAILED'
+        stopped = job_run_state == 'STOPPED'
+        completed = job_run_state == 'SUCCEEDED'
+
+        while True:
+            if failed or stopped or completed:
+                self.log.info("Exiting Job {} Run State: {}"
+                              .format(run_id, job_run_state))
+                return {'JobRunState': job_run_state, 'JobRunId': run_id}
+            else:
+                self.log.info("Polling for AWS Glue Job {} current run state"
+                              .format(job_name))
+                time.sleep(6)
```

Review comment: @ashb, run job and poll job for completion are already two separate methods. @Fokko, 6 seconds was a number picked after looking at the sleep times in some existing implementations, e.g.:

1. BigQuery: `5 seconds`
2. Google Dataproc: `5 seconds` (airflow/contrib/hooks/gcp_dataproc_hook.py#L70)
3. Google Dataproc: `10 seconds` (another section: airflow/contrib/hooks/gcp_dataproc_hook.py#L164)

However, this can be changed. In addition, from my experience with AWS Glue, 6 seconds is enough to poll for job status.
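The loop under review sleeps a fixed 6 seconds between state checks. The pattern being discussed can be factored into a small generic helper; this is a hedged sketch with my own names, timeout, and signature, not the PR's API:

```python
import time

# Glue job run states that end the wait (per the states checked in the PR)
TERMINAL_STATES = {'SUCCEEDED', 'FAILED', 'STOPPED'}


def wait_for_terminal_state(get_state, poll_interval=6, timeout=3600):
    """Call get_state() until it returns a terminal state or the timeout expires.

    get_state would wrap something like
    glue_client.get_job_run(...)['JobRun']['JobRunState'] in practice.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_interval)
    raise TimeoutError('job did not reach a terminal state in time')
```

Note that the helper re-fetches the state on every iteration, which is the part a fixed set of booleans computed once before the loop would miss.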
[GitHub] oelesinsc24 commented on a change in pull request #4068: [AIRFLOW-2310]: Add AWS Glue Job Compatibility to Airflow
oelesinsc24 commented on a change in pull request #4068: [AIRFLOW-2310]: Add AWS Glue Job Compatibility to Airflow URL: https://github.com/apache/incubator-airflow/pull/4068#discussion_r227033014

## File path: tests/contrib/hooks/test_aws_glue_job_hook.py

## @@ -0,0 +1,84 @@

```
+# -*- coding: utf-8 -*-
```

Review comment: moto does not support Glue jobs yet. I opened a feature request (https://github.com/spulec/moto/issues/1561) for this. I could have contributed, but I'm not sure I have much time now.
[jira] [Commented] (AIRFLOW-3234) Enable alerting for dagbag import errors
[ https://issues.apache.org/jira/browse/AIRFLOW-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659408#comment-16659408 ] ASF GitHub Bot commented on AIRFLOW-3234: - ajbosco opened a new pull request #4078: [AIRFLOW-3234] add dagbag_import_failure_handler URL: https://github.com/apache/incubator-airflow/pull/4078 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Enable alerting for dagbag import errors > > > Key: AIRFLOW-3234 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3234 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Adam Boscarino >Assignee: Adam Boscarino >Priority: Minor > > If a task fails due to being unable to load the DagBag it is set to `failed` > without being able to use callbacks or retries. This creates the possibility > for "silent" failures. We should have the ability to handle these failures > with some other functionality (similar to the SLA callbacks). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] ron819 commented on issue #3563: [AIRFLOW-2698] Simplify Kerberos code
ron819 commented on issue #3563: [AIRFLOW-2698] Simplify Kerberos code URL: https://github.com/apache/incubator-airflow/pull/3563#issuecomment-431841517 @Fokko @gglanzani any updates on this?
[GitHub] ajbosco opened a new pull request #4078: [AIRFLOW-3234] add dagbag_import_failure_handler
ajbosco opened a new pull request #4078: [AIRFLOW-3234] add dagbag_import_failure_handler URL: https://github.com/apache/incubator-airflow/pull/4078 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] KimchaC commented on issue #4071: [AIRFLOW-3237] Refactor example DAGs
KimchaC commented on issue #4071: [AIRFLOW-3237] Refactor example DAGs URL: https://github.com/apache/incubator-airflow/pull/4071#issuecomment-431833674

Sorry, GitHub is wonky for me today. First it wouldn't let me post comments, then it ended up with many duplicates, and when I tried to clean them up it deleted all of them. I previously wrote:

> Consider also changing the code to use the `with` context manager so that you don't have to repeat the `dag=dag` parameter on each task:
> ```
> dag = DAG(
>     'my_dag',
>     start_date=datetime(2016, 1, 1))
>
> with dag:
>     op = DummyOperator('op')
>
> op.dag is dag  # True
> ```

To which @BasPH replied...

> @KimchaC I generally see 50/50 usage of passing the dag object vs using the dag context manager in Airflow code. All example DAGs pass the dag object to the operators. Is there a preference for either by the Airflow community?

I think the Airflow team should decide on a preference for the community. One reason it is not used by everyone is probably that not everyone is aware of this feature, and one reason for that is that the examples don't use it :) Personally I think the with statement is awesome: it makes the DAG code _much_ cleaner and reduces repetition. I'd suggest adding a comment to the examples like...

```
# The with statement allows you to omit the dag parameter when initializing tasks.
with dag:
    ...
```

I also think the code is clearer when the DAG is initiated separately and not inside the with statement.
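For readers unfamiliar with the feature, `with dag:` works because the DAG object is a context manager that registers itself on a stack, and operators fall back to the innermost open DAG when no `dag=` argument is given. A simplified stand-in showing the mechanism (this is a toy model, not Airflow's actual implementation):

```python
class DAG:
    _context_stack = []  # DAGs currently open in a `with` block

    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.tasks = []

    def __enter__(self):
        DAG._context_stack.append(self)
        return self

    def __exit__(self, *exc):
        DAG._context_stack.pop()


class DummyOperator:
    def __init__(self, task_id, dag=None):
        # Fall back to the innermost `with dag:` block when dag= is omitted.
        self.task_id = task_id
        self.dag = dag or (DAG._context_stack[-1] if DAG._context_stack else None)
        if self.dag is not None:
            self.dag.tasks.append(self)


dag = DAG('my_dag')
with dag:
    op = DummyOperator('op')

assert op.dag is dag  # picked up from the context manager, no dag=dag needed
```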
[GitHub] marengaz commented on issue #4056: [AIRFLOW-3207] option to stop task pushing result to xcom
marengaz commented on issue #4056: [AIRFLOW-3207] option to stop task pushing result to xcom URL: https://github.com/apache/incubator-airflow/pull/4056#issuecomment-431832931 @ashb - we can't use a flag `BaseOperator.xcom_push` because it conflicts with the method `BaseOperator.xcom_push()`. I'll change the flag to `BaseOperator.do_xcom_push`; it's already like this in some operators.
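The conflict described here is a general Python gotcha: an instance attribute assigned in `__init__` shadows a method of the same name, so later calls hit the attribute value instead of the method. A minimal illustration with a toy class (not Airflow's `BaseOperator`):

```python
class Operator:
    def __init__(self, xcom_push=True):
        # Bug: this instance attribute shadows the method defined below.
        self.xcom_push = xcom_push

    def xcom_push(self, key, value):
        return (key, value)


op = Operator()
try:
    op.xcom_push('key', 'value')   # lookup finds the bool, not the method
except TypeError as err:
    print(err)                     # 'bool' object is not callable
```

Renaming the flag to `do_xcom_push` sidesteps the collision entirely.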
[GitHub] BasPH commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs
BasPH commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs URL: https://github.com/apache/incubator-airflow/pull/4071#discussion_r226978756 ## File path: airflow/example_dags/example_bash_operator.py ## @@ -17,48 +17,54 @@ # specific language governing permissions and limitations # under the License. -import airflow from builtins import range -from airflow.operators.bash_operator import BashOperator -from airflow.operators.dummy_operator import DummyOperator -from airflow.models import DAG from datetime import timedelta +import airflow +from airflow.models import DAG +from airflow.operators.bash_operator import BashOperator +from airflow.operators.dummy_operator import DummyOperator -args = { -'owner': 'airflow', -'start_date': airflow.utils.dates.days_ago(2) -} +args = {'owner': 'airflow', 'start_date': airflow.utils.dates.days_ago(2)} Review comment: @kaxil will revert to each pair on a separate line. @KimchaC I didn't refactor the start_date in any of the example DAGs. I imagine a dynamic start_date was used so the example DAGs always start from a recent date instead of a fixed date, which would otherwise result in a large number of DAG runs when loading the examples.
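The dynamic start date in question, `airflow.utils.dates.days_ago(2)`, resolves to midnight n days before now each time the DAG file is parsed, which is why the examples always begin from a recent date. A rough pure-stdlib approximation of that behavior (Airflow's real helper is also timezone-aware, which this sketch ignores):

```python
from datetime import datetime, timedelta


def days_ago(n):
    """Midnight, n days before the current UTC time (simplified sketch)."""
    today = datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
    return today - timedelta(days=n)
```

Because the value moves forward with the clock, a scheduler loading the examples only ever backfills a couple of days, whereas a fixed `datetime(2016, 1, 1)` would trigger years of catch-up runs.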
[GitHub] BasPH commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs
BasPH commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs URL: https://github.com/apache/incubator-airflow/pull/4071#discussion_r226977084 ## File path: airflow/example_dags/example_trigger_target_dag.py ## @@ -62,12 +59,13 @@ def run_this_func(ds, **kwargs): task_id='run_this', provide_context=True, python_callable=run_this_func, -dag=dag) - +dag=dag, +) # You can also access the DagRun object in templates bash_task = BashOperator( task_id="bash_task", bash_command='echo "Here is the message: ' - '{{ dag_run.conf["message"] if dag_run else "" }}" ', -dag=dag) +'{{ dag_run.conf["message"] if dag_run else "" }}" ', Review comment: My auto-formatter (Black) did it that way. Will correct.
[GitHub] BasPH commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs
BasPH commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs URL: https://github.com/apache/incubator-airflow/pull/4071#discussion_r226976404 ## File path: airflow/example_dags/example_branch_python_dop_operator_3.py ## @@ -46,17 +48,9 @@ def should_run(ds, **kwargs): cond = BranchPythonOperator( -task_id='condition', -provide_context=True, -python_callable=should_run, -dag=dag) - -oper_1 = DummyOperator( -task_id='oper_1', -dag=dag) -oper_1.set_upstream(cond) - -oper_2 = DummyOperator( -task_id='oper_2', -dag=dag) -oper_2.set_upstream(cond) +task_id='condition', provide_context=True, python_callable=should_run, dag=dag +) + +oper_1 = DummyOperator(task_id='oper_1', dag=dag) Review comment: I didn't refactor the task names but agree it makes more sense. Will fix. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] BasPH commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs
BasPH commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs URL: https://github.com/apache/incubator-airflow/pull/4071#discussion_r226975344 ## File path: airflow/example_dags/example_skip_dag.py ## @@ -37,23 +33,15 @@ def execute(self, context): raise AirflowSkipException -dag = DAG(dag_id='example_skip_dag', default_args=args) - - def create_test_pipeline(suffix, trigger_rule, dag): - skip_operator = DummySkipOperator(task_id='skip_operator_{}'.format(suffix), dag=dag) - always_true = DummyOperator(task_id='always_true_{}'.format(suffix), dag=dag) - join = DummyOperator(task_id=trigger_rule, dag=dag, trigger_rule=trigger_rule) - -join.set_upstream(skip_operator) -join.set_upstream(always_true) - final = DummyOperator(task_id='final_{}'.format(suffix), dag=dag) -final.set_upstream(join) + +[skip_operator, always_true] >> join >> final Review comment: Whoops, my bad. I checked `BaseOperator.__rshift__` and it accepts a single task or a list of tasks as the `other` object, but obviously calling rshift on a plain Python list doesn't work :-) Will fix.
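The reason `[skip_operator, always_true] >> join` can fail is that plain lists define no `__rshift__`; the expression only works if the right-hand operand implements the reflected `__rrshift__`, which Python falls back to when the left operand returns `NotImplemented`. A toy model of the mechanism (not Airflow's code) showing how that fallback makes the list form work:

```python
class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = []

    def __rshift__(self, other):        # self >> other
        other.upstream.append(self)
        return other                    # returning `other` allows chaining

    def __rrshift__(self, others):      # [a, b] >> self falls back to here
        for task in others:
            task >> self
        return self


a, b, join, final = Task('a'), Task('b'), Task('join'), Task('final')
[a, b] >> join >> final                 # list >> Task dispatches to __rrshift__

assert join.upstream == [a, b]
assert final.upstream == [join]
```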
[GitHub] ron819 removed a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ron819 removed a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-431765827 There is a CLI command merged to master by @XD-DENG that shows the next execution date https://github.com/apache/incubator-airflow/pull/3834 Maybe this PR can use the logic already merged to master to populate the values to the UI. Also duplicate Jira ticket for this: https://issues.apache.org/jira/browse/AIRFLOW-2618 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs
Fokko commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs URL: https://github.com/apache/incubator-airflow/pull/4071#discussion_r226970148 ## File path: airflow/example_dags/example_skip_dag.py ## @@ -37,23 +33,15 @@ def execute(self, context): raise AirflowSkipException -dag = DAG(dag_id='example_skip_dag', default_args=args) - - def create_test_pipeline(suffix, trigger_rule, dag): - skip_operator = DummySkipOperator(task_id='skip_operator_{}'.format(suffix), dag=dag) - always_true = DummyOperator(task_id='always_true_{}'.format(suffix), dag=dag) - join = DummyOperator(task_id=trigger_rule, dag=dag, trigger_rule=trigger_rule) - -join.set_upstream(skip_operator) -join.set_upstream(always_true) - final = DummyOperator(task_id='final_{}'.format(suffix), dag=dag) -final.set_upstream(join) + +[skip_operator, always_true] >> join >> final Review comment: Is this allowed? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs
Fokko commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs URL: https://github.com/apache/incubator-airflow/pull/4071#discussion_r226969880 ## File path: airflow/example_dags/example_branch_python_dop_operator_3.py ## @@ -46,17 +48,9 @@ def should_run(ds, **kwargs): cond = BranchPythonOperator( -task_id='condition', -provide_context=True, -python_callable=should_run, -dag=dag) - -oper_1 = DummyOperator( -task_id='oper_1', -dag=dag) -oper_1.set_upstream(cond) - -oper_2 = DummyOperator( -task_id='oper_2', -dag=dag) -oper_2.set_upstream(cond) +task_id='condition', provide_context=True, python_callable=should_run, dag=dag +) + +oper_1 = DummyOperator(task_id='oper_1', dag=dag) Review comment: Maybe change `oper_1` to `dummy_task_1`? This makes more sense when it shows up in the UI. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] KimchaC removed a comment on issue #4071: [AIRFLOW-3237] Refactor example DAGs
KimchaC removed a comment on issue #4071: [AIRFLOW-3237] Refactor example DAGs URL: https://github.com/apache/incubator-airflow/pull/4071#issuecomment-431792918 Consider also changing the code to use the with context manager so that you don't have to repeat the dag=dag parameter on each task: ``` dag = DAG( 'my_dag', start_date=datetime(2016, 1, 1)) with dag: op = DummyOperator('op') op.dag is dag # True This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs
Fokko commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs URL: https://github.com/apache/incubator-airflow/pull/4071#discussion_r226970342 ## File path: airflow/example_dags/example_trigger_target_dag.py ## @@ -62,12 +59,13 @@ def run_this_func(ds, **kwargs): task_id='run_this', provide_context=True, python_callable=run_this_func, -dag=dag) - +dag=dag, +) # You can also access the DagRun object in templates bash_task = BashOperator( task_id="bash_task", bash_command='echo "Here is the message: ' - '{{ dag_run.conf["message"] if dag_run else "" }}" ', -dag=dag) +'{{ dag_run.conf["message"] if dag_run else "" }}" ', Review comment: I prefer the original indentation. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] KimchaC commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs
KimchaC commented on a change in pull request #4071: [AIRFLOW-3237] Refactor example DAGs URL: https://github.com/apache/incubator-airflow/pull/4071#discussion_r226949341 ## File path: airflow/example_dags/example_bash_operator.py ## @@ -17,48 +17,54 @@ # specific language governing permissions and limitations # under the License. -import airflow from builtins import range -from airflow.operators.bash_operator import BashOperator -from airflow.operators.dummy_operator import DummyOperator -from airflow.models import DAG from datetime import timedelta +import airflow +from airflow.models import DAG +from airflow.operators.bash_operator import BashOperator +from airflow.operators.dummy_operator import DummyOperator -args = { -'owner': 'airflow', -'start_date': airflow.utils.dates.days_ago(2) -} +args = {'owner': 'airflow', 'start_date': airflow.utils.dates.days_ago(2)} Review comment: Also, shouldn't the start_date be a fixed date? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] BasPH commented on issue #4071: [AIRFLOW-3237] Refactor example DAGs
BasPH commented on issue #4071: [AIRFLOW-3237] Refactor example DAGs URL: https://github.com/apache/incubator-airflow/pull/4071#issuecomment-431818293 @KimchaC I see roughly a 50/50 split between passing the dag object and using the dag context manager in Airflow code. All example DAGs pass the dag object to the operators. Does the Airflow community have a preference for either? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] kaxil commented on issue #4077: [AIRFLOW-461] Restore parameter position for BQ run_load method
kaxil commented on issue #4077: [AIRFLOW-461] Restore parameter position for BQ run_load method URL: https://github.com/apache/incubator-airflow/pull/4077#issuecomment-431817142 Well, of course.. It was stupid of me.. :D Reverted this ``` non-default argument follows default argument ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
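The flake8 failure quoted above is Python's compile-time rule that a parameter without a default cannot follow one with a default. A minimal reproduction (hypothetical signature, not the actual `run_load` one):

```python
# Moving a defaulted parameter ahead of a non-default one is rejected
# by the compiler before the code can run.
bad_signature = "def run_load(destination, schema_fields=None, source_uris): pass"

try:
    compile(bad_signature, "<example>", "exec")
    error_message = None
except SyntaxError as err:
    # e.g. "non-default argument follows default argument"
    error_message = err.msg

print(error_message)
```

This is why restoring the original parameter position (keeping `schema_fields` after all non-default parameters, or giving the later ones defaults too) was needed.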
[GitHub] ashb commented on issue #4077: [AIRFLOW-461] Restore parameter position for BQ run_load method
ashb commented on issue #4077: [AIRFLOW-461] Restore parameter position for BQ run_load method URL: https://github.com/apache/incubator-airflow/pull/4077#issuecomment-431810456 This failed flake8 tests with a syntax error. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] felipegasparini edited a comment on issue #4036: [AIRFLOW-2744] Allow RBAC to accept plugins for views and links.
felipegasparini edited a comment on issue #4036: [AIRFLOW-2744] Allow RBAC to accept plugins for views and links. URL: https://github.com/apache/incubator-airflow/pull/4036#issuecomment-431781833 hey guys, is there any ETA for this PR? Btw, could you also update the documentation and release notes for 1.10 to clarify that plugins are not supported at this moment on the new RBAC UI? I put some effort into migrating to 1.10, only to roll back because of the lack of plugin support. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-2397) Support affinity policies for Kubernetes executor/operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-2397: --- Fix Version/s: (was: 1.10.0) 1.10.1 > Support affinity policies for Kubernetes executor/operator > -- > > Key: AIRFLOW-2397 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2397 > Project: Apache Airflow > Issue Type: Sub-task >Reporter: Sergio B >Assignee: roc chan >Priority: Major > Fix For: 2.0.0, 1.10.1 > > > In order to be able to have a fine control in the workload distribution > implement the ability to set affinity policies in kubernetes would solve > complex problems > https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#affinity-v1-core -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2854) kubernetes_pod_operator add more configuration items
[ https://issues.apache.org/jira/browse/AIRFLOW-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-2854: --- Fix Version/s: 1.10.1 > kubernetes_pod_operator add more configuration items > > > Key: AIRFLOW-2854 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2854 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Affects Versions: 2.0.0 >Reporter: pengchen >Assignee: pengchen >Priority: Minor > Fix For: 2.0.0, 1.10.1 > > > kubernetes_pod_operator is missing kubernetes pods related configuration > items, as follows: > 1. image_pull_secrets > _Pull secrets_ are used to _pull_ private container _images_ from registries. > In this case, we need to configure the image_pull_secrets in pod spec file > 2. service_account_name > When kubernetes is running on rbac Authorization. If it is a job that needs > to operate on kubernetes resources, we need to configure service account. > 3. is_delete_operator_pod > This option can be given to the user to decide whether to delete the job pod > created by pod_operator, which is currently not processed. > 4. hostnetwork -- This message was sent by Atlassian JIRA (v7.6.3#76005)
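For context on the AIRFLOW-2854 request above: three of the four items correspond to standard fields of the Kubernetes v1 pod spec. A sketch of where they would land in the generated pod definition (plain dict with hypothetical values, not Airflow code):

```python
# Kubernetes v1 pod-spec fields matching the requested operator options.
# The values ("airflow-worker", "my-registry-secret", image name) are
# hypothetical placeholders.
pod_spec = {
    "apiVersion": "v1",
    "kind": "Pod",
    "spec": {
        "serviceAccountName": "airflow-worker",               # 2. service_account_name
        "hostNetwork": False,                                 # 4. hostnetwork
        "imagePullSecrets": [{"name": "my-registry-secret"}], # 1. image_pull_secrets
        "containers": [
            {"name": "base", "image": "private.example.com/app:1.0"},
        ],
    },
}
# 3. is_delete_operator_pod would be operator-side behaviour (whether to
#    clean up the pod after it finishes) rather than a pod-spec field.
```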
[jira] [Updated] (AIRFLOW-2662) support affinity & nodeSelector policies for kubernetes executor/operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-2662: --- Fix Version/s: 1.10.1 > support affinity & nodeSelector policies for kubernetes executor/operator > - > > Key: AIRFLOW-2662 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2662 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Affects Versions: 2.0.0 >Reporter: pengchen >Assignee: pengchen >Priority: Minor > Fix For: 2.0.0, 1.10.1 > > > In this issue(https://issues.apache.org/jira/browse/AIRFLOW-2397), only the > affinity function of the kubernetes operator pod is provided, and the > affinity function of the kubernetes executor pod is not supported. The full > affinity and nodeselector function are provided here. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1753) Can't install on windows 10
[ https://issues.apache.org/jira/browse/AIRFLOW-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659062#comment-16659062 ] jack commented on AIRFLOW-1753: --- [~ashb] I think it's worth mentioning on the docs (at least in the FAQ section) that Airflow currently can't be installed on Windows. Seen this question on many places. > Can't install on windows 10 > --- > > Key: AIRFLOW-1753 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1753 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.8.0 >Reporter: Lakshman Udayakantha >Priority: Major > > When I installed airflow using "pip install airflow command" two errors pop > up. > 1. link.exe failed with exit status 1158 > 2.\x86_amd64\\cl.exe' failed with exit status 2 > first issue can be solved by reffering > https://stackoverflow.com/questions/43858836/python-installing-clarifai-vs14-0-link-exe-failed-with-exit-status-1158/44563421#44563421. > But second issue is still there. there was no any solution by googling also. > how to prevent that issue and install airflow on windows 10 X64. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2141) Cannot create airflow variables when there is a list of dictionary as a value
[ https://issues.apache.org/jira/browse/AIRFLOW-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658904#comment-16658904 ] jack commented on AIRFLOW-2141: --- This is related to a similar ticket I opened: https://issues.apache.org/jira/browse/AIRFLOW-3157 > Cannot create airflow variables when there is a list of dictionary as a value > - > > Key: AIRFLOW-2141 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2141 > Project: Apache Airflow > Issue Type: Bug > Components: aws >Affects Versions: 1.8.0 >Reporter: Soundar >Priority: Major > Labels: beginner, newbie > Attachments: airflow_cli.png, airflow_cli2_crop.png > > > I'm trying to create Airflow variables using a json file. I am trying to > import airflow variables using UI(webserver) when I upload the json file I > get this error "Missing file or syntax error" and when I try to upload using > airflow cli not all the variables gets uploaded properly. The catch is that I > have a list of dictionary in my json file, say > ex: > { > "demo_archivedir": "/home/ubuntu/folders/archive", > "demo_filepattern": [ > { "id": "reference", "pattern": "Sample Data.xlsx" } > , > { "id": "sale", "pattern": "Sales.xlsx" } > ], > "demo_sourcepath": "/home/ubuntu/folders/input", > "demo_workdir": "/home/ubuntu/folders/working" > } > I've attached two images > img1. Using airflow variables cli command I was able to create partial > variables from my json file(airflow_cli.png)img2. After inserting logs in the > "airflow/bin/cli.py" file, I got this error. (airflow_cli2_crop.png) > The thing is I gave this value through the Admin UI one by one and it worked. > Then I exported those same variable using "airflow variables" cli command and > tried importing them, still it failed and the above mentioned error still > occurs. > Note: > I am using Python 3.5 with Airflow version 1.8 > The stack trace is as follows > .compute-1.amazonaws.com:22] out: 0 of 4 variables successfully updated. 
> .compute-1.amazonaws.com:22] out: Traceback (most recent call last): > .compute-1.amazonaws.com:22] out: File "/home/ubuntu/Env/bin/airflow", line > 28, in > .compute-1.amazonaws.com:22] out: args.func(args) > .compute-1.amazonaws.com:22] out: File > "/home/ubuntu/Env/lib/python3.5/site-packages/airflow/bin/cli.py", line 242, > in variables > .compute-1.amazonaws.com:22] out: import_helper(imp) > .compute-1.amazonaws.com:22] out: File > "/home/ubuntu/Env/lib/python3.5/site-packages/airflow/bin/cli.py", line 273, > in import_helper > .compute-1.amazonaws.com:22] out: Variable.set(k, v) > .compute-1.amazonaws.com:22] out: File > "/home/ubuntu/Env/lib/python3.5/site-packages/airflow/utils/db.py", line 53, > in wrapper > .compute-1.amazonaws.com:22] out: result = func(*args, **kwargs) > .compute-1.amazonaws.com:22] out: File > "/home/ubuntu/Env/lib/python3.5/site-packages/airflow/models.py", line 3615, > in set > .compute-1.amazonaws.com:22] out: session.add(Variable(key=key, > val=stored_value)) > .compute-1.amazonaws.com:22] out: File "", line 4, in __init__ > .compute-1.amazonaws.com:22] out: File > "/home/ubuntu/Env/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line > 417, in _initialize_instance > .compute-1.amazonaws.com:22] out: manager.dispatch.init_failure(self, > args, kwargs) > .compute-1.amazonaws.com:22] out: File > "/home/ubuntu/Env/lib/python3.5/site-packages/sqlalchemy/util/langhelpers.py", > line 66, in __exit__ > .compute-1.amazonaws.com:22] out: compat.reraise(exc_type, exc_value, > exc_tb) > .compute-1.amazonaws.com:22] out: File > "/home/ubuntu/Env/lib/python3.5/site-packages/sqlalchemy/util/compat.py", > line 187, in reraise > .compute-1.amazonaws.com:22] out: raise value > .compute-1.amazonaws.com:22] out: File > "/home/ubuntu/Env/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line > 414, in _initialize_instance > .compute-1.amazonaws.com:22] out: return > manager.original_init(*mixed[1:], **kwargs) > .compute-1.amazonaws.com:22] 
out: File > "/home/ubuntu/Env/lib/python3.5/site-packages/sqlalchemy/ext/declarative/base.py", > line 700, in _declarative_constructor > .compute-1.amazonaws.com:22] out: setattr(self, k, kwargs[k]) > compute-1.amazonaws.com:22] out: File "", line 1, in __set__ > .compute-1.amazonaws.com:22] out: File > "/home/ubuntu/Env/lib/python3.5/site-packages/airflow/models.py", line 3550, > in set_val > .compute-1.amazonaws.com:22] out: self._val = FERNET.encrypt(bytes(value, > 'utf-8')).decode() > .compute-1.amazonaws.com:22] out: TypeError: encoding without a string > argument > .compute-1.amazonaws.com:22] out: -- This message was sent by Atlassian JIRA (v7.6.3#76005)
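The root cause of the `TypeError` in the trace above is that `set_val` calls `bytes(value, 'utf-8')`, which only accepts `str`; serializing non-string values to JSON before calling `Variable.set` sidesteps it. A minimal sketch in plain Python (no Airflow required):

```python
import json

value = [
    {"id": "reference", "pattern": "Sample Data.xlsx"},
    {"id": "sale", "pattern": "Sales.xlsx"},
]

# What models.py effectively does -- fails when the value is not a str:
try:
    bytes(value, "utf-8")
    error_message = None
except TypeError as err:
    error_message = str(err)   # "encoding without a string argument"

# Serializing first produces the str that Fernet.encrypt expects:
encoded = bytes(json.dumps(value), "utf-8")
restored = json.loads(encoded.decode("utf-8"))
```

The DAG-side code then has to `json.loads` the variable back, but the import no longer raises mid-way through.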
[jira] [Commented] (AIRFLOW-2323) Should we replace the librabbitmq with other library in setup.py for Apache Airflow 1.9+?
[ https://issues.apache.org/jira/browse/AIRFLOW-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658901#comment-16658901 ] jack commented on AIRFLOW-2323: --- It doesn't seem like the librabbitmq lib is going to fix the problems. It's barely maintained. > Should we replace the librabbitmq with other library in setup.py for Apache > Airflow 1.9+? > - > > Key: AIRFLOW-2323 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2323 > Project: Apache Airflow > Issue Type: Bug >Reporter: A.Quasimodo >Priority: Major > > As we know, the latest librabbitmq still can't support Python 3, so when I > executed the command *pip install apache-airflow[rabbitmq]*, some errors > happened. > So, should we replace librabbitmq with other libraries like > amqplib, py-amqp, etc.? > Thank you -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3239) Test discovery partial fails due to incorrect name of the test files
[ https://issues.apache.org/jira/browse/AIRFLOW-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658889#comment-16658889 ] Xiaodong DENG commented on AIRFLOW-3239: Hi [~ashb], thanks for checking on this as well. For - tests/operators/bash_operator.py - tests/operators/operator.py I'm aware of them, but got CI failures when I tried to simply rename them (prepend with "test_"), so haven't fixed them yet in my two earlier PRs. May continue on them later. If you have got the solution to fix them, kindly proceed. Cheers > Test discovery partial fails due to incorrect name of the test files > > > Key: AIRFLOW-3239 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3239 > Project: Apache Airflow > Issue Type: Bug > Components: tests >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Major > > In PR [https://github.com/apache/incubator-airflow/pull/4049,] I have fixed > the incorrect name of some test files (resulting in partial failure in test > discovery). > There are some other scripts with this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (AIRFLOW-3239) Test discovery partial fails due to incorrect name of the test files
[ https://issues.apache.org/jira/browse/AIRFLOW-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor reopened AIRFLOW-3239: > Test discovery partial fails due to incorrect name of the test files > > > Key: AIRFLOW-3239 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3239 > Project: Apache Airflow > Issue Type: Bug > Components: tests >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Major > > In PR [https://github.com/apache/incubator-airflow/pull/4049,] I have fixed > the incorrect name of some test files (resulting in partial failure in test > discovery). > There are some other scripts with this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3239) Test discovery partial fails due to incorrect name of the test files
[ https://issues.apache.org/jira/browse/AIRFLOW-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658880#comment-16658880 ] Ash Berlin-Taylor commented on AIRFLOW-3239: A couple more files: tests/api/common/experimental/mark_tasks.py tests/api/common/experimental/trigger_dag_tests.py tests/impersonation.py tests/jobs.py tests/models.py tests/plugins_manager.py tests/utils.py tests/operators/bash_operator.py tests/operators/operator.py I think the ones in models.py are being loaded from tests/\_\_init\_\_.py so are being run. But we should remove the need for imports in tests/\_\_init\_\_.py et al and name the rest of the files properly > Test discovery partial fails due to incorrect name of the test files > > > Key: AIRFLOW-3239 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3239 > Project: Apache Airflow > Issue Type: Bug > Components: tests >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Major > > In PR [https://github.com/apache/incubator-airflow/pull/4049,] I have fixed > the incorrect name of some test files (resulting in partial failure in test > discovery). > There are some other scripts with this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2925) gcp dataflow hook doesn't show traceback
[ https://issues.apache.org/jira/browse/AIRFLOW-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658863#comment-16658863 ] jack commented on AIRFLOW-2925: --- [~xnuinside] where does the log show the exception message? "DataFlow failed with return code..." > gcp dataflow hook doesn't show traceback > > > Key: AIRFLOW-2925 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2925 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: jack >Priority: Major > Labels: easyfix > > The gcp_dataflow_hook.py has: > > {code:java} > if self._proc.returncode is not 0: > raise Exception("DataFlow failed with return code > {}".format(self._proc.returncode)) > {code} > > This does not show the full trace of the error which makes it harder to > understand the problem. > [https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataflow_hook.py#L171] > > > reported on gitter by Oscar Carlsson -- This message was sent by Atlassian JIRA (v7.6.3#76005)
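One way to surface more than just the return code in a case like AIRFLOW-2925 is to capture the subprocess's stderr and include it in the exception message. A generic sketch (not the hook's actual code; the child command is a stand-in for the DataFlow invocation):

```python
import subprocess
import sys

# Stand-in for the DataFlow command: a child process that writes to
# stderr and exits non-zero.
proc = subprocess.run(
    [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"],
    capture_output=True,
    text=True,
)

if proc.returncode != 0:
    # Attach the captured stderr so the failure is diagnosable from the log,
    # instead of raising with only the return code.
    error = Exception(
        "DataFlow failed with return code {}; stderr was:\n{}".format(
            proc.returncode, proc.stderr))
```

Raising `error` then carries the child process's own diagnostics into the Airflow task log.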
[jira] [Commented] (AIRFLOW-461) BigQuery: Support autodetection of schemas
[ https://issues.apache.org/jira/browse/AIRFLOW-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658861#comment-16658861 ] ASF GitHub Bot commented on AIRFLOW-461: kaxil closed pull request #4077: [AIRFLOW-461] Restore parameter position for BQ run_load method URL: https://github.com/apache/incubator-airflow/pull/4077 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > BigQuery: Support autodetection of schemas > -- > > Key: AIRFLOW-461 > URL: https://issues.apache.org/jira/browse/AIRFLOW-461 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib, gcp >Reporter: Jeremiah Lowin >Assignee: Iuliia Volkova >Priority: Major > Fix For: 2.0.0 > > > Add support for autodetecting schemas. Autodetect behavior is described in > the documentation for federated data sources here: > https://cloud.google.com/bigquery/federated-data-sources#auto-detect but is > actually available when loading any CSV or JSON data (not just for federated > tables). See the API: > https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.autodetect -- This message was sent by Atlassian JIRA (v7.6.3#76005)
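Per the jobs API reference linked in the AIRFLOW-461 ticket, autodetection is a single boolean on the load configuration. A sketch of the request fragment (BigQuery v2 field names; project, bucket and table names are hypothetical):

```python
# BigQuery v2 load-job configuration asking the service to infer the schema
# instead of receiving explicit schema_fields.
load_job = {
    "configuration": {
        "load": {
            "autodetect": True,  # infer schema from the CSV/JSON data
            "sourceFormat": "NEWLINE_DELIMITED_JSON",
            "sourceUris": ["gs://my-bucket/data/*.json"],
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "my_dataset",
                "tableId": "my_table",
            },
        }
    }
}
```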
[jira] [Assigned] (AIRFLOW-3240) Airflow dags are not working (not starting t1)
[ https://issues.apache.org/jira/browse/AIRFLOW-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-3240: -- Assignee: (was: Ivan Vitoria) > Airflow dags are not working (not starting t1) > -- > > Key: AIRFLOW-3240 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3240 > Project: Apache Airflow > Issue Type: Task > Components: DAG, DagRun >Affects Versions: 1.8.0 >Reporter: Pandu >Priority: Critical > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] kaxil closed pull request #4077: [AIRFLOW-461] Restore parameter position for BQ run_load method
kaxil closed pull request #4077: [AIRFLOW-461] Restore parameter position for BQ run_load method URL: https://github.com/apache/incubator-airflow/pull/4077 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ashb commented on a change in pull request #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ashb commented on a change in pull request #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#discussion_r226942255 ## File path: airflow/models.py ## @@ -3055,6 +3055,37 @@ def latest_execution_date(self): session.close() return execution_date +@property +def next_run_date(self): +""" +Returns the next run date for which the dag will be scheduled +""" +next_run_date = None +if not self.latest_execution_date: +# First run +task_start_dates = [t.start_date for t in self.tasks] +if task_start_dates: +next_run_date = self.normalize_schedule(min(task_start_dates)) +else: +next_run_date = self.following_schedule(self.latest_execution_date) +return next_run_date + +@property +def next_execution_date(self): +""" +Returns the next execution date at which the dag will be scheduled by Review comment: It's not clear to me how these two methods differ. What's the intent behind these two methods? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
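For a concrete sense of what `following_schedule` contributes in the diff above: with a plain `timedelta` schedule interval, the following schedule is just the latest execution date plus the interval. A simplified sketch (cron-style schedules would need a cron library instead):

```python
from datetime import datetime, timedelta

# Simplified following_schedule for a timedelta-based schedule interval
# (illustrative; Airflow's real method also handles cron expressions).
def following_schedule(execution_date, schedule_interval):
    return execution_date + schedule_interval

latest_execution_date = datetime(2018, 10, 22)
next_run_date = following_schedule(latest_execution_date, timedelta(days=1))
```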
[jira] [Commented] (AIRFLOW-2722) ECSOperator requires network configuration parameter when FARGATE launch type is used
[ https://issues.apache.org/jira/browse/AIRFLOW-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658857#comment-16658857 ] jack commented on AIRFLOW-2722: --- [~ThomasVdBerge] this refers to your PR > ECSOperator requires network configuration parameter when FARGATE launch type > is used > - > > Key: AIRFLOW-2722 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2722 > Project: Apache Airflow > Issue Type: Bug > Components: aws >Affects Versions: 2.0.0 >Reporter: Craig Forster >Priority: Major > > The 'FARGATE' launch type was added in AIRFLOW-2435, however when using that > launch mode the following error is returned: > {noformat} > Network Configuration must be provided when networkMode 'awsvpc' is specified. > {noformat} > Fargate-launched tasks use the "awsvpc" networking type, and as per the > [boto3 > documentation|http://boto3.readthedocs.io/en/latest/reference/services/ecs.html#ECS.Client.run_task] > for run_task: > {quote}This parameter is required for task definitions that use the awsvpc > network mode to receive their own Elastic Network Interface, and it is not > supported for other network modes. > {quote} > As it's currently implemented, the Fargate launch type is unusable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] felipegasparini commented on issue #4036: [AIRFLOW-2744] Allow RBAC to accept plugins for views and links.
felipegasparini commented on issue #4036: [AIRFLOW-2744] Allow RBAC to accept plugins for views and links. URL: https://github.com/apache/incubator-airflow/pull/4036#issuecomment-431781833 hey guys, is there any ETA for this PR? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] kaxil opened a new pull request #4077: [AIRFLOW-461] Restore parameter position for BQ run_load method
kaxil opened a new pull request #4077: [AIRFLOW-461] Restore parameter position for BQ run_load method URL: https://github.com/apache/incubator-airflow/pull/4077 This is just a follow-up PR to https://github.com/apache/incubator-airflow/pull/3880 as I didn't have write permission to forked repo. This PR just restores parameter position for `schema_fields` param Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-461 ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `flake8` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-461) BigQuery: Support autodetection of schemas
[ https://issues.apache.org/jira/browse/AIRFLOW-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658845#comment-16658845 ] ASF GitHub Bot commented on AIRFLOW-461: kaxil opened a new pull request #4077: [AIRFLOW-461] Restore parameter position for BQ run_load method URL: https://github.com/apache/incubator-airflow/pull/4077 This is just a follow-up PR to https://github.com/apache/incubator-airflow/pull/3880 as I didn't have write permission to forked repo. This PR just restores parameter position for `schema_fields` param Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-461 ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `flake8` This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > BigQuery: Support autodetection of schemas > -- > > Key: AIRFLOW-461 > URL: https://issues.apache.org/jira/browse/AIRFLOW-461 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib, gcp >Reporter: Jeremiah Lowin >Assignee: Iuliia Volkova >Priority: Major > Fix For: 2.0.0 > > > Add support for autodetecting schemas. Autodetect behavior is described in > the documentation for federated data sources here: > https://cloud.google.com/bigquery/federated-data-sources#auto-detect but is > actually available when loading any CSV or JSON data (not just for federated > tables). See the API: > https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.autodetect -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] ashb commented on issue #4073: [AIRFLOW-3238] Fix models.DAG.deactivate_unknown_dags
ashb commented on issue #4073: [AIRFLOW-3238] Fix models.DAG.deactivate_unknown_dags URL: https://github.com/apache/incubator-airflow/pull/4073#issuecomment-431780472 Does this also apply to `upgradedb`? I counsel people against running `initdb` in production (because it creates all the sample connections, which is often not what people want).
[GitHub] kaxil commented on issue #3880: [AIRFLOW-461] Support autodetected schemas in BigQuery run_load
kaxil commented on issue #3880: [AIRFLOW-461] Support autodetected schemas in BigQuery run_load URL: https://github.com/apache/incubator-airflow/pull/3880#issuecomment-431779597 @xnuinside There are many people facing this issue, so I think we should get this in. Thanks @xnuinside :)
[GitHub] kaxil closed pull request #3880: [AIRFLOW-461] Support autodetected schemas in BigQuery run_load
kaxil closed pull request #3880: [AIRFLOW-461] Support autodetected schemas in BigQuery run_load URL: https://github.com/apache/incubator-airflow/pull/3880 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

```diff
diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py
index 7a1631b53a..a4d91769c6 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -851,8 +851,8 @@ def run_copy(self,
     def run_load(self,
                  destination_project_dataset_table,
-                 schema_fields,
                  source_uris,
+                 schema_fields=None,
                  source_format='CSV',
                  create_disposition='CREATE_IF_NEEDED',
                  skip_leading_rows=0,
@@ -866,7 +866,8 @@ def run_load(self,
                  schema_update_options=(),
                  src_fmt_configs=None,
                  time_partitioning=None,
-                 cluster_fields=None):
+                 cluster_fields=None,
+                 autodetect=False):
         """
         Executes a BigQuery load command to load data from Google Cloud
         Storage to BigQuery. See here:
@@ -884,7 +885,11 @@ def run_load(self,
         :type destination_project_dataset_table: str
         :param schema_fields: The schema field list as defined here:
             https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load
+            Required if autodetect=False; optional if autodetect=True.
         :type schema_fields: list
+        :param autodetect: Attempt to autodetect the schema for CSV and JSON
+            source files.
+        :type autodetect: bool
         :param source_uris: The source Google Cloud
             Storage URI (e.g. gs://some-bucket/some-file.txt). A single wild
             per-object name can be used.
@@ -941,6 +946,11 @@ def run_load(self,
         # if it's not, we raise a ValueError
         # Refer to this link for more details:
         #   https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.tableDefinitions.(key).sourceFormat
+
+        if schema_fields is None and not autodetect:
+            raise ValueError(
+                'You must either pass a schema or autodetect=True.')
+
         if src_fmt_configs is None:
             src_fmt_configs = {}
@@ -975,6 +985,7 @@ def run_load(self,
         configuration = {
             'load': {
+                'autodetect': autodetect,
                 'createDisposition': create_disposition,
                 'destinationTable': {
                     'projectId': destination_project,
@@ -1592,7 +1603,7 @@ def _split_tablename(table_input, default_project_id, var_name=None):
     if '.' not in table_input:
         raise ValueError(
-            'Expected deletion_dataset_table name in the format of '
+            'Expected target table name in the format of '
             '<dataset>.<table>. Got: {}'.format(table_input))
     if not default_project_id:
diff --git a/airflow/contrib/operators/bigquery_operator.py b/airflow/contrib/operators/bigquery_operator.py
index fec877db05..caed3befed 100644
--- a/airflow/contrib/operators/bigquery_operator.py
+++ b/airflow/contrib/operators/bigquery_operator.py
@@ -308,7 +308,7 @@ def __init__(self,
                  project_id=None,
                  schema_fields=None,
                  gcs_schema_object=None,
-                 time_partitioning={},
+                 time_partitioning=None,
                  bigquery_conn_id='bigquery_default',
                  google_cloud_storage_conn_id='google_cloud_default',
                  delegate_to=None,
@@ -325,7 +325,7 @@ def __init__(self,
         self.bigquery_conn_id = bigquery_conn_id
         self.google_cloud_storage_conn_id = google_cloud_storage_conn_id
         self.delegate_to = delegate_to
-        self.time_partitioning = time_partitioning
+        self.time_partitioning = {} if time_partitioning is None else time_partitioning
         self.labels = labels

     def execute(self, context):
diff --git a/airflow/contrib/operators/gcs_to_bq.py b/airflow/contrib/operators/gcs_to_bq.py
index 39dff21606..a98e15a8d6 100644
--- a/airflow/contrib/operators/gcs_to_bq.py
+++ b/airflow/contrib/operators/gcs_to_bq.py
@@ -152,6 +152,7 @@ def __init__(self,
                  external_table=False,
                  time_partitioning=None,
                  cluster_fields=None,
+                 autodetect=False,
                  *args, **kwargs):
         super(GoogleCloudStorageToBigQueryOperator, self).__init__(*args, **kwargs)
@@ -190,20 +191,24 @@ def __init__(self,
         self.src_fmt_configs = src_fmt_configs
         self.time_partitioning = time_partitioning
```
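For reference, the new guard in `run_load` can be exercised on its own. The sketch below is a hypothetical standalone helper, not the hook method itself; it mirrors the validation and the forwarding of the flag into the load configuration shown in the diff:

```python
def validate_load_args(schema_fields=None, autodetect=False):
    """Mirror of the guard added to run_load: a BigQuery load job needs
    either an explicit schema or autodetect=True (hypothetical helper)."""
    if schema_fields is None and not autodetect:
        raise ValueError(
            'You must either pass a schema or autodetect=True.')
    # The flag is forwarded verbatim into the load job configuration.
    return {'load': {'autodetect': autodetect}}
```

Calling it with neither argument raises, matching the `ValueError` the patch introduces.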
[jira] [Commented] (AIRFLOW-461) BigQuery: Support autodetection of schemas
[ https://issues.apache.org/jira/browse/AIRFLOW-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658823#comment-16658823 ] ASF GitHub Bot commented on AIRFLOW-461: kaxil closed pull request #3880: [AIRFLOW-461] Support autodetected schemas in BigQuery run_load URL: https://github.com/apache/incubator-airflow/pull/3880
[GitHub] ron819 commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ron819 commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-431778262 There is a CLI command merged to master by @XD-DENG that shows the next execution date https://github.com/apache/incubator-airflow/pull/3834 Maybe this PR can use the logic already merged to master to populate the values to the UI. Also duplicate Jira ticket for this: https://issues.apache.org/jira/browse/AIRFLOW-2618
[GitHub] ron819 removed a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ron819 removed a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-431765946 There is a CLI command merged to master by @XD-DENG that shows the next execution date https://github.com/apache/incubator-airflow/pull/3834 Maybe this PR can use the logic already merged to master to populate the values to the UI. Also duplicate Jira ticket for this: https://issues.apache.org/jira/browse/AIRFLOW-2618
[GitHub] xnuinside opened a new pull request #3880: [AIRFLOW-461] Support autodetected schemas in BigQuery run_load
xnuinside opened a new pull request #3880: [AIRFLOW-461] Support autodetected schemas in BigQuery run_load URL: https://github.com/apache/incubator-airflow/pull/3880 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-461 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: added autodetect to run_load in BigQuery hook and gcs_to_bq Operator ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
[jira] [Updated] (AIRFLOW-3025) Allow to specify dns and dns-search parameters for DockerOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-3025: --- Fix Version/s: 1.10.1 > Allow to specify dns and dns-search parameters for DockerOperator > - > > Key: AIRFLOW-3025 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3025 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Konrad Gołuchowski >Assignee: Konrad Gołuchowski >Priority: Minor > Fix For: 2.0.0, 1.10.1 > > > Docker allows to specify dns and dns-search options when starting a > container. It would be useful to enable DockerOperator to use these two > options. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-3240) Airflow dags are not working (not starting t1)
[ https://issues.apache.org/jira/browse/AIRFLOW-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-3240: -- Assignee: Ivan Vitoria > Airflow dags are not working (not starting t1) > -- > > Key: AIRFLOW-3240 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3240 > Project: Apache Airflow > Issue Type: Task > Components: DAG, DagRun >Affects Versions: 1.8.0 >Reporter: Pandu >Assignee: Ivan Vitoria >Priority: Critical > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2574) initdb fails when mysql password contains percent sign
[ https://issues.apache.org/jira/browse/AIRFLOW-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-2574. Resolution: Fixed > initdb fails when mysql password contains percent sign > -- > > Key: AIRFLOW-2574 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2574 > Project: Apache Airflow > Issue Type: Bug > Components: db >Affects Versions: 1.9.0, 1.10.0 >Reporter: Zihao Zhang >Priority: Minor > Fix For: 1.10.1 > > > [db.py|https://github.com/apache/incubator-airflow/blob/3358551c8e73d9019900f7a85f18ebfd88591450/airflow/utils/db.py#L345] > uses > [config.set_main_option|http://alembic.zzzcomputing.com/en/latest/api/config.html#alembic.config.Config.set_main_option] > which says "A raw percent sign not part of an interpolation symbol must > therefore be escaped" > When there is a percent sign in database connection string, this will crash > due to bad interpolation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2574) initdb fails when mysql password contains percent sign
[ https://issues.apache.org/jira/browse/AIRFLOW-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658796#comment-16658796 ] jack commented on AIRFLOW-2574: --- This was fixed and merged. Can be closed? > initdb fails when mysql password contains percent sign > -- > > Key: AIRFLOW-2574 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2574 > Project: Apache Airflow > Issue Type: Bug > Components: db >Affects Versions: 1.9.0, 1.10.0 >Reporter: Zihao Zhang >Priority: Minor > Fix For: 1.10.1 > > > [db.py|https://github.com/apache/incubator-airflow/blob/3358551c8e73d9019900f7a85f18ebfd88591450/airflow/utils/db.py#L345] > uses > [config.set_main_option|http://alembic.zzzcomputing.com/en/latest/api/config.html#alembic.config.Config.set_main_option] > which says "A raw percent sign not part of an interpolation symbol must > therefore be escaped" > When there is a percent sign in database connection string, this will crash > due to bad interpolation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (AIRFLOW-2574) initdb fails when mysql password contains percent sign
[ https://issues.apache.org/jira/browse/AIRFLOW-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jack updated AIRFLOW-2574: -- Comment: was deleted (was: This was fixed and merged. Can be closed?) > initdb fails when mysql password contains percent sign > -- > > Key: AIRFLOW-2574 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2574 > Project: Apache Airflow > Issue Type: Bug > Components: db >Affects Versions: 1.9.0, 1.10.0 >Reporter: Zihao Zhang >Priority: Minor > Fix For: 1.10.1 > > > [db.py|https://github.com/apache/incubator-airflow/blob/3358551c8e73d9019900f7a85f18ebfd88591450/airflow/utils/db.py#L345] > uses > [config.set_main_option|http://alembic.zzzcomputing.com/en/latest/api/config.html#alembic.config.Config.set_main_option] > which says "A raw percent sign not part of an interpolation symbol must > therefore be escaped" > When there is a percent sign in database connection string, this will crash > due to bad interpolation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
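The failure mode described in AIRFLOW-2574 can be reproduced with the standard library's `ConfigParser`, which Alembic's `Config` builds on; the usual workaround is to escape each `%` as `%%` before calling `set_main_option`. A small sketch (the connection string is made up):

```python
from configparser import ConfigParser

# Hypothetical connection string whose password contains a raw percent sign.
raw_url = 'mysql://user:pa%word@localhost/airflow'

parser = ConfigParser()
parser.add_section('alembic')

# With interpolation enabled (the default), a raw '%' is rejected outright.
try:
    parser.set('alembic', 'sqlalchemy.url', raw_url)
    raised = False
except ValueError:
    raised = True

# Escaping each '%' as '%%' stores the value, and reading it back
# undoes the escaping, returning the original string.
parser.set('alembic', 'sqlalchemy.url', raw_url.replace('%', '%%'))
round_trip = parser.get('alembic', 'sqlalchemy.url')
```

The round trip recovers the original URL, which is why escaping before `set_main_option` is a safe fix.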
[jira] [Updated] (AIRFLOW-2421) HTTPHook and SimpleHTTPOperator do not verify certificates by default
[ https://issues.apache.org/jira/browse/AIRFLOW-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-2421: --- Fix Version/s: 1.10.1 Component/s: security > HTTPHook and SimpleHTTPOperator do not verify certificates by default > - > > Key: AIRFLOW-2421 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2421 > Project: Apache Airflow > Issue Type: Bug > Components: hooks, security >Affects Versions: 1.8.0 >Reporter: David Adrian >Priority: Major > Fix For: 1.10.1 > > > To verify HTTPS certificates when using anything built with an HTTP hook, you > have to explicitly pass the undocumented {{extra_options = \{"verify": True} > }}. The offending line is at > https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/http_hook.py#L103. > {code} > response = session.send( > > verify=extra_options.get("verify", False), > > ) > {code} > Not only is this the opposite default of what is expected, the necessary > requirements to verify certificates (e.g certifi), are already installed as > part of Airflow. I haven't dug through all of the code yet, but I'm concerned > that any other connections, operators or hooks built using HTTP hook don't > pass this option in. > Instead, the HTTP hook should default to {{verify=True}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2421) HTTPHook and SimpleHTTPOperator do not verify certificates by default
[ https://issues.apache.org/jira/browse/AIRFLOW-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658789#comment-16658789 ] Ash Berlin-Taylor commented on AIRFLOW-2421: I think we should change the default to verify true - not verifying is the wrong default value. Additionally I think the "default" value for the extra options should come from the connection extra field, and merge in any extra settings from the per-function dict. > HTTPHook and SimpleHTTPOperator do not verify certificates by default > - > > Key: AIRFLOW-2421 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2421 > Project: Apache Airflow > Issue Type: Bug > Components: hooks, security >Affects Versions: 1.8.0 >Reporter: David Adrian >Priority: Major > Fix For: 1.10.1 > > > To verify HTTPS certificates when using anything built with an HTTP hook, you > have to explicitly pass the undocumented {{extra_options = \{"verify": True} > }}. The offending line is at > https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/http_hook.py#L103. > {code} > response = session.send( > > verify=extra_options.get("verify", False), > > ) > {code} > Not only is this the opposite default of what is expected, the necessary > requirements to verify certificates (e.g certifi), are already installed as > part of Airflow. I haven't dug through all of the code yet, but I'm concerned > that any other connections, operators or hooks built using HTTP hook don't > pass this option in. > Instead, the HTTP hook should default to {{verify=True}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
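The merge order Ash suggests — a safe `verify=True` default, overridden by the connection's extra field, overridden in turn by the per-call dict — can be sketched with a hypothetical helper (not the actual hook API):

```python
def merge_request_options(connection_extra=None, extra_options=None):
    """Hypothetical sketch: connection-level extras provide defaults,
    per-call extra_options override them, and certificate verification
    is on by default."""
    merged = {'verify': True}              # safe default: verify certificates
    merged.update(connection_extra or {})  # defaults from the connection 'extra' field
    merged.update(extra_options or {})     # per-call overrides win
    return merged
```

With this ordering, a caller can still opt out per request, but silence no longer means "no verification".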
[GitHub] ron819 removed a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ron819 removed a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-431764042 There is a CLI command merged to master by @XD-DENG that shows the next execution date https://github.com/apache/incubator-airflow/pull/3834 Maybe this PR can use the logic already merged to master to populate the values to the UI. Also duplicate Jira ticket for this: https://issues.apache.org/jira/browse/AIRFLOW-2618
[GitHub] ron819 removed a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ron819 removed a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-431762443 There is a CLI command merged to master by @XD-DENG that shows the next execution date https://github.com/apache/incubator-airflow/pull/3834 Maybe this PR only needs to be modified to show the result of this CLI command on the UI. Also duplicate Jira ticket for this: https://issues.apache.org/jira/browse/AIRFLOW-2618
[GitHub] ron819 removed a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ron819 removed a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-431766046 There is a CLI command merged to master by @XD-DENG that shows the next execution date https://github.com/apache/incubator-airflow/pull/3834 Maybe this PR can use the logic already merged to master to populate the values to the UI. Also duplicate Jira ticket for this: https://issues.apache.org/jira/browse/AIRFLOW-2618
[GitHub] ron819 removed a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ron819 removed a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-431762576 There is a CLI command merged to master by @XD-DENG that shows the next execution date https://github.com/apache/incubator-airflow/pull/3834 Maybe this PR can use the logic already merged to master. Also duplicate Jira ticket for this: https://issues.apache.org/jira/browse/AIRFLOW-2618
[GitHub] ron819 commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ron819 commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-431765827 There is a CLI command merged to master by @XD-DENG that shows the next execution date https://github.com/apache/incubator-airflow/pull/3834 Maybe this PR can use the logic already merged to master to populate the values to the UI. Also duplicate Jira ticket for this: https://issues.apache.org/jira/browse/AIRFLOW-2618
[GitHub] ron819 commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ron819 commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-431764312 There is a CLI command merged to master by @XD-DENG that shows the next execution date https://github.com/apache/incubator-airflow/pull/3834 Maybe this PR can use the logic already merged to master to populate the values to the UI. Also duplicate Jira ticket for this: https://issues.apache.org/jira/browse/AIRFLOW-2618
[jira] [Closed] (AIRFLOW-2618) Improve UI by add "Next Run" column
[ https://issues.apache.org/jira/browse/AIRFLOW-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-2618. -- Resolution: Duplicate > Improve UI by add "Next Run" column > --- > > Key: AIRFLOW-2618 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2618 > Project: Apache Airflow > Issue Type: Improvement > Components: ui >Reporter: jack >Priority: Minor > > Please add also a column in the UI for "Next Run". Ideally when passing mouse > over it we will also see the 5 next scheduled runs. > This can be very helpful. > If for some reason you think this is an "overhead" why not adding it and > allow a "personalized UI" feature where the user can set if this column will > appear or not. This can be a very good feature in allowing users > personalizing their own UI columns. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] ron819 commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
ron819 commented on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-431766806 There is a CLI command merged to master by @XD-DENG that shows the next execution date https://github.com/apache/incubator-airflow/pull/3834 Maybe this PR can use the logic already merged to master to populate the values to the UI. Also duplicate Jira ticket for this: https://issues.apache.org/jira/browse/AIRFLOW-2618
[jira] [Commented] (AIRFLOW-63) Dangling Running Jobs
[ https://issues.apache.org/jira/browse/AIRFLOW-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658765#comment-16658765 ] Ash Berlin-Taylor commented on AIRFLOW-63: -- Possibly, though if the scheduler process is killed hard (OOM, segfault, etc.) there may still be cases where the job remains marked as running. So I'd say "not quite yet" and this is still possibly an issue (at least not fixed by my PR) > Dangling Running Jobs > - > > Key: AIRFLOW-63 > URL: https://issues.apache.org/jira/browse/AIRFLOW-63 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.7.0 > Environment: mac os X with local executor >Reporter: Giacomo Tagliabe >Priority: Minor > > It seems that if the scheduler is killed unexpectedly, the SchedulerJob > remains marked as running. The same applies to LocalTaskJob: if a job is > running when the scheduler dies, the job remains marked as running forever. > I'd expect `kill_zombies` to mark the job with an old heartbeat as not > running, but it seems it only marks the related task instances. This to me > seems like a bug; I also fail to see the piece of code that is supposed to > do that, which leads me to think that this is not handled at all. I don't > think there is anything really critical about having stale jobs marked as > running, but they are definitely confusing to see -- This message was sent by Atlassian JIRA (v7.6.3#76005)
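Zombie detection of the kind discussed here typically compares a job's last heartbeat against a staleness threshold; a minimal sketch with hypothetical names (the real models and thresholds live in Airflow's scheduler/jobs code):

```python
from datetime import datetime, timedelta

def is_zombie_job(latest_heartbeat, now=None, threshold_seconds=300):
    """A job whose heartbeat is older than the threshold is presumed dead,
    even if its state column still says 'running' (hypothetical check)."""
    now = now or datetime.utcnow()
    return (now - latest_heartbeat) > timedelta(seconds=threshold_seconds)
```

A cleanup pass built on such a check could then flip stale SchedulerJob/LocalTaskJob rows out of the running state, which is what the reporter expected `kill_zombies` to do.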
[jira] [Commented] (AIRFLOW-1523) Clicking on Graph View should display related DAG run
[ https://issues.apache.org/jira/browse/AIRFLOW-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658764#comment-16658764 ] jack commented on AIRFLOW-1523: --- This is actually quite annoying. I noticed this too. > Clicking on Graph View should display related DAG run > - > > Key: AIRFLOW-1523 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1523 > Project: Apache Airflow > Issue Type: Improvement > Components: webapp >Reporter: Gediminas Rapolavicius >Priority: Minor > Attachments: Screen Shot 2017-08-20 at 10.09.16 PM.png > > > When you are looking at the logs of a task instance (and you got there from > tree view, etc., see the screenshot), clicking on Graph View will take you to > the Graph View of the latest DAG run. > It's very hard to navigate from task instance logs to the related Graph View, > so you could see logs of other tasks in the same run, etc. > I am proposing to change the Graph View link so that it would take you to the > Graph View of the same run as the task instance, not the latest. > I could try to implement this, if the maintainers think that it would be useful > and could be merged. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1471) DAGs not deleted from scheduler after DAG file is removed
[ https://issues.apache.org/jira/browse/AIRFLOW-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658762#comment-16658762 ]

jack commented on AIRFLOW-1471:
-------------------------------

This is not a bug. Deleting a DAG file manually does not delete it from the DB. In Airflow 1.10 a delete option was added to the UI, so upgrading your Airflow version should give you the ability you are looking for.

> DAGs not deleted from scheduler after DAG file is removed
> ---------------------------------------------------------
>
>                 Key: AIRFLOW-1471
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1471
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler, ui, webserver
>    Affects Versions: 1.8.0
>         Environment: Centos7, python 3.5.2
>            Reporter: Daniel Ochoa
>            Priority: Minor
>         Attachments: airflow_bug.PNG
>
> After I deleted a DAG (e.g. by setting load_examples = False or renaming a
> dag in the dag folder), the DAGs no longer show up in "airflow list_dags",
> but they show up greyed out and unclickable in the airflow UI. I tried
> "airflow resetdb" and the problem persists.
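The reason deleting the file is not enough is that the metadata DB keeps its own record of every DAG it has ever seen, and the UI renders from the DB, not from disk. The toy sketch below illustrates this with an in-memory SQLite table; the table and column names are simplified stand-ins, not Airflow's real schema.

```python
import sqlite3

# Toy model: the metadata DB remembers DAGs independently of the dags folder.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_active INTEGER)")
conn.executemany("INSERT INTO dag VALUES (?, ?)",
                 [("kept_dag", 1), ("removed_dag", 1)])

# The file for 'removed_dag' is gone from disk, but its row survives,
# so the UI still shows it (greyed out) until the row is deleted.
dag_ids_on_disk = ("kept_dag",)
conn.execute(
    "DELETE FROM dag WHERE dag_id NOT IN (%s)" % ",".join("?" * len(dag_ids_on_disk)),
    dag_ids_on_disk,
)
conn.commit()
remaining = [row[0] for row in conn.execute("SELECT dag_id FROM dag ORDER BY dag_id")]
```

The UI delete option added in 1.10 performs this kind of DB-side cleanup for you, which is why upgrading resolves the reporter's situation.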
[jira] [Updated] (AIRFLOW-3238) Dags, removed from the filesystem, are not deactivated on initdb
[ https://issues.apache.org/jira/browse/AIRFLOW-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ash Berlin-Taylor updated AIRFLOW-3238:
---------------------------------------
    Fix Version/s: 1.10.1

> Dags, removed from the filesystem, are not deactivated on initdb
> ----------------------------------------------------------------
>
>                 Key: AIRFLOW-3238
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3238
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG
>            Reporter: Jason Shao
>            Assignee: Jason Shao
>            Priority: Major
>             Fix For: 1.10.1
>
> Removed dags continue to show up in the airflow UI. This can only be
> remedied, currently, by either deleting the dag or modifying the internal
> meta db. Fix models.DAG.deactivate_unknown_dags so that removed dags are
> automatically deactivated (hidden from the UI) on restart.
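The intent behind `models.DAG.deactivate_unknown_dags` can be boiled down to a set difference: anything marked active in the DB but no longer found on disk should be deactivated. A minimal sketch of that decision, assuming plain lists of dag ids rather than Airflow's actual model objects:

```python
def dags_to_deactivate(active_dag_ids, dag_ids_found_on_disk):
    """Return the dag_ids that are active in the DB but absent from the
    filesystem; these are the ones to hide from the UI on restart.
    (Sketch of the intent only, not Airflow's implementation.)"""
    return sorted(set(active_dag_ids) - set(dag_ids_found_on_disk))

# 'old_report' was deleted from the dags folder but is still active in the DB.
stale = dags_to_deactivate(["etl_daily", "old_report"], ["etl_daily"])
```

Running this reconciliation on initdb (or on scheduler restart) is what the fix in 1.10.1 makes automatic.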
[jira] [Commented] (AIRFLOW-461) BigQuery: Support autodetection of schemas
[ https://issues.apache.org/jira/browse/AIRFLOW-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658755#comment-16658755 ]

Kaxil Naik commented on AIRFLOW-461:
------------------------------------

Resolved by https://github.com/apache/incubator-airflow/pull/3880

> BigQuery: Support autodetection of schemas
> ------------------------------------------
>
>                 Key: AIRFLOW-461
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-461
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib, gcp
>            Reporter: Jeremiah Lowin
>            Assignee: Iuliia Volkova
>            Priority: Major
>             Fix For: 2.0.0
>
> Add support for autodetecting schemas. Autodetect behavior is described in
> the documentation for federated data sources here:
> https://cloud.google.com/bigquery/federated-data-sources#auto-detect
> but it is actually available when loading any CSV or JSON data (not just for
> federated tables). See the API:
> https://cloud.google.com/bigquery/docs/reference/v2/jobs#configuration.load.autodetect
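At the API level, the feature the issue asks for is a single flag in the load job configuration. The sketch below builds the `jobs.insert` request body as a plain dict, following the v2 jobs reference linked above; the project, dataset, table, and bucket names are placeholders, and actually submitting the job (via the hook or the REST API) is omitted.

```python
# BigQuery load job configuration with schema autodetection enabled.
# Field names follow the v2 jobs API (configuration.load.autodetect);
# all resource names here are placeholders.
load_config = {
    "configuration": {
        "load": {
            "autodetect": True,  # let BigQuery infer the schema from the CSV/JSON
            "sourceFormat": "CSV",
            "sourceUris": ["gs://example-bucket/data/*.csv"],
            "destinationTable": {
                "projectId": "example-project",
                "datasetId": "example_dataset",
                "tableId": "example_table",
            },
        }
    }
}
```

With `autodetect` set, no explicit `schema` field is required in the load configuration, which is exactly what the operator-level support resolved by PR 3880 exposes.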