[GitHub] dlackty commented on a change in pull request #2306: [AIRFLOW-1214] Fix conversion of int type in BigQueryHook
dlackty commented on a change in pull request #2306: [AIRFLOW-1214] Fix conversion of int type in BigQueryHook URL: https://github.com/apache/incubator-airflow/pull/2306#discussion_r240502685 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -915,7 +915,7 @@ def _bq_cast(string_field, bq_type): if string_field is None: return None elif bq_type == 'INTEGER' or bq_type == 'TIMESTAMP': -return int(string_field) +return int(float(string_field)) Review comment: @Fokko & @tswast for some reason I didn't see your comments. I'll open another PR for this. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ron819 commented on issue #2455: [AIRFLOW-1423] Add logs to the scheduler DAG run decision logic
ron819 commented on issue #2455: [AIRFLOW-1423] Add logs to the scheduler DAG run decision logic URL: https://github.com/apache/incubator-airflow/pull/2455#issuecomment-446100420 @ultrabug was there something else needs to be done in this PR ? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ron819 commented on issue #3687: [AIRFLOW-2805] Display multiple timezones in the tooltip on TaskInstances
ron819 commented on issue #3687: [AIRFLOW-2805] Display multiple timezones in the tooltip on TaskInstances URL: https://github.com/apache/incubator-airflow/pull/3687#issuecomment-446098744 Can this be cherry picked to 1.10.2? Also the Jira ticket is still open .. needs to be closed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] wyndhblb commented on a change in pull request #4163: [AIRFLOW-3319] - KubernetsExecutor: Need in try_number in labels if getting them later
wyndhblb commented on a change in pull request #4163: [AIRFLOW-3319] - KubernetsExecutor: Need in try_number in labels if getting them later URL: https://github.com/apache/incubator-airflow/pull/4163#discussion_r240486765 ## File path: airflow/contrib/executors/kubernetes_executor.py ## @@ -458,10 +459,16 @@ def _datetime_to_label_safe_datestring(datetime_obj): def _labels_to_key(self, labels): try: +try_num = 1 +try: +try_num = int(labels.get('try_number', '1')) +except ValueError: +self.log.warn("could not get try_number as an int: %s", labels.get('try_number', '1')) return ( labels['dag_id'], labels['task_id'], self._label_safe_datestring_to_datetime(labels['execution_date']), -labels['try_number']) +try_num Review comment: Yes, older pods in a live system will not have this try number set, and needs to be defaulted to something reasonable. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] wyndhblb commented on a change in pull request #4163: [AIRFLOW-3319] - KubernetsExecutor: Need in try_number in labels if getting them later
wyndhblb commented on a change in pull request #4163: [AIRFLOW-3319] - KubernetsExecutor: Need in try_number in labels if getting them later URL: https://github.com/apache/incubator-airflow/pull/4163#discussion_r240485547 ## File path: airflow/contrib/executors/kubernetes_executor.py ## @@ -346,6 +346,7 @@ def run_next(self, next_job): namespace=self.namespace, worker_uuid=self.worker_uuid, Review comment: yes, the only call to this worker_config object function This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-2701) Clean up dangling backfill dagrun
[ https://issues.apache.org/jira/browse/AIRFLOW-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716084#comment-16716084 ] ASF GitHub Bot commented on AIRFLOW-2701: - feng-tao closed pull request #3562: [WIP][AIRFLOW-2701] Clean up backfill dangling dagrun URL: https://github.com/apache/incubator-airflow/pull/3562 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/jobs.py b/airflow/jobs.py index 00ede5451d..8f19a67231 100644 --- a/airflow/jobs.py +++ b/airflow/jobs.py @@ -2008,6 +2008,7 @@ def __init__( self.verbose = verbose self.conf = conf self.rerun_failed_tasks = rerun_failed_tasks +self.dag_runs = [] super(BackfillJob, self).__init__(*args, **kwargs) def _update_counters(self, ti_status): @@ -2469,6 +2470,7 @@ def _execute_for_run_dates(self, run_dates, ti_status, executor, pickle_id, session=session) if dag_run is None: continue +self.dag_runs.append(dag_run) ti_status.active_runs.append(dag_run) ti_status.to_run.update(tis_map or {}) @@ -2540,9 +2542,32 @@ def _execute(self, session=None): self.dag_id ) time.sleep(self.delay_on_limit_secs) + +except (KeyboardInterrupt, SystemExit): +for dag_run in self.dag_runs: +dag_run.refresh_from_db(session) +make_transient(dag_run) + +# Check all the tasks which are not in success state. +check_state = State.unfinished() + State.finished() +check_state.remove(State.SUCCESS) + +unfinished_tasks = dag_run.get_task_instances( +state=check_state, +session=session +) +if unfinished_tasks: +# if there are unfinished tasks and ctrl^c +# set dag run state to failed +for task in unfinished_tasks: +task.set_state(State.FAILED, session) +dag_run.state = State.FAILED +else: +dag_run.state = State.SUCCESS +session.merge(dag_run) finally: -executor.end() session.commit() +executor.end() self.log.info("Backfill done. Exiting.") This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Clean up dangling backfill dagrun > - > > Key: AIRFLOW-2701 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2701 > Project: Apache Airflow > Issue Type: Bug >Reporter: Tao Feng >Assignee: Tao Feng >Priority: Major > > When user tries to backfill and hit ctrol+9, the backfill dagrun will stay as > running state. We should set it to failed if it has unfinished tasks. > > In our production, we see lots of these dangling backfill dagrun which will > cause as one active dagrun in the next backfill. This may prevent user from > backfilling if the max_active_run is reached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] feng-tao closed pull request #3562: [WIP][AIRFLOW-2701] Clean up backfill dangling dagrun
feng-tao closed pull request #3562: [WIP][AIRFLOW-2701] Clean up backfill dangling dagrun URL: https://github.com/apache/incubator-airflow/pull/3562 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/jobs.py b/airflow/jobs.py index 00ede5451d..8f19a67231 100644 --- a/airflow/jobs.py +++ b/airflow/jobs.py @@ -2008,6 +2008,7 @@ def __init__( self.verbose = verbose self.conf = conf self.rerun_failed_tasks = rerun_failed_tasks +self.dag_runs = [] super(BackfillJob, self).__init__(*args, **kwargs) def _update_counters(self, ti_status): @@ -2469,6 +2470,7 @@ def _execute_for_run_dates(self, run_dates, ti_status, executor, pickle_id, session=session) if dag_run is None: continue +self.dag_runs.append(dag_run) ti_status.active_runs.append(dag_run) ti_status.to_run.update(tis_map or {}) @@ -2540,9 +2542,32 @@ def _execute(self, session=None): self.dag_id ) time.sleep(self.delay_on_limit_secs) + +except (KeyboardInterrupt, SystemExit): +for dag_run in self.dag_runs: +dag_run.refresh_from_db(session) +make_transient(dag_run) + +# Check all the tasks which are not in success state. +check_state = State.unfinished() + State.finished() +check_state.remove(State.SUCCESS) + +unfinished_tasks = dag_run.get_task_instances( +state=check_state, +session=session +) +if unfinished_tasks: +# if there are unfinished tasks and ctrl^c +# set dag run state to failed +for task in unfinished_tasks: +task.set_state(State.FAILED, session) +dag_run.state = State.FAILED +else: +dag_run.state = State.SUCCESS +session.merge(dag_run) finally: -executor.end() session.commit() +executor.end() self.log.info("Backfill done. Exiting.") This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] jongyoul commented on issue #4057: [AIRFLOW-3216] HiveServer2Hook need a password with LDAP authentication
jongyoul commented on issue #4057: [AIRFLOW-3216] HiveServer2Hook need a password with LDAP authentication URL: https://github.com/apache/incubator-airflow/pull/4057#issuecomment-446054890 @Fokko @bolkedebruin Sorry for the late reply. I've added a test case. BTW, the failed one doesn't look related to my changes. Could you please confirm it? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2015: [AIRFLOW-765] Auto detect dag dependency files, variables, and resour…
stale[bot] commented on issue #2015: [AIRFLOW-765] Auto detect dag dependency files, variables, and resour… URL: https://github.com/apache/incubator-airflow/pull/2015#issuecomment-446048852 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2026: [AIRFLOW-811] [BugFix] bash_operator does not return full output
stale[bot] commented on issue #2026: [AIRFLOW-811] [BugFix] bash_operator does not return full output URL: https://github.com/apache/incubator-airflow/pull/2026#issuecomment-446048844 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #1910: [AIRFLOW-659] Automatic Refresh on DAG Graph View
stale[bot] commented on issue #1910: [AIRFLOW-659] Automatic Refresh on DAG Graph View URL: https://github.com/apache/incubator-airflow/pull/1910#issuecomment-446048869 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #1922: [AIRFLOW-675] Add error log to UI
stale[bot] commented on issue #1922: [AIRFLOW-675] Add error log to UI URL: https://github.com/apache/incubator-airflow/pull/1922#issuecomment-446048862 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2014: [AIRFLOW-796] Config options for num_runs, processor_poll_interval, log_level
stale[bot] commented on issue #2014: [AIRFLOW-796] Config options for num_runs, processor_poll_interval, log_level URL: https://github.com/apache/incubator-airflow/pull/2014#issuecomment-446048848 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2052: [AIRFLOW-833] Add an `airflow default_config` sub-command
stale[bot] commented on issue #2052: [AIRFLOW-833] Add an `airflow default_config` sub-command URL: https://github.com/apache/incubator-airflow/pull/2052#issuecomment-446048841 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #1942: [AIRFLOW-697] Add exclusion of tasks.
stale[bot] commented on issue #1942: [AIRFLOW-697] Add exclusion of tasks. URL: https://github.com/apache/incubator-airflow/pull/1942#issuecomment-446048856 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #748: End-to-end DAG testing
stale[bot] commented on issue #748: End-to-end DAG testing URL: https://github.com/apache/incubator-airflow/pull/748#issuecomment-446048866 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2223: [AIRFLOW-1076] Add get method for template variable accessor
stale[bot] commented on issue #2223: [AIRFLOW-1076] Add get method for template variable accessor URL: https://github.com/apache/incubator-airflow/pull/2223#issuecomment-446019133 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2209: [AIRFLOW-766] Skip conn.commit() when in Auto-commit
stale[bot] commented on issue #2209: [AIRFLOW-766] Skip conn.commit() when in Auto-commit URL: https://github.com/apache/incubator-airflow/pull/2209#issuecomment-446019128 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2113: [AIRFLOW-920] Allow marking tasks in zoomed in subdags
stale[bot] commented on issue #2113: [AIRFLOW-920] Allow marking tasks in zoomed in subdags URL: https://github.com/apache/incubator-airflow/pull/2113#issuecomment-446019161 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2483: [AIRFLOW-1463] Clear state of queued task
stale[bot] commented on issue #2483: [AIRFLOW-1463] Clear state of queued task URL: https://github.com/apache/incubator-airflow/pull/2483#issuecomment-446019075 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2559: [AIRFLOW-1558] Py3 fix for S3FileTransformOperator
stale[bot] commented on issue #2559: [AIRFLOW-1558] Py3 fix for S3FileTransformOperator URL: https://github.com/apache/incubator-airflow/pull/2559#issuecomment-446019058 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2306: [AIRFLOW-1214] Fix conversion of int type in BigQueryHook
stale[bot] commented on issue #2306: [AIRFLOW-1214] Fix conversion of int type in BigQueryHook URL: https://github.com/apache/incubator-airflow/pull/2306#issuecomment-446019097 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2513: [AIRFLOW-XXX] Allow setting of keepalives_idle to enable PostgresHook to work with Amazon Redshift
stale[bot] commented on issue #2513: [AIRFLOW-XXX] Allow setting of keepalives_idle to enable PostgresHook to work with Amazon Redshift URL: https://github.com/apache/incubator-airflow/pull/2513#issuecomment-446019071 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2595: [AIRFLOW-1592] Add --keep-alive option for gunicorn to airflow config
stale[bot] commented on issue #2595: [AIRFLOW-1592] Add --keep-alive option for gunicorn to airflow config URL: https://github.com/apache/incubator-airflow/pull/2595#issuecomment-446019061 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2456: [AIRFLOW-1310] Basic operator to run docker container on Kubernetes
stale[bot] commented on issue #2456: [AIRFLOW-1310] Basic operator to run docker container on Kubernetes URL: https://github.com/apache/incubator-airflow/pull/2456#issuecomment-446019083 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2420: [AIRFLOW-1383] Add no_dag_reset option to airflow clear command
stale[bot] commented on issue #2420: [AIRFLOW-1383] Add no_dag_reset option to airflow clear command URL: https://github.com/apache/incubator-airflow/pull/2420#issuecomment-446019092 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2598: [AIRFLOW-1595] Change to construct sqlite_hook from connection schema
stale[bot] commented on issue #2598: [AIRFLOW-1595] Change to construct sqlite_hook from connection schema URL: https://github.com/apache/incubator-airflow/pull/2598#issuecomment-446019060 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2233: [AIRFLOW-1098] Fix issue in setting parent_dag when loading dags
stale[bot] commented on issue #2233: [AIRFLOW-1098] Fix issue in setting parent_dag when loading dags URL: https://github.com/apache/incubator-airflow/pull/2233#issuecomment-446019106 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2228: [AIRFLOW-351] Ensure python_callable is pickleable
stale[bot] commented on issue #2228: [AIRFLOW-351] Ensure python_callable is pickleable URL: https://github.com/apache/incubator-airflow/pull/2228#issuecomment-446019101 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2235: [AIRFLOW-1096] Add conn_ids to template_fields
stale[bot] commented on issue #2235: [AIRFLOW-1096] Add conn_ids to template_fields URL: https://github.com/apache/incubator-airflow/pull/2235#issuecomment-446019129 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2131: [AIRFLOW-946] call commands with virtualenv if available
stale[bot] commented on issue #2131: [AIRFLOW-946] call commands with virtualenv if available URL: https://github.com/apache/incubator-airflow/pull/2131#issuecomment-446019162 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2543: [AIRFLOW-351] fix bug with stop operator
stale[bot] commented on issue #2543: [AIRFLOW-351] fix bug with stop operator URL: https://github.com/apache/incubator-airflow/pull/2543#issuecomment-446019059 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2500: [AIRFLOW-1488] Add the DagRunSensor operator.
stale[bot] commented on issue #2500: [AIRFLOW-1488] Add the DagRunSensor operator. URL: https://github.com/apache/incubator-airflow/pull/2500#issuecomment-446019073 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2657: [AIRFLOW-161] New redirect route and extra links
stale[bot] commented on issue #2657: [AIRFLOW-161] New redirect route and extra links URL: https://github.com/apache/incubator-airflow/pull/2657#issuecomment-446019038 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2515: [AIRFLOW-1325][WIP] Airflow streaming log backed by ElasticSearch
stale[bot] commented on issue #2515: [AIRFLOW-1325][WIP] Airflow streaming log backed by ElasticSearch URL: https://github.com/apache/incubator-airflow/pull/2515#issuecomment-446019063 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2361: [AIRFLOW 1159] DockerOperator can use newer docker.
stale[bot] commented on issue #2361: [AIRFLOW 1159] DockerOperator can use newer docker. URL: https://github.com/apache/incubator-airflow/pull/2361#issuecomment-446019096 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2776: [AIRFLOW-1793] fix DockerOperator using docker_conn_id
stale[bot] commented on issue #2776: [AIRFLOW-1793] fix DockerOperator using docker_conn_id URL: https://github.com/apache/incubator-airflow/pull/2776#issuecomment-446019016 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2444: [ARIFLOW-1414] Add support for retriggering dependent workflows
stale[bot] commented on issue #2444: [ARIFLOW-1414] Add support for retriggering dependent workflows URL: https://github.com/apache/incubator-airflow/pull/2444#issuecomment-446019088 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2503: AIRFLOW-1491 Check celery queue before scheduling commands
stale[bot] commented on issue #2503: AIRFLOW-1491 Check celery queue before scheduling commands URL: https://github.com/apache/incubator-airflow/pull/2503#issuecomment-446019069 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2455: [AIRFLOW-1423] Add logs to the scheduler DAG run decision logic
stale[bot] commented on issue #2455: [AIRFLOW-1423] Add logs to the scheduler DAG run decision logic URL: https://github.com/apache/incubator-airflow/pull/2455#issuecomment-446019079 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2811: [AIRFLOW-1844] task would not be executed when celery broker recovery
stale[bot] commented on issue #2811: [AIRFLOW-1844] task would not be executed when celery broker recovery URL: https://github.com/apache/incubator-airflow/pull/2811#issuecomment-446019011 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2757: [AIRFLOW-1775] Remote File Task Handler
stale[bot] commented on issue #2757: [AIRFLOW-1775] Remote File Task Handler URL: https://github.com/apache/incubator-airflow/pull/2757#issuecomment-446019036 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2192: [AIRFLOW-1044] base_task_runner fix for Task RUn Ignoring dependencies
stale[bot] commented on issue #2192: [AIRFLOW-1044] base_task_runner fix for Task RUn Ignoring dependencies URL: https://github.com/apache/incubator-airflow/pull/2192#issuecomment-446019139 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2740: [AIRFLOW-1768] add if to show trigger only if not paused
stale[bot] commented on issue #2740: [AIRFLOW-1768] add if to show trigger only if not paused URL: https://github.com/apache/incubator-airflow/pull/2740#issuecomment-446019037 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2794: [AIRFLOW-1822] Add gaiohttp and gthread gunicorn worker class to webserver cli
stale[bot] commented on issue #2794: [AIRFLOW-1822] Add gaiohttp and gthread gunicorn worker class to webserver cli URL: https://github.com/apache/incubator-airflow/pull/2794#issuecomment-446019012 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2754: [AIRFLOW-1729] Ignore whole directories from .airflowignore
stale[bot] commented on issue #2754: [AIRFLOW-1729] Ignore whole directories from .airflowignore URL: https://github.com/apache/incubator-airflow/pull/2754#issuecomment-446019020 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (AIRFLOW-3497) Allow for port configuration in KubernetesPodOperator
Wilson Lian created AIRFLOW-3497: Summary: Allow for port configuration in KubernetesPodOperator Key: AIRFLOW-3497 URL: https://issues.apache.org/jira/browse/AIRFLOW-3497 Project: Apache Airflow Issue Type: Improvement Components: contrib Reporter: Wilson Lian Allowing for ports to be configured would enable cross-pod communication. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3496) Support multi-container pod in KubernetesPodOperator
Wilson Lian created AIRFLOW-3496: Summary: Support multi-container pod in KubernetesPodOperator Key: AIRFLOW-3496 URL: https://issues.apache.org/jira/browse/AIRFLOW-3496 Project: Apache Airflow Issue Type: Improvement Components: contrib Reporter: Wilson Lian KubernetesPodOperator currently only allows one to run a single-container pod (aside from init containers). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3495) DataProcSparkSqlOperator and DataProcHiveOperator should raise error when query and query_uri are both provided
Wilson Lian created AIRFLOW-3495: Summary: DataProcSparkSqlOperator and DataProcHiveOperator should raise error when query and query_uri are both provided Key: AIRFLOW-3495 URL: https://issues.apache.org/jira/browse/AIRFLOW-3495 Project: Apache Airflow Issue Type: Bug Components: contrib Reporter: Wilson Lian Exactly 1 of the query and query_uri params will be used. It should be an error to provide more than one. Fixing this will make cases like [this|https://stackoverflow.com/questions/53424091/unable-to-query-using-file-in-data-proc-hive-operator] less confusing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3494) Dask executor has dependency conflict
Colin Campbell created AIRFLOW-3494: --- Summary: Dask executor has dependency conflict Key: AIRFLOW-3494 URL: https://issues.apache.org/jira/browse/AIRFLOW-3494 Project: Apache Airflow Issue Type: Bug Affects Versions: 1.10.2 Reporter: Colin Campbell In the setup.py, the dask extra installs 'distributed>=1.17.1,<2', and requires psutl <5. However, version 1.24.1 of dask-distributed now requires psutil >=5. [Distributed Changelog|https://distributed.readthedocs.io/en/latest/changelog.html#id3] This causes an error when trying to install airflow without first manually installing distributed<=1.24.0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3370) Enhance current ES handler with stdout capability and more output options
[ https://issues.apache.org/jira/browse/AIRFLOW-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715509#comment-16715509 ] ASF GitHub Bot commented on AIRFLOW-3370: - rhwang10 opened a new pull request #4303: [AIRFLOW-3370] Elasticsearch log task handler additional features URL: https://github.com/apache/incubator-airflow/pull/4303 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow-3370](https://issues.apache.org/jira/browse/AIRFLOW-3370) ### Description - [x] Add additional options and documentation for using the ElasticsearchTaskHandler. The easiest and most foolproof way to implement a logging solution is to write to the standard output streams (stdout and stderr). In the Kubernetes ecosystem, given that pods are killed and restarted constantly, this implies persistent storage is a requirement for log history preservation. For the webserver and scheduler components of Airflow, logging to standard output streams is built in. However, when tasks are executed in Celery, workers will fork off child processes to execute tasks concurrently. Before the child process ends, it makes a call to the Airflow task logger, and a task log file is written to the file system. This potentially causes several problems with the Airflow on top of Kubernetes architecture. Given that Airflow has a constant stream of log output, running an Airflow environment using Celery in a Kubernetes cluster requires large amounts of memory resources. As such, when memory resources are exceeded, either 1) worker pods are often evicted by Kubernetes, or 2) worker output stalls and tasks pile up without completing. A current best-case workaround is to run a sidecar container that tails the `stdout` and `stderr` streams to read logs from the filesystem of the worker node and then output those logs to its own standard output. However, this becomes a scaling issue when running multiple deployments, and is unsustainable for massive Airflow deployments. The options that are added in this PR look to circumvent the need for persistent volumes in worker nodes when running Airflow on Kubernetes. Workers will no longer need to be Stateful Sets and can instead be Deployments. The `elasticsearch_write_stdout` flag in `airflow.cfg` will allow the child process to write its log output to the parent process standard output stream. The `elasticsearch_json_format` flag in `airflow.cfg` allows additional optional JSON configuration for task instances based on the `logging` module LogRecord attributes. This must be used in conjunction with the `elasticsearch_record_labels` configuration. A potential use case for these options is precisely when setting up a log monitoring stack, such as EFK (Elasticsearch FluentD Kibana), without requiring persistent volumes. A FluentD daemon listening on every node is awaiting log output on the standard output stream. With the `write_stdout` flag set, it can capture the task log information that has been executed on child processes from the parent process standard output. Using the `json_format` field, FluentD can be configured to filter and specify log records, without needing to parse the standard Airflow log formatted record, before sending it off to a destination, such as Elasticsearch, where it can stored and handled independent of Airflow on Kubernetes. If either of these options are NULL, or not set, the `es_task_handler.py` will function exactly as before, and will only have read functionality. The options in this PR simply provide more functionality and options to users. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: handler close - test_close_stdout_logs - test_close_closed_stdout_logs - test_close_no_mark_end_stdout_logs - test_close_with_no_handler_stdout_logs - test_close_with_no_stream_stdout_logs handler read - test_read_stdout_logs - test_read_nonexistent_log_stdout_logs - test_read_raises_stdout_logs_json_true - test_read_timeout_stdout_logs - test_read_with_empty_metadata_stdout_logs - test_read_with_none_metadata_stdout_logs handler render_log_id and set_context - test_render_log_id_json_true - test_set_context_stdout_logs ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1.
[jira] [Commented] (AIRFLOW-3370) Enhance current ES handler with stdout capability and more output options
[ https://issues.apache.org/jira/browse/AIRFLOW-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715508#comment-16715508 ] ASF GitHub Bot commented on AIRFLOW-3370: - rhwang10 closed pull request #4303: [AIRFLOW-3370] Elasticsearch log task handler additional features URL: https://github.com/apache/incubator-airflow/pull/4303 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/config_templates/airflow_local_settings.py b/airflow/config_templates/airflow_local_settings.py index 45a2f2923c..2bf9f9d0e8 100644 --- a/airflow/config_templates/airflow_local_settings.py +++ b/airflow/config_templates/airflow_local_settings.py @@ -59,6 +59,13 @@ END_OF_LOG_MARK = conf.get('elasticsearch', 'ELASTICSEARCH_END_OF_LOG_MARK') +ELASTICSEARCH_WRITE_STDOUT = conf.get('elasticsearch', 'ELASTICSEARCH_WRITE_STDOUT') + +ELASTICSEARCH_JSON_FORMAT = conf.get('elasticsearch', 'ELASTICSEARCH_JSON_FORMAT') + +ELASTICSEARCH_RECORD_LABELS = [label.strip() for label in conf.get('elasticsearch', + 'ELASTICSEARCH_RECORD_LABELS').split(",")] + DEFAULT_LOGGING_CONFIG = { 'version': 1, 'disable_existing_loggers': False, @@ -191,6 +198,9 @@ 'filename_template': FILENAME_TEMPLATE, 'end_of_log_mark': END_OF_LOG_MARK, 'host': ELASTICSEARCH_HOST, +'write_stdout': ELASTICSEARCH_WRITE_STDOUT, +'json_format': ELASTICSEARCH_JSON_FORMAT, +'record_labels': ELASTICSEARCH_RECORD_LABELS, }, }, } diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg index a9473178c1..146192051d 100644 --- a/airflow/config_templates/default_airflow.cfg +++ b/airflow/config_templates/default_airflow.cfg @@ -588,6 +588,9 @@ hide_sensitive_variable_fields = True elasticsearch_host = elasticsearch_log_id_template = {{dag_id}}-{{task_id}}-{{execution_date}}-{{try_number}} elasticsearch_end_of_log_mark = end_of_log +elasticsearch_write_stdout= +elasticsearch_json_format= +elasticsearch_record_labels= [kubernetes] # The repository, tag and imagePullPolicy of the Kubernetes Image for the Worker to Run diff --git a/airflow/utils/log/es_task_handler.py b/airflow/utils/log/es_task_handler.py index 16372c0600..e396504969 100644 --- a/airflow/utils/log/es_task_handler.py +++ b/airflow/utils/log/es_task_handler.py @@ -17,8 +17,12 @@ # specific language governing permissions and limitations # under the License. -# Using `from elasticsearch import *` would break elasticsearch mocking used in unit test. +# Using `from elasticsearch import *` breaks es mocking in unit test. import elasticsearch +import json +import logging +import sys + import pendulum from elasticsearch_dsl import Search @@ -28,6 +32,43 @@ from airflow.utils.log.logging_mixin import LoggingMixin +class ParentStdout(): +""" +Keep track of the ParentStdout stdout context in child process +""" +def __init__(self): +self.closed = False + +def write(self, string): +sys.__stdout__.write(string) + +def close(self): +self.closed = True + + +class JsonFormatter(logging.Formatter): +""" +Custom formatter to allow for fields to be captured in JSON format. +Fields are added via the RECORD_LABELS list. +TODO: Move RECORD_LABELS into configs/log_config.py +""" +def __init__(self, record_labels, processedTask=None): +super(JsonFormatter, self).__init__() +self.processedTask = processedTask +self.record_labels = record_labels + +def _mergeDictionaries(self, dict_1, dict_2): +merged = dict_1.copy() +merged.update(dict_2) +return merged + +def format(self, record): +recordObj = {label: getattr(record, label) + for label in self.record_labels} +log_context = self._mergeDictionaries(recordObj, self.processedTask) +return json.dumps(log_context) + + class ElasticsearchTaskHandler(FileTaskHandler, LoggingMixin): PAGE = 0 MAX_LINE_PER_PAGE = 1000 @@ -50,7 +91,8 @@ class ElasticsearchTaskHandler(FileTaskHandler, LoggingMixin): def __init__(self, base_log_folder, filename_template, log_id_template, end_of_log_mark, - host='localhost:9200'): + write_stdout=None, json_format=None, + record_labels=None, host='localhost:9200'): """ :param base_log_folder: base folder to store logs locally :param log_id_template: log id template @@ -58,7 +100,15 @@ def __init__(self, base_log_folder, filename_template, """
[GitHub] codecov-io edited a comment on issue #4295: AIRFLOW-3452 removed an unused/dangerous display-none
codecov-io edited a comment on issue #4295: AIRFLOW-3452 removed an unused/dangerous display-none URL: https://github.com/apache/incubator-airflow/pull/4295#issuecomment-445373465 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=h1) Report > Merging [#4295](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/788bd6fcb35ce6354ae874a508772e58a850683e?src=pr=desc) will **increase** coverage by `0.02%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4295/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=tree) ```diff @@Coverage Diff@@ ## master #4295 +/- ## = + Coverage 78.08% 78.1% +0.02% = Files 201 201 Lines 16462 16464 +2 = + Hits12854 12859 +5 + Misses 36083605 -3 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/4295/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `64.59% <0%> (ø)` | :arrow_up: | | [airflow/security/kerberos.py](https://codecov.io/gh/apache/incubator-airflow/pull/4295/diff?src=pr=tree#diff-YWlyZmxvdy9zZWN1cml0eS9rZXJiZXJvcy5weQ==) | `76.19% <0%> (+4.76%)` | :arrow_up: | | [airflow/api/common/experimental/delete\_dag.py](https://codecov.io/gh/apache/incubator-airflow/pull/4295/diff?src=pr=tree#diff-YWlyZmxvdy9hcGkvY29tbW9uL2V4cGVyaW1lbnRhbC9kZWxldGVfZGFnLnB5) | `88% <0%> (+5.39%)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=footer). Last update [788bd6f...63481e0](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] codecov-io edited a comment on issue #4295: AIRFLOW-3452 removed an unused/dangerous display-none
codecov-io edited a comment on issue #4295: AIRFLOW-3452 removed an unused/dangerous display-none URL: https://github.com/apache/incubator-airflow/pull/4295#issuecomment-445373465 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=h1) Report > Merging [#4295](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/788bd6fcb35ce6354ae874a508772e58a850683e?src=pr=desc) will **increase** coverage by `0.02%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4295/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=tree) ```diff @@Coverage Diff@@ ## master #4295 +/- ## = + Coverage 78.08% 78.1% +0.02% = Files 201 201 Lines 16462 16464 +2 = + Hits12854 12859 +5 + Misses 36083605 -3 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/4295/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `64.59% <0%> (ø)` | :arrow_up: | | [airflow/security/kerberos.py](https://codecov.io/gh/apache/incubator-airflow/pull/4295/diff?src=pr=tree#diff-YWlyZmxvdy9zZWN1cml0eS9rZXJiZXJvcy5weQ==) | `76.19% <0%> (+4.76%)` | :arrow_up: | | [airflow/api/common/experimental/delete\_dag.py](https://codecov.io/gh/apache/incubator-airflow/pull/4295/diff?src=pr=tree#diff-YWlyZmxvdy9hcGkvY29tbW9uL2V4cGVyaW1lbnRhbC9kZWxldGVfZGFnLnB5) | `88% <0%> (+5.39%)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=footer). Last update [788bd6f...63481e0](https://codecov.io/gh/apache/incubator-airflow/pull/4295?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] rhwang10 closed pull request #4303: [AIRFLOW-3370] Elasticsearch log task handler additional features
rhwang10 closed pull request #4303: [AIRFLOW-3370] Elasticsearch log task handler additional features URL: https://github.com/apache/incubator-airflow/pull/4303 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/config_templates/airflow_local_settings.py b/airflow/config_templates/airflow_local_settings.py index 45a2f2923c..2bf9f9d0e8 100644 --- a/airflow/config_templates/airflow_local_settings.py +++ b/airflow/config_templates/airflow_local_settings.py @@ -59,6 +59,13 @@ END_OF_LOG_MARK = conf.get('elasticsearch', 'ELASTICSEARCH_END_OF_LOG_MARK') +ELASTICSEARCH_WRITE_STDOUT = conf.get('elasticsearch', 'ELASTICSEARCH_WRITE_STDOUT') + +ELASTICSEARCH_JSON_FORMAT = conf.get('elasticsearch', 'ELASTICSEARCH_JSON_FORMAT') + +ELASTICSEARCH_RECORD_LABELS = [label.strip() for label in conf.get('elasticsearch', + 'ELASTICSEARCH_RECORD_LABELS').split(",")] + DEFAULT_LOGGING_CONFIG = { 'version': 1, 'disable_existing_loggers': False, @@ -191,6 +198,9 @@ 'filename_template': FILENAME_TEMPLATE, 'end_of_log_mark': END_OF_LOG_MARK, 'host': ELASTICSEARCH_HOST, +'write_stdout': ELASTICSEARCH_WRITE_STDOUT, +'json_format': ELASTICSEARCH_JSON_FORMAT, +'record_labels': ELASTICSEARCH_RECORD_LABELS, }, }, } diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg index a9473178c1..146192051d 100644 --- a/airflow/config_templates/default_airflow.cfg +++ b/airflow/config_templates/default_airflow.cfg @@ -588,6 +588,9 @@ hide_sensitive_variable_fields = True elasticsearch_host = elasticsearch_log_id_template = {{dag_id}}-{{task_id}}-{{execution_date}}-{{try_number}} elasticsearch_end_of_log_mark = end_of_log +elasticsearch_write_stdout= +elasticsearch_json_format= +elasticsearch_record_labels= [kubernetes] # The repository, tag and imagePullPolicy of the Kubernetes Image for the Worker to Run diff --git a/airflow/utils/log/es_task_handler.py b/airflow/utils/log/es_task_handler.py index 16372c0600..e396504969 100644 --- a/airflow/utils/log/es_task_handler.py +++ b/airflow/utils/log/es_task_handler.py @@ -17,8 +17,12 @@ # specific language governing permissions and limitations # under the License. -# Using `from elasticsearch import *` would break elasticsearch mocking used in unit test. +# Using `from elasticsearch import *` breaks es mocking in unit test. import elasticsearch +import json +import logging +import sys + import pendulum from elasticsearch_dsl import Search @@ -28,6 +32,43 @@ from airflow.utils.log.logging_mixin import LoggingMixin +class ParentStdout(): +""" +Keep track of the ParentStdout stdout context in child process +""" +def __init__(self): +self.closed = False + +def write(self, string): +sys.__stdout__.write(string) + +def close(self): +self.closed = True + + +class JsonFormatter(logging.Formatter): +""" +Custom formatter to allow for fields to be captured in JSON format. +Fields are added via the RECORD_LABELS list. +TODO: Move RECORD_LABELS into configs/log_config.py +""" +def __init__(self, record_labels, processedTask=None): +super(JsonFormatter, self).__init__() +self.processedTask = processedTask +self.record_labels = record_labels + +def _mergeDictionaries(self, dict_1, dict_2): +merged = dict_1.copy() +merged.update(dict_2) +return merged + +def format(self, record): +recordObj = {label: getattr(record, label) + for label in self.record_labels} +log_context = self._mergeDictionaries(recordObj, self.processedTask) +return json.dumps(log_context) + + class ElasticsearchTaskHandler(FileTaskHandler, LoggingMixin): PAGE = 0 MAX_LINE_PER_PAGE = 1000 @@ -50,7 +91,8 @@ class ElasticsearchTaskHandler(FileTaskHandler, LoggingMixin): def __init__(self, base_log_folder, filename_template, log_id_template, end_of_log_mark, - host='localhost:9200'): + write_stdout=None, json_format=None, + record_labels=None, host='localhost:9200'): """ :param base_log_folder: base folder to store logs locally :param log_id_template: log id template @@ -58,7 +100,15 @@ def __init__(self, base_log_folder, filename_template, """ super(ElasticsearchTaskHandler, self).__init__( base_log_folder, filename_template) + self.closed = False +self.write_stdout = write_stdout +self.json_format = json_format +self.record_labels = record_labels + +
[GitHub] rhwang10 opened a new pull request #4303: [AIRFLOW-3370] Elasticsearch log task handler additional features
rhwang10 opened a new pull request #4303: [AIRFLOW-3370] Elasticsearch log task handler additional features URL: https://github.com/apache/incubator-airflow/pull/4303 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow-3370](https://issues.apache.org/jira/browse/AIRFLOW-3370) ### Description - [x] Add additional options and documentation for using the ElasticsearchTaskHandler. The easiest and most foolproof way to implement a logging solution is to write to the standard output streams (stdout and stderr). In the Kubernetes ecosystem, given that pods are killed and restarted constantly, this implies persistent storage is a requirement for log history preservation. For the webserver and scheduler components of Airflow, logging to standard output streams is built in. However, when tasks are executed in Celery, workers will fork off child processes to execute tasks concurrently. Before the child process ends, it makes a call to the Airflow task logger, and a task log file is written to the file system. This potentially causes several problems with the Airflow on top of Kubernetes architecture. Given that Airflow has a constant stream of log output, running an Airflow environment using Celery in a Kubernetes cluster requires large amounts of memory resources. As such, when memory resources are exceeded, either 1) worker pods are often evicted by Kubernetes, or 2) worker output stalls and tasks pile up without completing. A current best-case workaround is to run a sidecar container that tails the `stdout` and `stderr` streams to read logs from the filesystem of the worker node and then output those logs to its own standard output. However, this becomes a scaling issue when running multiple deployments, and is unsustainable for massive Airflow deployments. The options that are added in this PR look to circumvent the need for persistent volumes in worker nodes when running Airflow on Kubernetes. Workers will no longer need to be Stateful Sets and can instead be Deployments. The `elasticsearch_write_stdout` flag in `airflow.cfg` will allow the child process to write its log output to the parent process standard output stream. The `elasticsearch_json_format` flag in `airflow.cfg` allows additional optional JSON configuration for task instances based on the `logging` module LogRecord attributes. This must be used in conjunction with the `elasticsearch_record_labels` configuration. A potential use case for these options is precisely when setting up a log monitoring stack, such as EFK (Elasticsearch FluentD Kibana), without requiring persistent volumes. A FluentD daemon listening on every node is awaiting log output on the standard output stream. With the `write_stdout` flag set, it can capture the task log information that has been executed on child processes from the parent process standard output. Using the `json_format` field, FluentD can be configured to filter and specify log records, without needing to parse the standard Airflow log formatted record, before sending it off to a destination, such as Elasticsearch, where it can stored and handled independent of Airflow on Kubernetes. If either of these options are NULL, or not set, the `es_task_handler.py` will function exactly as before, and will only have read functionality. The options in this PR simply provide more functionality and options to users. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: handler close - test_close_stdout_logs - test_close_closed_stdout_logs - test_close_no_mark_end_stdout_logs - test_close_with_no_handler_stdout_logs - test_close_with_no_stream_stdout_logs handler read - test_read_stdout_logs - test_read_nonexistent_log_stdout_logs - test_read_raises_stdout_logs_json_true - test_read_timeout_stdout_logs - test_read_with_empty_metadata_stdout_logs - test_read_with_none_metadata_stdout_logs handler render_log_id and set_context - test_render_log_id_json_true - test_set_context_stdout_logs ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that
[jira] [Created] (AIRFLOW-3493) DAG doc_md only shows up on Graph View
James Meickle created AIRFLOW-3493: -- Summary: DAG doc_md only shows up on Graph View Key: AIRFLOW-3493 URL: https://issues.apache.org/jira/browse/AIRFLOW-3493 Project: Apache Airflow Issue Type: Improvement Components: ui Affects Versions: 1.10.1 Reporter: James Meickle Currently, setting doc_md on a DAG causes rendered documentation to appear in Graph View. This is a great feature, but it could be applied more widely. Out of the current DAG-level tabs, I would also expect DAG documentation to show up on Tree View (it's similar to Graph) and on Details (it has info on the overall DAG). I would not expect it to show up on the other DAG-level pages, since they are interactive querying displays. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1585) Documentation (e.g. doc_md) for tasks should be displayed more prominently
[ https://issues.apache.org/jira/browse/AIRFLOW-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715493#comment-16715493 ] James Meickle commented on AIRFLOW-1585: +1 on this, I'd love a short snippet to appear on hover or when you click on a task instance. > Documentation (e.g. doc_md) for tasks should be displayed more prominently > --- > > Key: AIRFLOW-1585 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1585 > Project: Apache Airflow > Issue Type: Improvement > Components: ui >Affects Versions: 1.8.0 >Reporter: Max Grender-Jones >Priority: Major > > We have a web of complex tasks. I love that when I set `dag_md` on an entire > DAG it is visible very prominently on the Graph View. However, the same > cannot be said of `dag_md` set on individual tasks (until now I've never even > had a use for the `Task Details` tab, even there it's not very prominently > presented). > My wishlist of where it would be displayed: > - In the mouseover (although if there's a long description, perhaps that > would be too much for here) > - In the popup that you see when you click on a task (My favourite) > - At the top of *all* the tabs for that task, displayed in a prominent > break-out fashion (as with the DAG), as opposed to buried in the 'attributes' > section -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] stale[bot] commented on issue #3782: [AIRFLOW-2936] Use official Python images as base image for Docker
stale[bot] commented on issue #3782: [AIRFLOW-2936] Use official Python images as base image for Docker URL: https://github.com/apache/incubator-airflow/pull/3782#issuecomment-445945137 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3526: [AIRFLOW-2651] Add file system hooks with a common interface
stale[bot] commented on issue #3526: [AIRFLOW-2651] Add file system hooks with a common interface URL: https://github.com/apache/incubator-airflow/pull/3526#issuecomment-445945168 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3447: [AIRFLOW-2549] Fix DataProcOperation error-check
stale[bot] commented on issue #3447: [AIRFLOW-2549] Fix DataProcOperation error-check URL: https://github.com/apache/incubator-airflow/pull/3447#issuecomment-445945190 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3186: [AIRFLOW-2280]Add feature in CheckIntervalOperator
stale[bot] commented on issue #3186: [AIRFLOW-2280]Add feature in CheckIntervalOperator URL: https://github.com/apache/incubator-airflow/pull/3186#issuecomment-445945212 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3713: [AIRFLOW-2428] Add AutoScalingRole key to emr_hook
stale[bot] commented on issue #3713: [AIRFLOW-2428] Add AutoScalingRole key to emr_hook URL: https://github.com/apache/incubator-airflow/pull/3713#issuecomment-445945156 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3139: [AIRFLOW-2224] Add support for CSV files in mysql_to_gcs operator
stale[bot] commented on issue #3139: [AIRFLOW-2224] Add support for CSV files in mysql_to_gcs operator URL: https://github.com/apache/incubator-airflow/pull/3139#issuecomment-445945232 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2932: [AIRFLOW-1974] Improve Databricks Hook/Operator
stale[bot] commented on issue #2932: [AIRFLOW-1974] Improve Databricks Hook/Operator URL: https://github.com/apache/incubator-airflow/pull/2932#issuecomment-445945256 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #2881: Correct http_hook to use conn_type instead of schema
stale[bot] commented on issue #2881: Correct http_hook to use conn_type instead of schema URL: https://github.com/apache/incubator-airflow/pull/2881#issuecomment-445945261 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3134: [AIRFLOW-878] Use absolute gunicorn executable location
stale[bot] commented on issue #3134: [AIRFLOW-878] Use absolute gunicorn executable location URL: https://github.com/apache/incubator-airflow/pull/3134#issuecomment-445945239 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3786: [AIRFLOW-XXX] Note on min_file_process_interval
stale[bot] commented on issue #3786: [AIRFLOW-XXX] Note on min_file_process_interval URL: https://github.com/apache/incubator-airflow/pull/3786#issuecomment-445945129 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3816: [HOLD][AIRFLOW-2973] Use Python 3.6.x everywhere possible
stale[bot] commented on issue #3816: [HOLD][AIRFLOW-2973] Use Python 3.6.x everywhere possible URL: https://github.com/apache/incubator-airflow/pull/3816#issuecomment-445945110 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3115: [AIRFLOW-2193] Add ROperator for using R
stale[bot] commented on issue #3115: [AIRFLOW-2193] Add ROperator for using R URL: https://github.com/apache/incubator-airflow/pull/3115#issuecomment-445945249 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3702: [AIRFLOW-81] Add ScheduleBlackoutSensor
stale[bot] commented on issue #3702: [AIRFLOW-81] Add ScheduleBlackoutSensor URL: https://github.com/apache/incubator-airflow/pull/3702#issuecomment-445945162 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3912: [AIRFLOW-3077] Default Not to Raise Error When PyMongo Contruct JSON Data
stale[bot] commented on issue #3912: [AIRFLOW-3077] Default Not to Raise Error When PyMongo Contruct JSON Data URL: https://github.com/apache/incubator-airflow/pull/3912#issuecomment-445945088 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3858: [AIRFLOW-2929] Add get and set for pool class in models
stale[bot] commented on issue #3858: [AIRFLOW-2929] Add get and set for pool class in models URL: https://github.com/apache/incubator-airflow/pull/3858#issuecomment-445945100 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3941: [AIRFLOW-3106] Validate Postgres connection after saving it
stale[bot] commented on issue #3941: [AIRFLOW-3106] Validate Postgres connection after saving it URL: https://github.com/apache/incubator-airflow/pull/3941#issuecomment-445945072 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3249: [AIRFLOW-2354] Change task instance run validation to not exclude das…
stale[bot] commented on issue #3249: [AIRFLOW-2354] Change task instance run validation to not exclude das… URL: https://github.com/apache/incubator-airflow/pull/3249#issuecomment-445945197 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3722: [AIRFLOW-2759] Add changes to extract proxy details at the base hook …
stale[bot] commented on issue #3722: [AIRFLOW-2759] Add changes to extract proxy details at the base hook … URL: https://github.com/apache/incubator-airflow/pull/3722#issuecomment-445945151 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3503: Misc fixes
stale[bot] commented on issue #3503: Misc fixes URL: https://github.com/apache/incubator-airflow/pull/3503#issuecomment-445945173 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3182: Testing oracle-java9-installer for travisci builds
stale[bot] commented on issue #3182: Testing oracle-java9-installer for travisci builds URL: https://github.com/apache/incubator-airflow/pull/3182#issuecomment-445945233 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3805: [AIRFLOW-2062] Add per-connection KMS encryption.
stale[bot] commented on issue #3805: [AIRFLOW-2062] Add per-connection KMS encryption. URL: https://github.com/apache/incubator-airflow/pull/3805#issuecomment-445945118 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3229: [AIRFLOW-2325] Add cloudwatch task handler (IN PROGRESS)
stale[bot] commented on issue #3229: [AIRFLOW-2325] Add cloudwatch task handler (IN PROGRESS) URL: https://github.com/apache/incubator-airflow/pull/3229#issuecomment-445945219 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3774: [AIRFLOW-2920] Added downward API metadata to Kubernetes pods
stale[bot] commented on issue #3774: [AIRFLOW-2920] Added downward API metadata to Kubernetes pods URL: https://github.com/apache/incubator-airflow/pull/3774#issuecomment-445945143 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3250: [WIP] Add dms raw
stale[bot] commented on issue #3250: [WIP] Add dms raw URL: https://github.com/apache/incubator-airflow/pull/3250#issuecomment-445945204 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3867: [AIRFLOW-3011][CLI] Add cmd function printing config
stale[bot] commented on issue #3867: [AIRFLOW-3011][CLI] Add cmd function printing config URL: https://github.com/apache/incubator-airflow/pull/3867#issuecomment-445945095 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3891: [AIRFLOW-3029] New Operator - SqlOperator
stale[bot] commented on issue #3891: [AIRFLOW-3029] New Operator - SqlOperator URL: https://github.com/apache/incubator-airflow/pull/3891#issuecomment-445945083 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3796: [AIRFLOW-2824] - Add config to disable default conn creation
stale[bot] commented on issue #3796: [AIRFLOW-2824] - Add config to disable default conn creation URL: https://github.com/apache/incubator-airflow/pull/3796#issuecomment-445945124 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #3851: AIRFLOW-3014 Increase possible length of passwords in connection table
stale[bot] commented on issue #3851: AIRFLOW-3014 Increase possible length of passwords in connection table URL: https://github.com/apache/incubator-airflow/pull/3851#issuecomment-445945115 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stale[bot] commented on issue #4055: [AIRFLOW-3206] neutral and clear GPL dependency notice
stale[bot] commented on issue #4055: [AIRFLOW-3206] neutral and clear GPL dependency notice URL: https://github.com/apache/incubator-airflow/pull/4055#issuecomment-445945065 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] MarcusSorealheis edited a comment on issue #4295: AIRFLOW-3452 removed an unused/dangerous display-none
MarcusSorealheis edited a comment on issue #4295: AIRFLOW-3452 removed an unused/dangerous display-none URL: https://github.com/apache/incubator-airflow/pull/4295#issuecomment-445932870 @ashb I have not identified any reason for https://github.com/apache/incubator-airflow/blob/master/airflow/www_rbac/templates/airflow/dags.html#L306 other than they style but there may be another reason that I have not pinpointed yet. My vote would be to remove it when we know why it is there as the reason could extend beyond overwriting the `display: none`. The entire JavaScript section of the file there will be rewritten soon. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] tal181 commented on issue #4267: [AIRFLOW-3411] create openfaas hook
tal181 commented on issue #4267: [AIRFLOW-3411] create openfaas hook URL: https://github.com/apache/incubator-airflow/pull/4267#issuecomment-445943097 ping @kaxil This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] apraovjr commented on issue #4290: [AIRFLOW-3282] Implement Azure Kubernetes Service Operator
apraovjr commented on issue #4290: [AIRFLOW-3282] Implement Azure Kubernetes Service Operator URL: https://github.com/apache/incubator-airflow/pull/4290#issuecomment-445935100 Reviewers? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] MarcusSorealheis commented on issue #4295: AIRFLOW-3452 removed an unused/dangerous display-none
MarcusSorealheis commented on issue #4295: AIRFLOW-3452 removed an unused/dangerous display-none URL: https://github.com/apache/incubator-airflow/pull/4295#issuecomment-445932870 @ashb I have not identified any reason for https://github.com/apache/incubator-airflow/blob/master/airflow/www_rbac/templates/airflow/dags.html#L306 other than they style but there may be another reason that I have not pinpointed yet. My vote would be to remove it. The entire JavaScript section of the file there will be rewritten soon. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] rhwang10 opened a new pull request #4303: [AIRFLOW-3370] Elasticsearch log task handler additional features
rhwang10 opened a new pull request #4303: [AIRFLOW-3370] Elasticsearch log task handler additional features URL: https://github.com/apache/incubator-airflow/pull/4303 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow-3370](https://issues.apache.org/jira/browse/AIRFLOW-3370) ### Description Add additional options and documentation for using the ElasticsearchTaskHandler. The easiest and most foolproof way to implement a logging solution is to write to the standard output streams (stdout and stderr). In the Kubernetes ecosystem, given that pods are killed and restarted constantly, this implies persistent storage is a requirement for log history preservation. For the webserver and scheduler components of Airflow, logging to standard output streams is built in. However, when tasks are executed in Celery, workers will fork off child processes to execute tasks concurrently. Before the child process ends, it makes a call to the Airflow task logger, and a task log file is written to the file system. This potentially causes several problems with the Airflow on top of Kubernetes architecture. Given that Airflow has a constant stream of log output, running an Airflow environment using Celery in a Kubernetes cluster requires large amounts of memory resources. As such, when memory resources are exceeded, either 1) worker pods are often evicted by Kubernetes, or 2) worker output stalls and tasks pile up without completing. A current best-case workaround is to run a sidecar container that tails the `stdout` and `stderr` streams to read logs from the filesystem of the worker node and then output those logs to its own standard output. However, this becomes a scaling issue when running multiple deployments, and is unsustainable for massive Airflow deployments. The options that are added in this PR look to circumvent the need for persistent volumes in worker nodes when running Airflow on Kubernetes. Workers will no longer need to be Stateful Sets and can instead be Deployments. The `elasticsearch_write_stdout` flag in `airflow.cfg` will allow the child process to write its log output to the parent process standard output stream. The `elasticsearch_json_format` flag in `airflow.cfg` allows additional optional JSON configuration for task instances based on the `logging` module LogRecord attributes. This must be used in conjunction with the `elasticsearch_record_labels` configuration. A potential use case for these options is precisely when setting up a log monitoring stack, such as EFK (Elasticsearch FluentD Kibana), without requiring persistent volumes. A FluentD daemon listening on every node is awaiting log output on the standard output stream. With the `write_stdout` flag set, it can capture the task log information that has been executed on child processes from the parent process standard output. Using the `json_format` field, FluentD can be configured to filter and specify log records, without needing to parse the standard Airflow log formatted record, before sending it off to a destination, such as Elasticsearch, where it can stored and handled independent of Airflow on Kubernetes. If either of these options are NULL, or not set, the `es_task_handler.py` will function exactly as before, and will only have read functionality. The options in this PR simply provide more functionality and options to users. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: handler close - test_close_stdout_logs - test_close_closed_stdout_logs - test_close_no_mark_end_stdout_logs - test_close_with_no_handler_stdout_logs - test_close_with_no_stream_stdout_logs handler read - test_read_stdout_logs - test_read_nonexistent_log_stdout_logs - test_read_raises_stdout_logs_json_true - test_read_timeout_stdout_logs - test_read_with_empty_metadata_stdout_logs - test_read_with_none_metadata_stdout_logs handler render_log_id and set_context - test_render_log_id_json_true - test_set_context_stdout_logs ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes
[jira] [Commented] (AIRFLOW-3370) Enhance current ES handler with stdout capability and more output options
[ https://issues.apache.org/jira/browse/AIRFLOW-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715356#comment-16715356 ] ASF GitHub Bot commented on AIRFLOW-3370: - rhwang10 opened a new pull request #4303: [AIRFLOW-3370] Elasticsearch log task handler additional features URL: https://github.com/apache/incubator-airflow/pull/4303 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow-3370](https://issues.apache.org/jira/browse/AIRFLOW-3370) ### Description Add additional options and documentation for using the ElasticsearchTaskHandler. The easiest and most foolproof way to implement a logging solution is to write to the standard output streams (stdout and stderr). In the Kubernetes ecosystem, given that pods are killed and restarted constantly, this implies persistent storage is a requirement for log history preservation. For the webserver and scheduler components of Airflow, logging to standard output streams is built in. However, when tasks are executed in Celery, workers will fork off child processes to execute tasks concurrently. Before the child process ends, it makes a call to the Airflow task logger, and a task log file is written to the file system. This potentially causes several problems with the Airflow on top of Kubernetes architecture. Given that Airflow has a constant stream of log output, running an Airflow environment using Celery in a Kubernetes cluster requires large amounts of memory resources. As such, when memory resources are exceeded, either 1) worker pods are often evicted by Kubernetes, or 2) worker output stalls and tasks pile up without completing. A current best-case workaround is to run a sidecar container that tails the `stdout` and `stderr` streams to read logs from the filesystem of the worker node and then output those logs to its own standard output. However, this becomes a scaling issue when running multiple deployments, and is unsustainable for massive Airflow deployments. The options that are added in this PR look to circumvent the need for persistent volumes in worker nodes when running Airflow on Kubernetes. Workers will no longer need to be Stateful Sets and can instead be Deployments. The `elasticsearch_write_stdout` flag in `airflow.cfg` will allow the child process to write its log output to the parent process standard output stream. The `elasticsearch_json_format` flag in `airflow.cfg` allows additional optional JSON configuration for task instances based on the `logging` module LogRecord attributes. This must be used in conjunction with the `elasticsearch_record_labels` configuration. A potential use case for these options is precisely when setting up a log monitoring stack, such as EFK (Elasticsearch FluentD Kibana), without requiring persistent volumes. A FluentD daemon listening on every node is awaiting log output on the standard output stream. With the `write_stdout` flag set, it can capture the task log information that has been executed on child processes from the parent process standard output. Using the `json_format` field, FluentD can be configured to filter and specify log records, without needing to parse the standard Airflow log formatted record, before sending it off to a destination, such as Elasticsearch, where it can stored and handled independent of Airflow on Kubernetes. If either of these options are NULL, or not set, the `es_task_handler.py` will function exactly as before, and will only have read functionality. The options in this PR simply provide more functionality and options to users. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: handler close - test_close_stdout_logs - test_close_closed_stdout_logs - test_close_no_mark_end_stdout_logs - test_close_with_no_handler_stdout_logs - test_close_with_no_stream_stdout_logs handler read - test_read_stdout_logs - test_read_nonexistent_log_stdout_logs - test_read_raises_stdout_logs_json_true - test_read_timeout_stdout_logs - test_read_with_empty_metadata_stdout_logs - test_read_with_none_metadata_stdout_logs handler render_log_id and set_context - test_render_log_id_json_true - test_set_context_stdout_logs ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1.
[GitHub] Fokko commented on issue #4298: [AIRFLOW-3478] Make sure that the session is closed
Fokko commented on issue #4298: [AIRFLOW-3478] Make sure that the session is closed URL: https://github.com/apache/incubator-airflow/pull/4298#issuecomment-445925323 @kaxil Thanks for the Flake8 fix :-) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on a change in pull request #4298: [AIRFLOW-3478] Make sure that the session is closed
Fokko commented on a change in pull request #4298: [AIRFLOW-3478] Make sure that the session is closed URL: https://github.com/apache/incubator-airflow/pull/4298#discussion_r240330093 ## File path: airflow/bin/cli.py ## @@ -423,14 +418,11 @@ def unpause(args, dag=None): def set_is_paused(is_paused, args, dag=None): dag = dag or get_dag(args) -session = settings.Session() -dm = session.query(DagModel).filter( -DagModel.dag_id == dag.dag_id).first() -dm.is_paused = is_paused -session.commit() Review comment: So we should not remove any `.commit()` without consideration, but after the `create_session`, or `@provide_session` the session is committed anyway. Therefore the explicit `.commit()` in the code will only act as an in-between commit. We should close the session if we don't use it again directly. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on a change in pull request #4298: [AIRFLOW-3478] Make sure that the session is closed
Fokko commented on a change in pull request #4298: [AIRFLOW-3478] Make sure that the session is closed URL: https://github.com/apache/incubator-airflow/pull/4298#discussion_r240329624 ## File path: airflow/api/common/experimental/mark_tasks.py ## @@ -180,18 +180,15 @@ def set_state(task, execution_date, upstream=False, downstream=False, tis_altered += qry_sub_dag.with_for_update().all() for ti in tis_altered: ti.state = state -session.commit() else: tis_altered = qry_dag.all() if len(sub_dag_ids) > 0: tis_altered += qry_sub_dag.all() -session.expunge_all() Review comment: I had to dig deeper into this. By default, [the objects are cleaned up](https://docs.sqlalchemy.org/en/latest/orm/session_api.html#sqlalchemy.orm.session.Session.commit): By default, the `Session` also expires all database loaded state on all ORM-managed attributes after transaction commit. This so that subsequent operations load the most recent data from the database. This behavior can be disabled using the `expire_on_commit=False` option to sessionmaker or the `Session`constructor. But I've noticed that we explicitly set the `expire_on_commit=False`. https://github.com/apache/incubator-airflow/blob/ded25e16c1fb912019d3d0e5d47d020dccaa54b7/airflow/settings.py#L198 In this case this change would indeed change behaviour. Maybe remove the `expire_on_commit ` to make it simpeler? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on issue #3547: [AIRFLOW-2659] Improve Robustness of Operators in Airflow during Infra Outages
Fokko commented on issue #3547: [AIRFLOW-2659] Improve Robustness of Operators in Airflow during Infra Outages URL: https://github.com/apache/incubator-airflow/pull/3547#issuecomment-445899544 Wouldn't it be preferable to design the operator in such a way, that it will pick it up again automagically. Now we add additional configuration etc, which increases the complexity of Airflow and the specific operators. Would it be possible to integrate this in the operator itself? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] dlamblin commented on a change in pull request #4163: [AIRFLOW-3319] - KubernetsExecutor: Need in try_number in labels if getting them later
dlamblin commented on a change in pull request #4163: [AIRFLOW-3319] - KubernetsExecutor: Need in try_number in labels if getting them later URL: https://github.com/apache/incubator-airflow/pull/4163#discussion_r240277825 ## File path: airflow/contrib/executors/kubernetes_executor.py ## @@ -458,10 +459,16 @@ def _datetime_to_label_safe_datestring(datetime_obj): def _labels_to_key(self, labels): try: +try_num = 1 +try: +try_num = int(labels.get('try_number', '1')) +except ValueError: +self.log.warn("could not get try_number as an int: %s", labels.get('try_number', '1')) return ( labels['dag_id'], labels['task_id'], self._label_safe_datestring_to_datetime(labels['execution_date']), -labels['try_number']) +try_num Review comment: So, if you upgrade a live system any pods without the label will return `try_num` 1. That's the crux of the fix (along with adding the try num label to new pods made), right? Would be a spot for a pep8 recommended (plays nice with vcs) redundant comma ```suggestion try_num, ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] dlamblin commented on a change in pull request #4163: [AIRFLOW-3319] - KubernetsExecutor: Need in try_number in labels if getting them later
dlamblin commented on a change in pull request #4163: [AIRFLOW-3319] - KubernetsExecutor: Need in try_number in labels if getting them later URL: https://github.com/apache/incubator-airflow/pull/4163#discussion_r240275773 ## File path: airflow/contrib/executors/kubernetes_executor.py ## @@ -346,6 +346,7 @@ def run_next(self, next_job): namespace=self.namespace, worker_uuid=self.worker_uuid, Review comment: This is the only call to make_pod? If so, then good. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] dlamblin commented on a change in pull request #4163: [AIRFLOW-3319] - KubernetsExecutor: Need in try_number in labels if getting them later
dlamblin commented on a change in pull request #4163: [AIRFLOW-3319] - KubernetsExecutor: Need in try_number in labels if getting them later URL: https://github.com/apache/incubator-airflow/pull/4163#discussion_r240276619 ## File path: airflow/contrib/executors/kubernetes_executor.py ## @@ -458,10 +459,16 @@ def _datetime_to_label_safe_datestring(datetime_obj): def _labels_to_key(self, labels): try: +try_num = 1 Review comment: Can this block be moved before the try on 461? The try in the try doesn't read well. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services