[GitHub] [airflow] uranusjr commented on issue #27328: SFTPOperator throws object of type 'PlainXComArg' has no len() when using with Taskflow API
uranusjr commented on issue #27328: URL: https://github.com/apache/airflow/issues/27328#issuecomment-1294500097 Value checks in operators should be done in `execute`, not `__init__`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
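To illustrate the advice above, here is a hypothetical sketch (not the actual SFTPOperator code): when an operator argument may arrive as a lazy `XComArg` from the TaskFlow API, calling `len()` on it in `__init__` fails, because the concrete value only exists at `execute()` time.

```python
# Hypothetical operator sketch; the class and argument names are illustrative.
class UploadOperator:
    def __init__(self, local_filepaths):
        # No len()/emptiness check here: local_filepaths may be a PlainXComArg.
        self.local_filepaths = local_filepaths

    def execute(self, context=None):
        # By execute() time, Airflow has resolved XComArgs into real values,
        # so value checks are safe here.
        if len(self.local_filepaths) == 0:
            raise ValueError("local_filepaths must not be empty")
        return list(self.local_filepaths)
```

The same object can then be constructed with either a plain list or an XCom reference, and validation only runs once a concrete value is available.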
[GitHub] [airflow] o-nikolas commented on issue #27338: scripts/tools/initialize_virtualenv.py calling .exists() on str
o-nikolas commented on issue #27338: URL: https://github.com/apache/airflow/issues/27338#issuecomment-1294435687 Thanks for the bug report @rkarish! I've assigned you the task since you checked that you're willing to submit a PR :smiley: > I believe this should be os.path.exists(airflow_home) instead Looking at the function stub for `clean_up_airflow_home` it looks like that code is expecting a `Path` object, but it's getting a string. So I think the better fix is to update the calling code to pass in a Path object instead of a string as it is now.
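A hedged sketch of the fix suggested above (the function name follows the script, but the body here is a simplified stand-in): `clean_up_airflow_home` expects a `pathlib.Path`, so converting at the call site keeps the function's `Path` API intact rather than switching it to `os.path.exists()`.

```python
from pathlib import Path

def clean_up_airflow_home(airflow_home: Path) -> None:
    # Works on Path; calling .exists() on a plain str raises AttributeError,
    # which is exactly the reported traceback.
    if airflow_home.exists():
        print(f"Cleaning up {airflow_home}")

airflow_home_dir = "/tmp/airflow-home"   # illustrative value
clean_up_airflow_home(Path(airflow_home_dir))
```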
[GitHub] [airflow] o-nikolas commented on issue #27328: SFTPOperator throws object of type 'PlainXComArg' has no len() when using with Taskflow API
o-nikolas commented on issue #27328: URL: https://github.com/apache/airflow/issues/27328#issuecomment-1294411017 Thanks for the bug report @jtommi! I see that you've checked that you're willing to submit a PR, so I have assigned the task to you :smiley:
[GitHub] [airflow] o-nikolas commented on a diff in pull request #27184: SSHOperator ignores cmd_timeout (#27182)
o-nikolas commented on code in PR #27184: URL: https://github.com/apache/airflow/pull/27184#discussion_r1007576007 ## airflow/providers/ssh/hooks/ssh.py: ## @@ -491,9 +491,12 @@ def exec_ssh_client_command( if stdout_buffer_length > 0: agg_stdout += stdout.channel.recv(stdout_buffer_length) +timedout = False + # read from both stdout and stderr while not channel.closed or channel.recv_ready() or channel.recv_stderr_ready(): readq, _, _ = select([channel], [], [], timeout) +timedout = len(readq) == 0 Review Comment: I don't have a deep enough understanding of the select api and ssh to know for sure, but it seems harmless to check the others. If you're positive that this is correct then I'm happy to commit and approve.
[GitHub] [airflow] boring-cyborg[bot] commented on issue #27338: scripts/tools/initialize_virtualenv.py calling .exists() on str
boring-cyborg[bot] commented on issue #27338: URL: https://github.com/apache/airflow/issues/27338#issuecomment-1294388090 Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] rkarish opened a new issue, #27338: scripts/tools/initialize_virtualenv.py calling .exists() on str
rkarish opened a new issue, #27338: URL: https://github.com/apache/airflow/issues/27338 ### Apache Airflow version main (development) ### What happened While setting up a local development environment I went to use the `scripts/tools/initialize_virtualenv.py` script and received an exception. I believe this should be `os.path.exists(airflow_home)` instead. ``` Traceback (most recent call last): File "/Users/rkarish/Projects/airflow/scripts/tools/initialize_virtualenv.py", line 187, in main() File "/Users/rkarish/Projects/airflow/scripts/tools/initialize_virtualenv.py", line 142, in main clean_up_airflow_home(airflow_home_dir) File "/Users/rkarish/Projects/airflow/scripts/tools/initialize_virtualenv.py", line 36, in clean_up_airflow_home if airflow_home.exists(): AttributeError: 'str' object has no attribute 'exists' ``` Also `LOCAL_VIRTUALENV.rst` has an incorrect path to this file. ### What you think should happen instead _No response_ ### How to reproduce _No response_ ### Operating System macOS 13.0 ### Versions of Apache Airflow Providers _No response_ ### Deployment Other ### Deployment details _No response_ ### Anything else _No response_ ### Are you willing to submit PR? - [X] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
[GitHub] [airflow] potiuk commented on pull request #27337: Fix failing masking tests for python < 3.10
potiuk commented on PR #27337: URL: https://github.com/apache/airflow/pull/27337#issuecomment-1294354347 Running for "full tests" and with a change in setup.py to get the latest version of exceptiongroup - apparently the test results slightly differ for different Python versions
[GitHub] [airflow] potiuk opened a new pull request, #27337: Fix failing masking tests for python < 3.10
potiuk opened a new pull request, #27337: URL: https://github.com/apache/airflow/pull/27337 Seems that the number of times user is printed in the stack trace depends on the Python version. The fix in #27335 seems to only have worked for Python 3.10; with 1.0.0 of exceptiongroup the stack trace has fewer stack levels. --- **^ Add meaningful description above** Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information. In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
[GitHub] [airflow] shubham22 commented on a diff in pull request #27262: Strengthen a bit and clarify importance of triaging issues
shubham22 commented on code in PR #27262: URL: https://github.com/apache/airflow/pull/27262#discussion_r1007459660 ## ISSUE_TRIAGE_PROCESS.rst: ## @@ -30,6 +30,52 @@ to fix an issue or make an enhancement, without needing to open an issue first. This is intended to make it as easy as possible to contribute to the project. +Another important part of our Issue reporting process are also Github Discussions. Review Comment: - remove "also" - "GitHub", also may be add a link to it as this is the first time it is mentioned (https://github.com/apache/airflow/discussions) ## COMMITTERS.rst: ## @@ -75,7 +75,8 @@ Code contribution Community contributions -1. Was instrumental in triaging issues +1. Actively participated in `triaging issues `_ showing their understanding Review Comment: Points 3~5 are present tense. Suggestion: Actively participates in... ## ISSUE_TRIAGE_PROCESS.rst: ## @@ -30,6 +30,52 @@ to fix an issue or make an enhancement, without needing to open an issue first. This is intended to make it as easy as possible to contribute to the project. +Another important part of our Issue reporting process are also Github Discussions. +Issues should represent clear feature requests or bugs which can/should be either implemented or fixed. +Users are encouraged to open Discussions rather than Issues if there are no clear, reproducible +steps, or when they have troubleshooting problems. + +Responding to issues/discussions (relatively) quickly +' + +It is vital to provide rather quick feedback to Issues and Discussions opened by our users, so that they +feel listened to rather than ignored. Even if the response is "we are not going to work on it because ...", +or "converting this issue to discussion because ..." or "closing because it is a duplicate of #xxx", it is +far more welcoming than leaving issues and discussions unanswered.
Sometimes issues and discussions are +answered by other users (and this is cool) but if an issue/discussion is not responded to for a few days or +weeks, this gives an impression that the user reporting it is ignored, which creates an impression of a +non-welcoming project. + +We strive to provide relatively quick responses to all such issues and discussions. Users should exercise +patience while waiting for those (knowing that people might be busy, on vacations etc.) however they should +not wait weeks until someone looks at their issues. + +Issue Triage team +'' + +While many of the issues can be responded to by other users and committers, the committer team is not +big enough to handle all such requests and sometimes they are busy with implementing important huge features Review Comment: Suggestion: with implementing important and complex features... ## ISSUE_TRIAGE_PROCESS.rst: ## @@ -30,6 +30,52 @@ to fix an issue or make an enhancement, without needing to open an issue first. This is intended to make it as easy as possible to contribute to the project. +Another important part of our Issue reporting process are also Github Discussions. +Issues should represent clear feature requests or bugs which can/should be either implemented or fixed. +Users are encouraged to open Discussions rather than Issues if there are no clear, reproducible +steps, or when they have troubleshooting problems. + +Responding to issues/discussions (relatively) quickly +' + +It is vital to provide rather quick feedback to Issues and Discussions opened by our users, so that they +feel listened to rather than ignored. Even if the response is "we are not going to work on it because ...", +or "converting this issue to discussion because ..." or "closing because it is a duplicate of #xxx", it is +far more welcoming than leaving issues and discussions unanswered.
Sometimes issues and discussions are +answered by other users (and this is cool) but if an issue/discussion is not responded to for a few days or +weeks, this gives an impression that the user reporting it is ignored, which creates an impression of a +non-welcoming project. Review Comment: Suggestion: it gives an impression that the user was ignored and that the Airflow project is unwelcoming. ## ISSUE_TRIAGE_PROCESS.rst: ## @@ -30,6 +30,52 @@ to fix an issue or make an enhancement, without needing to open an issue first. This is intended to make it as easy as possible to contribute to the project. +Another important part of our Issue reporting process are also Github Discussions. +Issues should represent clear feature requests or bugs which can/should be either implemented or fixed. +Users are encouraged to open Discussions rather than Issues if there are no clear, reproducible +steps, or when they have troubleshooting problems. Review
[GitHub] [airflow] potiuk commented on pull request #27262: Strengthen a bit and clarify importance of triaging issues
potiuk commented on PR #27262: URL: https://github.com/apache/airflow/pull/27262#issuecomment-1294202399 Any more comments?
[airflow] branch main updated (acc6982770 -> 550b49b418)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git from acc6982770 More resilient test for secrets masker (#27335) add 550b49b418 Skip Integration tests on Public runners if not full tests needed (#27322) No new revisions were added by this update. Summary of changes:
 .github/workflows/ci.yml                         |   5 +
 dev/breeze/SELECTIVE_CHECKS.md                   |   1 +
 .../airflow_breeze/commands/testing_commands.py  |  32 ++-
 .../commands/testing_commands_config.py          |   1 +
 .../src/airflow_breeze/utils/selective_checks.py |  24 +--
 dev/breeze/tests/test_selective_checks.py        |   8 +
 images/breeze/output-commands-hash.txt           |   4 +-
 images/breeze/output-commands.svg                |  90 -
 images/breeze/output_testing.svg                 |  22 +--
 images/breeze/output_testing_tests.svg           | 216 +++--
 10 files changed, 223 insertions(+), 180 deletions(-)
[GitHub] [airflow] potiuk merged pull request #27322: Skip Integration tests on Public runners if not full tests needed
potiuk merged PR #27322: URL: https://github.com/apache/airflow/pull/27322
[GitHub] [airflow] potiuk commented on pull request #27322: Skip Integration tests on Public runners if not full tests needed
potiuk commented on PR #27322: URL: https://github.com/apache/airflow/pull/27322#issuecomment-1294202074 Failures unrelated. All good.
[GitHub] [airflow] punx120 commented on a diff in pull request #27184: SSHOperator ignores cmd_timeout (#27182)
punx120 commented on code in PR #27184: URL: https://github.com/apache/airflow/pull/27184#discussion_r1007434797 ## airflow/providers/ssh/hooks/ssh.py: ## @@ -491,9 +491,12 @@ def exec_ssh_client_command( if stdout_buffer_length > 0: agg_stdout += stdout.channel.recv(stdout_buffer_length) +timedout = False + # read from both stdout and stderr while not channel.closed or channel.recv_ready() or channel.recv_stderr_ready(): readq, _, _ = select([channel], [], [], timeout) +timedout = len(readq) == 0 Review Comment: I'm not sure - from `select` doc ``` the return value is a tuple of three lists corresponding to the first three arguments; each contains the subset of the corresponding file descriptors that are ready. ``` and we pass empty list to 2nd and 3rd args. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
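The behaviour discussed in this review can be demonstrated in isolation. Since the hook passes empty lists for the write and exception sets, the only list `select()` can return non-empty is the read list, so an empty read list after the timeout means "timed out". This is a minimal stand-alone sketch using a socket pair rather than an SSH channel:

```python
import select
import socket

# Two connected sockets stand in for the SSH channel.
a, b = socket.socketpair()

# Nothing has been written yet: select() waits out the timeout and returns
# an empty read list, so len(readq) == 0 signals a timeout.
readq, _, _ = select.select([a], [], [], 0.1)
timedout = len(readq) == 0

# After data is written, the socket is immediately readable.
b.sendall(b"x")
readq, _, _ = select.select([a], [], [], 0.1)
timedout_after_write = len(readq) == 0

a.close()
b.close()
```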
[GitHub] [airflow] kazanzhy commented on a diff in pull request #26944: Use DbApiHook.run for DbApiHook.get_records and DbApiHook.get_first
kazanzhy commented on code in PR #26944: URL: https://github.com/apache/airflow/pull/26944#discussion_r1007433847 ## airflow/providers/common/sql/hooks/sql.py: ## @@ -175,41 +207,26 @@ def get_pandas_df_by_chunks(self, sql, parameters=None, *, chunksize, **kwargs): yield from psql.read_sql(sql, con=conn, params=parameters, chunksize=chunksize, **kwargs) def get_records( -self, -sql: str | list[str], -parameters: Iterable | Mapping | None = None, -**kwargs: dict, -): +self, sql: str | list[str], parameters: Iterable | Mapping | None = None +) -> Any | list[Any]: Review Comment: I changed back from `sql: str = ""` to `sql: str | list[str] = ""`. It seems strange but without it, I can't remove `# type: ignore[override]`
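The typing constraint behind this comment is mypy's override check: an override may not narrow a parameter type, so if the base hook declares `sql: str | list[str]`, the subclass must accept at least that union, even if a default like `""` then looks odd. An illustrative sketch (these class names are hypothetical, not the actual hook code):

```python
from typing import Any, Union

class BaseSqlHook:
    def get_records(self, sql: Union[str, list], parameters=None) -> Any:
        raise NotImplementedError

class MySqlishHook(BaseSqlHook):
    # Keeping the union matches the base signature, so no
    # `# type: ignore[override]` is needed; narrowing to `sql: str`
    # would be rejected by mypy's [override] check.
    def get_records(self, sql: Union[str, list] = "", parameters=None) -> Any:
        # Normalize to a list of statements for illustration.
        return sql if isinstance(sql, list) else [sql]
```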
[GitHub] [airflow] potiuk commented on pull request #27322: Skip Integration tests on Public runners if not full tests needed
potiuk commented on PR #27322: URL: https://github.com/apache/airflow/pull/27322#issuecomment-1294152356 And added better messaging (colour, and showing the actually skipped/sequentialized test types).
[airflow] branch main updated (9c73b3f7fc -> acc6982770)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git from 9c73b3f7fc Fix typo (#27327) add acc6982770 More resilient test for secrets masker (#27335) No new revisions were added by this update. Summary of changes: tests/utils/log/test_secrets_masker.py | 27 ++- 1 file changed, 2 insertions(+), 25 deletions(-)
[GitHub] [airflow] potiuk merged pull request #27335: More resilient test for secrets masker
potiuk merged PR #27335: URL: https://github.com/apache/airflow/pull/27335
[GitHub] [airflow] dejii opened a new pull request, #27336: use `id` key to retrieve the dataflow job_id
dejii opened a new pull request, #27336: URL: https://github.com/apache/airflow/pull/27336 When using any of the DataflowJob sensors ([example](https://github.com/apache/airflow/blob/9c73b3f7fc1d18925d0ed09e8719f53b8147b0f2/airflow/providers/google/cloud/example_dags/example_dataflow.py#L176-L181)), the `dataflow_job_id` key is used to extract the dataflow job id from the job returned by the dataflow task. This results in the error shown below ``` jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'dataflow_job_id' ``` The key that contains the job id is `id`. [Dataflow REST API reference](https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs#Job)
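The lookup being fixed can be sketched as follows. Per the linked Dataflow REST reference, the Job resource exposes the job id under the `id` key; `dataflow_job_id` is not a field of the resource, which is why the Jinja template blew up. The dict below is an illustrative, trimmed-down payload with a made-up job id, not real API output:

```python
# Hypothetical, trimmed-down Dataflow Job payload.
job = {
    "id": "2022-10-28_00_00_00-1234567890",  # made-up job id
    "name": "example-dataflow-job",
    "currentState": "JOB_STATE_DONE",
}

job_id = job["id"]                      # correct key
missing = job.get("dataflow_job_id")    # None: this key does not exist
```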
[GitHub] [airflow] potiuk opened a new pull request, #27335: More resilient test for secrets masker
potiuk opened a new pull request, #27335: URL: https://github.com/apache/airflow/pull/27335 The test expected an exact stack trace, but we really want to check that the stack trace contains masked passwords at all levels of context. This PR makes the test more resilient to any changes in the stack trace.
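The testing idea described here can be sketched in isolation (`mask_secret` below is a stand-in helper, not Airflow's SecretsMasker): instead of asserting an exact stack trace, which varies across Python versions, assert only that the secret is masked at every level of a chained traceback.

```python
import traceback

def mask_secret(text: str, secret: str, mask: str = "***") -> str:
    # Naive stand-in for a secrets masker: blank out every occurrence.
    return text.replace(secret, mask)

def fail_with(secret):
    # Raise a chained exception so the secret appears at two context levels.
    try:
        raise ValueError(f"inner failure for {secret}")
    except ValueError:
        raise RuntimeError(f"outer failure for {secret}")

try:
    fail_with("p@ssw0rd")
except RuntimeError:
    trace = traceback.format_exc()

masked = mask_secret(trace, "p@ssw0rd")
# Resilient checks: no literal secret anywhere, the mask present at both
# context levels, regardless of how many frames the interpreter prints.
```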
[GitHub] [airflow] potiuk commented on pull request #27333: Add examples and howtos about sensors
potiuk commented on PR #27333: URL: https://github.com/apache/airflow/pull/27333#issuecomment-1294060637 Small but I think nice :)
[GitHub] [airflow] potiuk commented on pull request #27322: Skip Integration tests on Public runners if not full tests needed
potiuk commented on PR #27322: URL: https://github.com/apache/airflow/pull/27322#issuecomment-1294041835 Actually I found out that it did not work for Postgres - because it was inside 'mysql', 'mssql' - moved the if outside
[GitHub] [airflow] potiuk closed issue #27332: airflow tasks are getting randomly terminated with no errors in UI and logs on worker shows module not found
potiuk closed issue #27332: airflow tasks are getting randomly terminated with no errors in UI and logs on worker shows module not found URL: https://github.com/apache/airflow/issues/27332
[GitHub] [airflow] potiuk commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
potiuk commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1294034952 This is very close to what I've heard! Good one @Taragolis! And yeah PYTHONDONTWRITEBYTECODE is also my typical recommendation.
[GitHub] [airflow] potiuk opened a new pull request, #27333: Add examples and howtos about sensors
potiuk opened a new pull request, #27333: URL: https://github.com/apache/airflow/pull/27333 The examples and docs were missing for a number of built-in sensors. This documentation and examples do not add much but at least give the user information that there are such sensors available when they look at our documentation.
[GitHub] [airflow] boring-cyborg[bot] commented on issue #27332: airflow tasks are getting randomly terminated with no errors in UI and logs on worker shows module not found
boring-cyborg[bot] commented on issue #27332: URL: https://github.com/apache/airflow/issues/27332#issuecomment-1294020775 Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] gurjit-sandhu opened a new issue, #27332: airflow tasks are getting randomly terminated with no errors in UI and logs on worker shows module not found
gurjit-sandhu opened a new issue, #27332: URL: https://github.com/apache/airflow/issues/27332 ### Official Helm Chart version 1.6.0 ### Apache Airflow version airflow 2 ### Kubernetes Version 1.22 ### Helm Chart configuration i have set PYTHONPATH as below in extraENV and verified after login to worker pod extraEnv: | - name: PYTHONPATH value: "/opt/airflow/dags:/opt/airflow" also have setup PYTHONPATH in docker image # Setting python path for importing dag modules ENV PYTHONPATH="/opt/airflow/dags:/opt/airflow" ### Docker Image customisations also have setup PYTHONPATH in docker image # Setting python path for importing dag modules ENV PYTHONPATH="/opt/airflow/dags:/opt/airflow" ### What happened dags are showing up on UI however after executing it get terminated and below are logs from worker showing its not able to find python modules -- if we re-run same tasks it succeeds however randomly it fails with below error module not found _execute_in_fork(command_to_exec, celery_task_id) File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/celery_executor.py", line 108, in _execute_in_fork raise AirflowException(msg) airflow.exceptions.AirflowException: Celery command failed on host: airflow-worker-1.airflow-worker.airflow.svc.cluster.local with celery_task_id f672c24c-d54c-4401-a1c1-8892d54f90f4 [2022-10-27 18:48:32,001: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[1fdb829f-755a-4ac0-98c5-362e6b6c8a44] received [2022-10-27 18:48:32,011: INFO/ForkPoolWorker-15] [1fdb829f-755a-4ac0-98c5-362e6b6c8a44] Executing command in Celery: ['airflow', 'tasks', 'run', 'yipit_fsv_data_export_dag', 'verify_all_exports_were_successful', 'manual__2022-10-27T13:42:55-05:00', '--local', '--subdir', 'DAGS_FOLDER/partner_exports/data_export_dag_builder.py'] [2022-10-27 18:48:32,087: INFO/ForkPoolWorker-15] Filling up the DagBag from /opt/airflow/dags/partner_exports/data_export_dag_builder.py [2022-10-27 18:48:32,144: ERROR/ForkPoolWorker-15]
Failed to import: /opt/airflow/dags/partner_exports/data_export_dag_builder.py Traceback (most recent call last): File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/dagbag.py", line 317, in parse loader.exec_module(new_module) File "", line 843, in exec_module File "", line 219, in _call_with_frames_removed File "/opt/airflow/dags/partner_exports/data_export_dag_builder.py", line 20, in from utils.generic_callbacks import on_failure_callback, on_success_callback ModuleNotFoundError: No module named 'utils' [2022-10-27 18:48:32,145: ERROR/ForkPoolWorker-15] [1fdb829f-755a-4ac0-98c5-362e6b6c8a44] Failed to execute task Dag 'yipit_fsv_data_export_dag' could not be found; either it does not exist or it failed to parse.. Traceback (most recent call last): File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/celery_executor.py", line 128, in _execute_in_fork args.func(args) File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 51, in command return func(*args, **kwargs) File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/cli.py", line 99, in wrapper return f(*args, **kwargs) File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 360, in task_run dag = get_dag(args.subdir, args.dag_id) File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/cli.py", line 203, in get_dag raise AirflowException( airflow.exceptions.AirflowException: Dag 'yipit_fsv_data_export_dag' could not be found; either it does not exist or it failed to parse. ### What you think should happen instead since PYTHONPATH is set in both docker and helm chart environment variables - dag should be able to find modules ### How to reproduce _No response_ ### Anything else _No response_ ### Are you willing to submit PR? - [ ] Yes I am willing to submit a PR! 
### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
[GitHub] [airflow] vincbeck opened a new pull request, #27331: Fix example_emr_eks system test. Clean trust policies from the execution role
vincbeck opened a new pull request, #27331: URL: https://github.com/apache/airflow/pull/27331

Fix the example_emr_eks system test by cleaning trust policies from the execution role. The trust policy on the execution role grows too big: each system test occurrence adds a new statement through `update-role-trust-policy`. See the error below:

```
(LimitExceeded) when calling the UpdateAssumeRolePolicy operation: Cannot exceed quota for ACLSizePerRole: 2048
```
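The cleanup the PR describes can be sketched as follows. This is a hypothetical illustration, not the PR's code: the statement contents are invented, and the only point is that pruning stale statements keeps the serialized trust policy under the 2048-character `ACLSizePerRole` quota from the error above:

```python
import json

# Hypothetical sketch: each system-test run appends a trust-policy statement,
# so the serialized policy eventually blows past IAM's ACLSizePerRole quota
# (2048 chars). Statement contents are invented for illustration.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": f"systemTestRun{i}",
            "Effect": "Allow",
            "Principal": {"Service": "emr-containers.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
        for i in range(20)
    ],
}

# Keep only the newest statement; statements left over from earlier runs are stale.
policy["Statement"] = policy["Statement"][-1:]
print(len(json.dumps(policy)))  # comfortably below the 2048-char quota
```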
[GitHub] [airflow] Taragolis commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
Taragolis commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293984562

> BTW. I've heard VERY bad things about EFS when EFS is used to share DAGs. It has a profound impact on the stability and performance of Airflow if you have a big number of DAGs, unless you pay big bucks for IOPS. I've heard that from many people.
> This is the moment when I usually STRONGLY recommend GitSync instead: https://medium.com/apache-airflow/shared-volumes-in-airflow-the-good-the-bad-and-the-ugly-22e9f681afca

It always depends on configuration and monitoring. I personally had this issue, maybe on Airflow 2.1.x, and I do not know whether it is actually related to Airflow itself or to something else. Working with EFS definitely takes more effort than GitSync.

For anyone who finds this thread in the future with EFS performance degradation, these might help:

**Disable writing Python bytecode inside the NFS (AWS EFS) mount**
- Mount it read-only
- Disable Python bytecode by setting `PYTHONDONTWRITEBYTECODE=x`
- Or set a separate location for bytecode via `PYTHONPYCACHEPREFIX`, for example `/tmp/pycaches`

**Watch the Bursting Throughput capacity**
Throughput in Bursting mode looks like a miracle at first, but when all the bursting capacity goes to zero it can turn your life into hell. Each newly created EFS share has about 2.1 TB of bursting capacity. What can be done here:
- Switch to Provisioned Throughput mode permanently, which can cost a lot: something like 6 USD per 1 MiB/s, without VAT
- Switch to Provisioned Throughput mode only when the bursting capacity drops below some amount, e.g. 0.5 TB, and switch back when it climbs back to the 2.1 TB limit. Unfortunately there is no autoscaling, so this is either manual or a combination of CloudWatch alerting + AWS Lambda.

![image](https://user-images.githubusercontent.com/3998685/198383225-2b101e42-726f-4f60-90e2-44ab3e4a1098.png)
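The two interpreter settings mentioned above can be checked without Airflow at all. A small sketch, assuming a POSIX-like system, that spawns a child interpreter with each variable set and prints the resulting CPython behaviour:

```python
import subprocess
import sys

# PYTHONDONTWRITEBYTECODE=<non-empty> turns off .pyc writing entirely.
out1 = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
    env={"PYTHONDONTWRITEBYTECODE": "x"},
    capture_output=True, text=True,
)
print(out1.stdout.strip())  # True

# PYTHONPYCACHEPREFIX redirects __pycache__ into a separate directory tree,
# keeping bytecode writes off the NFS mount.
out2 = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.pycache_prefix)"],
    env={"PYTHONPYCACHEPREFIX": "/tmp/pycaches"},
    capture_output=True, text=True,
)
print(out2.stdout.strip())  # /tmp/pycaches
```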
[GitHub] [airflow] ferruzzi commented on pull request #27278: Add Andrey as member of the triage team
ferruzzi commented on PR #27278: URL: https://github.com/apache/airflow/pull/27278#issuecomment-1293961451 Hey! I missed this one. Agreed, and congrats.
[GitHub] [airflow] potiuk commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
potiuk commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293946748

> 5, which involves educating all current/future maintainers to understand memory nuances :sweat_smile:

As counterintuitive as it is, I know what you are talking about :)
[GitHub] [airflow] potiuk commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
potiuk commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293945450 BTW. I've heard VERY bad things about EFS when EFS is used to share DAGs. It has a profound impact on the stability and performance of Airflow if you have a big number of DAGs, unless you pay big bucks for IOPS. I've heard that from many people. This is the moment when I STRONGLY recommend GitSync instead: https://medium.com/apache-airflow/shared-volumes-in-airflow-the-good-the-bad-and-the-ugly-22e9f681afca
[GitHub] [airflow] o-nikolas commented on a diff in pull request #27184: SSHOperator ignores cmd_timeout (#27182)
o-nikolas commented on code in PR #27184: URL: https://github.com/apache/airflow/pull/27184#discussion_r1007236645

`airflow/providers/ssh/hooks/ssh.py`:

```diff
@@ -491,9 +491,12 @@ def exec_ssh_client_command(
         if stdout_buffer_length > 0:
             agg_stdout += stdout.channel.recv(stdout_buffer_length)

+        timedout = False
+
         # read from both stdout and stderr
         while not channel.closed or channel.recv_ready() or channel.recv_stderr_ready():
             readq, _, _ = select([channel], [], [], timeout)
+            timedout = len(readq) == 0
```

Review Comment: I think you need to check if **all** three results are empty to be sure a timeout has occurred. Useful link: https://stackoverflow.com/a/15195460/1055702
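The review suggestion can be demonstrated with a plain socket, outside Paramiko entirely. A minimal sketch: `select()` signals a timeout only when all three returned lists are empty, so the timeout check should cover every list rather than just the read queue:

```python
import select
import socket

# select() returns (readable, writable, errored); an empty result in ALL
# three lists is the only reliable signal that the timeout expired.
def wait_with_timeout(channel, timeout):
    readq, writeq, errq = select.select([channel], [], [], timeout)
    return not (readq or writeq or errq)  # True only on a genuine timeout

a, b = socket.socketpair()
print(wait_with_timeout(a, 0.1))  # True: nothing readable yet -> timed out
b.sendall(b"x")
print(wait_with_timeout(a, 0.1))  # False: data is waiting to be read
```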
[GitHub] [airflow] ferruzzi commented on a diff in pull request #27322: Skip Integration tests on Public runners if not full tests needed
ferruzzi commented on code in PR #27322: URL: https://github.com/apache/airflow/pull/27322#discussion_r1007223939

`dev/breeze/src/airflow_breeze/utils/selective_checks.py`:

```diff
@@ -290,7 +290,7 @@ def default_constraints_branch(self) -> str:
         return self._default_constraints_branch

     @cached_property
-    def _full_tests_needed(self) -> bool:
```

Review Comment: Congrats, little method, you've been promoted. :P
[GitHub] [airflow] zachliu commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
zachliu commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293904526 i'm also using AWS EFS :handshake: i think i'll try 1 & 2, they seem to be the easiest except 5, which involves educating all current/future maintainers to understand memory nuances :sweat_smile:
[GitHub] [airflow] potiuk commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
potiuk commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293897683

5. Stop worrying about it. The fact that the Unix page cache grows has only one drawback: it shows up in whatever monitoring service you choose. You should monitor other parameters instead; it's perfectly OK for the cache to grow until it fills all available memory, and it has no negative consequences.
[GitHub] [airflow] potiuk commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
potiuk commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293895871

3. Write a custom handler to rotate the logs externally.
4. Use an external service for logging (CloudWatch etc.) to store them remotely.
[GitHub] [airflow] potiuk commented on issue #27329: Airflow with Oracle: the field dag.next_dagrun_data_interval_start shows the error ORA-00972: identifier is too long
potiuk commented on issue #27329: URL: https://github.com/apache/airflow/issues/27329#issuecomment-1293890994 We do not support Oracle. Look at the prerequisites. There are likely many more errors when you try to run Airflow on an unsupported database, and even if you fix them, they are likely to break at any time. Simply don't use Oracle as the metadata db. We are not going to help with solving any issues with it.
[GitHub] [airflow] potiuk closed issue #27329: Airflow with Oracle: the field dag.next_dagrun_data_interval_start shows the error ORA-00972: identifier is too long
potiuk closed issue #27329: Airflow with Oracle: the field dag.next_dagrun_data_interval_start shows the error ORA-00972: identifier is too long URL: https://github.com/apache/airflow/issues/27329
[GitHub] [airflow] potiuk commented on a diff in pull request #27322: Skip Integration tests on Public runners if not full tests needed
potiuk commented on code in PR #27322: URL: https://github.com/apache/airflow/pull/27322#discussion_r1007204589

`dev/breeze/src/airflow_breeze/commands/testing_commands.py`:

```diff
@@ -285,6 +286,13 @@ def run_tests_in_parallel(
         if test_type.startswith(heavy_test_type):
             test_types_list.remove(test_type)
             tests_to_run_sequentially.append(test_type)
+    if full_tests_needed:
```

Review Comment: Obviously :facepalm:. That would actually explain why they were run last time :)
[GitHub] [airflow] jedcunningham commented on pull request #27327: Fix typo
jedcunningham commented on PR #27327: URL: https://github.com/apache/airflow/pull/27327#issuecomment-1293885338 Thanks @bmtKIA6!
[airflow] branch main updated: Fix typo (#27327)
This is an automated email from the ASF dual-hosted git repository. jedcunningham pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git

The following commit(s) were added to refs/heads/main by this push:
     new 9c73b3f7fc  Fix typo (#27327)

9c73b3f7fc is described below:

    commit 9c73b3f7fc1d18925d0ed09e8719f53b8147b0f2
    Author: bmtKIA6
    AuthorDate: Thu Oct 27 14:04:23 2022 -0400

    Fix typo (#27327)

```diff
 RELEASE_NOTES.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/RELEASE_NOTES.rst b/RELEASE_NOTES.rst
index 4d171f94c6..be6e0b34c4 100644
--- a/RELEASE_NOTES.rst
+++ b/RELEASE_NOTES.rst
@@ -158,7 +158,7 @@ pass a list of 1 or more Datasets:

 .. code-block:: python

-    with DAG(dag_id='dataset-consmer', schedule=[dataset]):
+    with DAG(dag_id='dataset-consumer', schedule=[dataset]):
         ...

 And to mark a task as producing a dataset pass the dataset(s) to the ``outlets`` attribute:
```
[GitHub] [airflow] boring-cyborg[bot] commented on pull request #27327: Fix typo
boring-cyborg[bot] commented on PR #27327: URL: https://github.com/apache/airflow/pull/27327#issuecomment-1293884947 Awesome work, congrats on your first merged pull request!
[GitHub] [airflow] jedcunningham merged pull request #27327: Fix typo
jedcunningham merged PR #27327: URL: https://github.com/apache/airflow/pull/27327
[GitHub] [airflow] Taragolis commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
Taragolis commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293884630

I also mount the default `logs` directory to NFS (AWS EFS), so I can only suggest the configuration I have personally used for a long time:

1. Change the default dag processor manager log location to somewhere outside the NFS mount, e.g. `AIRFLOW__LOGGING__DAG_PROCESSOR_MANAGER_LOG_LOCATION = "/tmp/airflow/logs/dag_processor_manager/dag_processor_manager.log"`
2. Increase the print stats interval, `AIRFLOW__SCHEDULER__PRINT_STATS_INTERVAL = 300`, which can reduce the final size of the file
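Both settings above follow Airflow's standard `AIRFLOW__{SECTION}__{KEY}` environment-variable convention for overriding config options. A small sketch of that naming rule (the helper function is hypothetical, not part of Airflow):

```python
# Airflow maps config options to env vars as AIRFLOW__{SECTION}__{KEY}.
# This helper is a hypothetical illustration of that naming rule.
def airflow_config_env_var(section: str, key: str) -> str:
    return f"AIRFLOW__{section.upper()}__{key.upper()}"

print(airflow_config_env_var("logging", "dag_processor_manager_log_location"))
# AIRFLOW__LOGGING__DAG_PROCESSOR_MANAGER_LOG_LOCATION
print(airflow_config_env_var("scheduler", "print_stats_interval"))
# AIRFLOW__SCHEDULER__PRINT_STATS_INTERVAL
```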
[GitHub] [airflow] potiuk commented on pull request #27214: Refactor amazon providers tests which use `moto`
potiuk commented on PR #27214: URL: https://github.com/apache/airflow/pull/27214#issuecomment-1293882239 Actually, that's a good one. Maybe it has something to do with a race when creating the network. The idea for this test was to test Kerberos with the right address. The problem with those Kerberos tests was that they did not work with Docker2 (there was an issue about it), so we will have to fix it anyway. Maybe soon.
[GitHub] [airflow] potiuk commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
potiuk commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293847808 Nothing we can do about it :). But I am not sure those are the culprits; according to the descriptions, those files should be removed once Airflow stops keeping the file open, unless the client crashes.
[GitHub] [airflow] potiuk commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
potiuk commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293846632 silly rename :rofl:
[GitHub] [airflow] pankajastro opened a new pull request, #27330: [Docs] Fix duplicate param in docstring RedshiftSQLHook `get_table_primary_key` method
pankajastro opened a new pull request, #27330: URL: https://github.com/apache/airflow/pull/27330

[docs-only change] The RedshiftSQLHook `get_table_primary_key` docstring lists the `table` param twice; it should have one `table` and one `schema`.
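A hedged sketch of what the corrected docstring might look like. The signature below is assumed from the PR description, not copied from the hook, and the body is a stub:

```python
from typing import List, Optional


def get_table_primary_key(table: str, schema: Optional[str] = "public") -> List[str]:
    """Return the column names that make up the table's primary key.

    :param table: Name of the table for which to fetch the primary key.
    :param schema: Name of the schema in which the table lives.
    """
    return []  # stub: the real hook queries Redshift metadata here
```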
[GitHub] [airflow] Taragolis commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
Taragolis commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293838786

> wait i'm confused, so it is NFS design choice not to remove the cache file after it's written to an actual file?
>
> ![2022-10-27_12-07](https://user-images.githubusercontent.com/14293802/198342056-2a836c9b-4d02-40da-9ab2-231087e6fac6.png)

https://nfs.sourceforge.net/#faq_d2
[GitHub] [airflow] jedcunningham commented on a diff in pull request #27322: Skip Integration tests on Public runners if not full tests needed
jedcunningham commented on code in PR #27322: URL: https://github.com/apache/airflow/pull/27322#discussion_r1007164990

`dev/breeze/src/airflow_breeze/commands/testing_commands.py`:

```diff
@@ -285,6 +286,13 @@ def run_tests_in_parallel(
         if test_type.startswith(heavy_test_type):
             test_types_list.remove(test_type)
             tests_to_run_sequentially.append(test_type)
+    if full_tests_needed:
```

Review Comment: Should this be `if not full_tests_needed`? We want to still run them when `full_tests_needed` is true, right?
[GitHub] [airflow] eitanme commented on pull request #27190: External task sensor fail fix
eitanme commented on PR #27190: URL: https://github.com/apache/airflow/pull/27190#issuecomment-1293802461

@potiuk because of this bug, to use the `ExternalTaskSensor` you currently must explicitly set a timeout on the sensor, or your DAG will hang forever. On your point about reliance on old behavior: to work around the bug, folks may have set that timeout to avoid an infinite hang. In those cases, fixing this bug changes the exception they receive from `AirflowSensorTimeout` to the generic `AirflowException`. If they rely on catching the `AirflowSensorTimeout` subclass they may have issues, though if they catch the base class they'd still be OK. Does that sound about right? What would you propose we do? I'm happy to update a changelog if pointed in the right direction.

Also, there are some failing checks on this PR that I don't understand. Specifically, in the "Sqlite Py3.7: API Always CLI Core Integration Other Providers WWW" check, a test fails that I'm pretty sure I don't go anywhere near:

```
FAILED tests/jobs/test_local_task_job.py::TestLocalTaskJob::test_heartbeat_failed_fast
```

Any ideas on that front? The logs are long and I didn't see much useful in them, so I wanted to ask before digging deeper, as I'm not super familiar with this code base and its checks.
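The compatibility concern above is standard exception-hierarchy behaviour. A self-contained sketch with stand-in classes (these mirror, but are not, Airflow's real `airflow.exceptions` types): handlers that catch the base class keep working when the raised type changes from the subclass to the base, while handlers pinned to the subclass stop matching:

```python
# Stand-in classes mirroring Airflow's exception hierarchy; they are NOT the
# real airflow.exceptions types, just a self-contained illustration.
class AirflowException(Exception):
    pass


class AirflowSensorTimeout(AirflowException):
    pass


def handle(exc: Exception) -> str:
    """Classify an error the way user code with except-clauses would."""
    try:
        raise exc
    except AirflowSensorTimeout:
        return "sensor timeout"
    except AirflowException:
        return "generic airflow failure"


# The subclass still matches its own clause; the base class falls through to
# the broader handler -- which is the only one guaranteed to survive the fix.
print(handle(AirflowSensorTimeout()))  # sensor timeout
print(handle(AirflowException()))      # generic airflow failure
```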
[GitHub] [airflow] ejstembler commented on issue #27300: Scheduler encounters database update error, then gets stuck in endless loop, yet still shows as healthy
ejstembler commented on issue #27300: URL: https://github.com/apache/airflow/issues/27300#issuecomment-1293802439 Incidentally, two Astronomer engineers familiar with the issue: @alex-astronomer and @wolfier
[GitHub] [airflow] boring-cyborg[bot] commented on issue #27329: Airflow with Oracle: the field dag.next_dagrun_data_interval_start shows the error ORA-00972: identifier is too long
boring-cyborg[bot] commented on issue #27329: URL: https://github.com/apache/airflow/issues/27329#issuecomment-1293763131 Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] Alaeddine22 opened a new issue, #27329: Airflow with Oracle: the field dag.next_dagrun_data_interval_start shows the error ORA-00972: identifier is too long
Alaeddine22 opened a new issue, #27329: URL: https://github.com/apache/airflow/issues/27329 ### Apache Airflow version Other Airflow 2 version (please specify below) ### What happened Hello, I installed the stable version 2.4.1 with sqlAlchemy configured for an Oracle database. when running airflow standalone i'm having the following error : `sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-00972: identifier is too long [SQL: SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_parsed_time AS dag_last_parsed_time, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.processor_subdir AS dag_processor_subdir, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval, dag.timetable_description AS dag_timetable_descriptio_1, dag.max_active_tasks AS dag_max_active_tasks, dag.max_active_runs AS dag_max_active_runs, dag.has_task_concurrency_limits AS dag_has_task_concurrency_2, dag.has_import_errors AS dag_has_import_errors, dag.next_dagrun AS dag_next_dagrun, dag.next_dagrun_data_interval_start AS dag_next_dagrun_data_int_3, dag.next_dagrun_data_interval_end AS dag_next _dagrun_data_int_4, dag.next_dagrun_create_after AS dag_next_dagrun_create_a_5, dag_tag_1.name AS dag_tag_1_name, dag_tag_1.dag_id AS dag_tag_1_dag_id, dag_schedule_dataset_ref_1.dataset_id AS dag_schedule_dataset_ref_6, dag_schedule_dataset_ref_1.dag_id AS dag_schedule_dataset_ref_7, dag_schedule_dataset_ref_1.created_at AS dag_schedule_dataset_ref_8, dag_schedule_dataset_ref_1.updated_at AS dag_schedule_dataset_ref_9, task_outlet_dataset_refe_2.dataset_id AS task_outlet_dataset_refe_a, task_outlet_dataset_refe_2.dag_id AS task_outlet_dataset_refe_b, 
task_outlet_dataset_refe_2.task_id AS task_outlet_dataset_refe_c, task_outlet_dataset_refe_2.created_at AS task_outlet_dataset_refe_d, task_outlet_dataset_refe_2.updated_at AS task_outlet_dataset_refe_e FROM dag LEFT OUTER JOIN dag_tag dag_tag_1 ON dag.dag_id = dag_tag_1.dag_id LEFT OUTER JOIN dag_schedule_dataset_reference dag_schedule_dataset_ref_1 ON dag.dag_id = dag_schedule_dataset_ref_1.dag_id LEFT OUTER JOIN task_outlet_dataset_reference task_outlet_dataset_refe_2 ON dag.dag_id = task_outlet_dataset_refe_2.dag_id WHERE dag.dag_id IN (:dag_id_1_1, :dag_id_1_2, :dag_id_1_3, :dag_id_1_4, :dag_id_1_5, :dag_id_1_6, :dag_id_1_7, :dag_id_1_8, :dag_id_1_9, :dag_id_1_10, :dag_id_1_11, :dag_id_1_12, :dag_id_1_13, :dag_id_1_14, :dag_id_1_15, :dag_id_1_16, :dag_id_1_17, :dag_id_1_18, :dag_id_1_19, :dag_id_1_20, :dag_id_1_21, :dag_id_1_22, :dag_id_1_23, :dag_id_1_24, :dag_id_1_25, :dag_id_1_26, :dag_id_1_27, :dag_id_1_28, :dag_id_1_29, :dag_id_1_30, :dag_id_1_31, :dag_id_1_32, :dag_id_1_33, :dag_id_1_34, :dag_id_1_35, :dag_id_1_36, :dag_id_1_37, :dag_id_1_38, :dag_id_1_39, :dag_id_1_40, :dag_id_1_41, :dag_id_1_42) FOR UPDATE OF ] [parameters: {'dag_id_1_1': 'latest_only', 'dag_id_1_2': 'example_short_circuit_operator', 'dag_id_1_3': 'example_branch_python_operator_decorator', 'dag_id_1_4': 'example_weekday_branch_operator', 'dag_id_1_5': 'example_short_circuit_decorator', 'dag_id_1_6': 'example_external_task_marker_child', 'dag_id_1_7': 'dataset_consumes_1_never_scheduled', 'dag_id_1_8': 'example_branch_datetime_operator_3', 'dag_id_1_9': 'example_trigger_controller_dag', 'dag_id_1_10': 'example_subdag_operator.section-2', 'dag_id_1_11': 'example_bash_operator', 'dag_id_1_12': 'dataset_consumes_1_and_2', 'dag_id_1_13': 'example_complex', 'dag_id_1_14': 'dataset_produces_2', 'dag_id_1_15': 'example_xcom_args_with_operators', 'dag_id_1_16': 'tutorial_taskflow_api', 'dag_id_1_17': 'example_branch_datetime_operator_2', 'dag_id_1_18': 'example_branch_datetime_operator', 
'dag_id_1_19': 'example_branch_dop_operator_v3', 'dag_id_1_20': 'latest_only_with_trigger', 'dag_id_1_21': 'dataset_produces_1', 'dag_id_1_22': 'example_subdag_operator.section-1', 'dag_id_1_23': 'example_skip_dag', 'dag_id_1_24': 'tutorial_dag', 'dag_id_1_25': 'example_branch_operator', 'dag_id_1_26': 'example_external_task_marker_parent', 'dag_id_1_27': 'example_xcom_args', 'dag_id_1_28': 'example_trigger_target_dag', 'dag_id_1_29': 'dataset_consumes_unknown_never_scheduled', 'dag_id_1_30': 'dataset_consumes_1', 'dag_id_1_31': 'example_subdag_operator', 'dag_id_1_32': 'example_nested_branch_dag', 'dag_id_1_33': 'example_dag_decorator', 'dag_id_1_34': 'example_python_operator', 'dag_id_1_35': 'tutorial', 'dag_id_1_36': 'example_sla_dag', 'dag_id_1_37': 'example_task_group_decorator', 'dag_id_1_38': 'example_passing_params_via_test_command', 'dag_id_1_39':
[GitHub] [airflow] zachliu commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
zachliu commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293754845 wait i'm confused, so it is an NFS design choice not to remove the cache file after it's written to an actual file?
[GitHub] [airflow] zachliu commented on issue #23512: Random "duplicate key value violates unique constraint" errors when initializing the postgres database
zachliu commented on issue #23512: URL: https://github.com/apache/airflow/issues/23512#issuecomment-1293752936

i checked out `2.4.2` and did

```bash
wget -qO - https://github.com/apache/airflow/pull/27297.patch | git apply -v -3
```

then built my own airflow

```bash
breeze release-management prepare-airflow-package --package-format=wheel --verbose
```

then installed it

```bash
pip install apache_airflow-2.4.2-py3-none-any.whl[...] --constraint ...
```

no more "duplicate key value violates unique constraint" errors :rocket: :rocket: :rocket: :rocket: :rocket: :rocket:
[GitHub] [airflow] potiuk commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
potiuk commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293750588 Very much so. This is the choice of using NFS to store logs :)
[GitHub] [airflow] Taragolis commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
Taragolis commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293745606 `.nfs*` files should be related to NFS, not to Airflow.
[GitHub] [airflow] zachliu commented on pull request #27297: Fix IntegrityError during webserver startup
zachliu commented on PR #27297: URL: https://github.com/apache/airflow/pull/27297#issuecomment-1293744097 just tested, this works! :+1: :+1: :+1: :+1: :+1: :+1:
[GitHub] [airflow] boring-cyborg[bot] commented on pull request #27327: Fix typo
boring-cyborg[bot] commented on PR #27327: URL: https://github.com/apache/airflow/pull/27327#issuecomment-1293731189

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst). Here are some useful points:

- Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits](https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
- In case of a new feature, add useful documentation (in docstrings or in the `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it.
- Consider using the [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally. It's a heavy Docker image, but it ships with a working Airflow and a lot of integrations.
- Be patient and persistent. It might take some time to get a review or the final approval from Committers.
- Please follow the [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication, including (but not limited to) comments on Pull Requests, the mailing list and Slack.
- Be sure to read the [Airflow Coding style](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).

Apache Airflow is a community-driven project and together we are making it better. In case of doubts contact the developers at: Mailing List: d...@airflow.apache.org Slack: https://s.apache.org/airflow-slack
[GitHub] [airflow] bmtKIA6 opened a new pull request, #27327: Fix typo
bmtKIA6 opened a new pull request, #27327: URL: https://github.com/apache/airflow/pull/27327 --- **^ Add meaningful description above** Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information. In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
[GitHub] [airflow] Taragolis commented on pull request #27322: Skip Integration tests on Public runners if not full tests needed
Taragolis commented on PR #27322: URL: https://github.com/apache/airflow/pull/27322#issuecomment-1293731595 Fingers crossed
[GitHub] [airflow] jtommi opened a new issue, #27328: SFTPOperator throws object of type 'PlainXComArg' has no len() when using with Taskflow API
jtommi opened a new issue, #27328: URL: https://github.com/apache/airflow/issues/27328

### Apache Airflow Provider(s)

sftp

### Versions of Apache Airflow Providers

apache-airflow-providers-sftp==4.1.0

### Apache Airflow version

2.4.2, Python 3.10

### Operating System

Debian 11 (official Docker image)

### Deployment

Docker-Compose

### Deployment details

Base image is apache/airflow:2.4.2-python3.10

### What happened

When combining the Taskflow API and SFTPOperator, it throws an exception that didn't happen with apache-airflow-providers-sftp 4.0.0.

### What you think should happen instead

The DAG should work as expected.

### How to reproduce

```python
import pendulum

from airflow import DAG
from airflow.decorators import task
from airflow.providers.sftp.operators.sftp import SFTPOperator

with DAG(
    "example_sftp",
    schedule="@once",
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=["example"],
) as dag:

    @task
    def get_file_path():
        return "test.csv"

    local_filepath = get_file_path()

    upload = SFTPOperator(
        task_id=f"upload_file_to_sftp",
        ssh_conn_id="sftp_connection",
        local_filepath=local_filepath,
        remote_filepath="test.csv",
    )
```

### Anything else

```
[2022-10-27T15:21:38.106+] {logging_mixin.py:120} INFO - [2022-10-27T15:21:38.102+] {dagbag.py:342} ERROR - Failed to import: /opt/airflow/dags/test.py
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/dagbag.py", line 338, in parse
    loader.exec_module(new_module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/opt/airflow/dags/test.py", line 21, in <module>
    upload = SFTPOperator(
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/baseoperator.py", line 408, in apply_defaults
    result = func(self, **kwargs, default_args=default_args)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/sftp/operators/sftp.py", line 116, in __init__
    if len(self.local_filepath) != len(self.remote_filepath):
TypeError: object of type 'PlainXComArg' has no len()
```

It looks like the offending code was introduced in commit 5f073e38dd46217b64dbc16d7b1055d89e8c3459.

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
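As uranusjr notes on the issue, value checks in operators belong in `execute`, not `__init__`: at DAG-parse time a templated argument may still be an `XComArg` placeholder, which has no `len()`. A minimal sketch of that pattern, using a hypothetical `TransferOperator` (not the actual SFTPOperator code) so it stands alone:

```python
class TransferOperator:
    """Illustrative operator: validation is deferred from __init__ to execute."""

    template_fields = ("local_filepath", "remote_filepath")

    def __init__(self, local_filepath, remote_filepath):
        # Do NOT call len() here: at parse time these may be XComArg
        # placeholders rather than concrete strings or lists.
        self.local_filepath = local_filepath
        self.remote_filepath = remote_filepath

    def execute(self, context=None):
        # By execution time the templated fields have been rendered to real
        # values, so length validation is safe here.
        local = self.local_filepath if isinstance(self.local_filepath, list) else [self.local_filepath]
        remote = self.remote_filepath if isinstance(self.remote_filepath, list) else [self.remote_filepath]
        if len(local) != len(remote):
            raise ValueError("local_filepath and remote_filepath must have the same length")
        return list(zip(local, remote))
```

With this shape, constructing the operator with a placeholder object never raises; a mismatch only surfaces when the task actually runs.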
[GitHub] [airflow] mobuchowski commented on a diff in pull request #27113: notification: add dag run state notification system
mobuchowski commented on code in PR #27113: URL: https://github.com/apache/airflow/pull/27113#discussion_r1007072665

## airflow/config_templates/config.yml:

```diff
@@ -2169,6 +2169,13 @@
       type: string
       example: ~
       default: "15"
+    - name: enable_dagrun_listener_notifications
+      description: |
+        Enable emitting dagrun listener notifications in scheduler.
+      version_added: 2.5.0
+      type: boolean
+      example: ~
+      default: "False"
```

Review Comment: We probably can work with "if there's a scheduler plugin registered, just do it".
[GitHub] [airflow] Taragolis commented on pull request #27214: Refactor amazon providers tests which use `moto`
Taragolis commented on PR #27214: URL: https://github.com/apache/airflow/pull/27214#issuecomment-1293728819

> It's interesting to see it happening in 3 jobs out of 4 as it was the case in your build.

I'm just a "Lucky Guy"

> I am chasing that one for a long time and I was never able to make a plausible hypothesis on why it happens and implement some workaround. But any ideas/inputs are more than welcome.

I also tried to figure out first why it might happen, and whether the changes in this PR might increase the probability of this error. I only found that `Trino` and `Kerberos` are the only ones which define a network configuration and a specific IP address: https://github.com/apache/airflow/blob/12b8bc1d754ab8db1ca224cfe4ce6e34254b35d4/scripts/ci/docker-compose/integration-trino.yml#L23-L27
[GitHub] [airflow] zachliu commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
zachliu commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293721092 the 2 `.nfs*` files at `~/logs/dag_processor_manager` do add up to my current cache memory usage :thinking: ![2022-10-27_11-36](https://user-images.githubusercontent.com/14293802/198334683-656930fa-553b-4145-8874-f9fe4fe45d6a.png)
[GitHub] [airflow] ashb commented on a diff in pull request #27113: notification: add dag run state notification system
ashb commented on code in PR #27113: URL: https://github.com/apache/airflow/pull/27113#discussion_r1007040063

## airflow/config_templates/config.yml:

```diff
@@ -2169,6 +2169,13 @@
       type: string
       example: ~
       default: "15"
+    - name: enable_dagrun_listener_notifications
+      description: |
+        Enable emitting dagrun listener notifications in scheduler.
+      version_added: 2.5.0
+      type: boolean
+      example: ~
+      default: "False"
```

Review Comment: Not sure we need a config setting for this, really. (Thinking: we already have so many config options, adding another one should be avoided unless we really need it.)

## airflow/listeners/listener.py:

```diff
@@ -47,6 +50,15 @@
     def has_listeners(self) -> bool:
         return len(self.pm.get_plugins()) > 0

+    @property
+    def has_scheduler_listeners(self) -> bool:
+        for plugin in self.pm.get_plugins():
+            if inspect.ismodule(plugin):
```

Review Comment: Can you explain what's going on here? Why do we need to check if it's a module? Pluggy supports adding classes too, I thought? (But mostly: why do we care?)

## airflow/jobs/backfill_job.py:

```diff
@@ -18,6 +18,7 @@
 from __future__ import annotations

 import time
+from concurrent.futures import Executor, ThreadPoolExecutor
```

Review Comment:

```suggestion
from concurrent.futures import Executor as FutureExecutor, ThreadPoolExecutor
```

(and matching elsewhere, to avoid confusing it with Airflow's own Executor class)

## airflow/listeners/listener.py:

```diff
@@ -47,6 +50,15 @@
     def has_listeners(self) -> bool:
         return len(self.pm.get_plugins()) > 0

+    @property
```

Review Comment:

```suggestion
    @cached_property
```

Once we've computed this once per process it can't change again.

## airflow/listeners/listener.py:

```diff
@@ -33,6 +34,8 @@
 _listener_manager = None

+_scheduler_hooks = ["on_dag_run_success", "on_dag_run_failure"]
```

Review Comment: Given these hooks are also called from backfill:

```suggestion
_dagrun_hooks = ["on_dag_run_success", "on_dag_run_failure"]
```

## airflow/jobs/scheduler_job.py:

```diff
@@ -1568,3 +1590,21 @@
             dag.is_active = False
             SerializedDagModel.remove_dag(dag_id=dag.dag_id, session=session)
         session.flush()
+
+    def notify_dagrun_state_changed(self, dag_run: DagRun, msg: str = ""):
+        if not self.enabled_dagrun_listener or not get_listener_manager().has_scheduler_listeners:
+            return
+
+        if dag_run.state == DagRunState.RUNNING:
+            self._notification_threadpool.submit(  # type: ignore[union-attr]
+                get_listener_manager().hook.on_dag_run_start, dag_run=dag_run, msg=msg
+            )
+        elif dag_run.state == DagRunState.SUCCESS:
+            self._notification_threadpool.submit(  # type: ignore[union-attr]
+                get_listener_manager().hook.on_dag_run_success, dag_run=dag_run, msg=msg
+            )
+        elif dag_run.state == DagRunState.FAILED:
+            self._notification_threadpool.submit(  # type: ignore[union-attr]
+                get_listener_manager().hook.on_dag_run_failure, dag_run=dag_run, msg=msg
+            )
```

Review Comment: I'm not sure the threadpool belongs in here. If we make that the _plugin's_ responsibility then a) I don't think forcing a threadpool (of a fixed size, but that could be config driven) on users of this hook is required; b) we probably don't need `has_scheduler_listeners` anymore; c) this PR becomes a lot smaller.
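The last review point above - moving the threadpool out of the scheduler and into the plugin - can be sketched in plain Python. This is a hypothetical listener (the class and its `events` attribute are illustrative, not Airflow API); the idea is that each hook method returns immediately and the plugin, not the scheduler, owns the pool and its size:

```python
from concurrent.futures import ThreadPoolExecutor


class MyDagRunListener:
    """Listener that does its own async dispatch, so the scheduler can call
    the hooks synchronously without maintaining a notification threadpool."""

    def __init__(self, max_workers=2):
        # The plugin decides the pool size - no scheduler config needed.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self.events = []  # recorded notifications, for illustration

    def on_dag_run_success(self, dag_run, msg=""):
        # Return immediately; the slow work happens off-thread.
        self._pool.submit(self._notify, "success", dag_run, msg)

    def on_dag_run_failure(self, dag_run, msg=""):
        self._pool.submit(self._notify, "failure", dag_run, msg)

    def _notify(self, state, dag_run, msg):
        # In a real plugin this might POST to an external system.
        self.events.append((state, dag_run, msg))

    def shutdown(self):
        # Flush pending notifications on process exit.
        self._pool.shutdown(wait=True)
```

With this shape the scheduler can invoke listener hooks inline, and plugins that do not need async dispatch pay no threadpool cost at all.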
[GitHub] [airflow] ayushthe1 commented on issue #27200: Handle TODO: .first() is not None can be changed to .scalar()
ayushthe1 commented on issue #27200: URL: https://github.com/apache/airflow/issues/27200#issuecomment-1293687618 Hey @potiuk, I made a PR for this issue. Could you please review it?
[GitHub] [airflow] potiuk commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
potiuk commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293680414 You can always drop the whole cache to verify what causes it: https://linuxhint.com/clear_cache_linux/ Also, you can do some trial and error to see which files are in the cache, as explained in this answer: https://serverfault.com/questions/278454/is-it-possible-to-list-the-files-that-are-cached It seems it is not easy to get the list of files which contribute to the cache, but if you have some guesses you might try to find out by using fntools.
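The trial-and-error approach described above can be sketched as a few shell commands (Linux only; the log file path is illustrative, and dropping caches needs root):

```shell
# Current page-cache usage, before any cleanup (no root needed):
grep -E '^(Cached|Buffers):' /proc/meminfo

# Trial and error: delete (or truncate) a suspect log file, then re-check
# whether the "Cached" figure drops, e.g.:
#   rm ~/logs/dag_processor_manager/dag_processor_manager.log.1
#   grep '^Cached:' /proc/meminfo

# Or drop the whole page cache to confirm the memory is reclaimable
# (clean pages are simply re-read from disk on next access):
#   sync && echo 1 | sudo tee /proc/sys/vm/drop_caches
```

Note that page-cache usage is normally harmless: the kernel reclaims it under memory pressure, so a growing "Cached" figure is not by itself a leak.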
[GitHub] [airflow] zachliu commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
zachliu commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293668371

yeah, the root cause of this might be somewhere else. here are the facts:

* setting `CONFIG_PROCESSOR_MANAGER_LOGGER=True` does make the cache increase
* deleting files under `~/logs/dag_processor_manager` has no effect on cache memory usage; also, there are cache files i cannot delete

![2022-10-27_11-02](https://user-images.githubusercontent.com/14293802/198326228-bb59c72d-fb38-4fdd-bcd7-8ec49582db86.png)
[GitHub] [airflow] potiuk commented on pull request #27148: Make custom env vars optional for job templates
potiuk commented on PR #27148: URL: https://github.com/apache/airflow/pull/27148#issuecomment-1293656226 Needs conflict resolution
[GitHub] [airflow] potiuk commented on issue #14261: Airflow Scheduler liveness probe crashing (version 2.0)
potiuk commented on issue #14261: URL: https://github.com/apache/airflow/issues/14261#issuecomment-1293655208 If you can do some analysis - look at the hostname that you got there (maybe add an echo) and see if it is still there in 2.4.* - that would be awesome @dschneiderch (and open a new issue if it is still there for you, including some more information: the log and possibly the content of your jobs table).
[GitHub] [airflow] potiuk merged pull request #27326: Fix failing coverage info test
potiuk merged PR #27326: URL: https://github.com/apache/airflow/pull/27326
[airflow] branch main updated: Fix failing coverage info test (#27326)
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git

The following commit(s) were added to refs/heads/main by this push:
     new 12b8bc1d75 Fix failing coverage info test (#27326)

12b8bc1d75 is described below

commit 12b8bc1d754ab8db1ca224cfe4ce6e34254b35d4
Author: Jarek Potiuk
AuthorDate: Thu Oct 27 16:49:28 2022 +0200

    Fix failing coverage info test (#27326)

    PR #27304 was merged with failing tests (my bad) after fixing the head -> heads typo. This PR fixes the source test files where the typo has now also been corrected.

```
 dev/breeze/tests/test_pr_info_files/pr_github_context.json        | 2 +-
 dev/breeze/tests/test_pr_info_files/push_github_context.json      | 2 +-
 dev/breeze/tests/test_pr_info_files/schedule_github_context.json  | 2 +-
 dev/breeze/tests/test_pr_info_files/self_hosted_forced_pr.json    | 2 +-
 dev/breeze/tests/test_pr_info_files/simple_pr.json                | 2 +-
 dev/breeze/tests/test_pr_info_files/simple_pr_different_repo.json | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)
```

```diff
diff --git a/dev/breeze/tests/test_pr_info_files/pr_github_context.json b/dev/breeze/tests/test_pr_info_files/pr_github_context.json
index dab869f396..af8e7a40f1 100644
--- a/dev/breeze/tests/test_pr_info_files/pr_github_context.json
+++ b/dev/breeze/tests/test_pr_info_files/pr_github_context.json
@@ -31,5 +31,5 @@
         }
     },
     "ref_name": "main",
-    "ref": "refs/head/main"
+    "ref": "refs/heads/main"
 }
diff --git a/dev/breeze/tests/test_pr_info_files/push_github_context.json b/dev/breeze/tests/test_pr_info_files/push_github_context.json
index d02cf8ac00..4e04c8d3df 100644
--- a/dev/breeze/tests/test_pr_info_files/push_github_context.json
+++ b/dev/breeze/tests/test_pr_info_files/push_github_context.json
@@ -7,5 +7,5 @@
         }
     },
     "ref_name": "main",
-    "ref": "refs/head/main"
+    "ref": "refs/heads/main"
 }
diff --git a/dev/breeze/tests/test_pr_info_files/schedule_github_context.json b/dev/breeze/tests/test_pr_info_files/schedule_github_context.json
index a66a03dfb2..9f7fc57392 100644
--- a/dev/breeze/tests/test_pr_info_files/schedule_github_context.json
+++ b/dev/breeze/tests/test_pr_info_files/schedule_github_context.json
@@ -5,5 +5,5 @@
         "schedule": "28 0 * * *"
     },
     "ref_name": "main",
-    "ref": "refs/head/main"
+    "ref": "refs/heads/main"
 }
diff --git a/dev/breeze/tests/test_pr_info_files/self_hosted_forced_pr.json b/dev/breeze/tests/test_pr_info_files/self_hosted_forced_pr.json
index f118681372..153146f769 100644
--- a/dev/breeze/tests/test_pr_info_files/self_hosted_forced_pr.json
+++ b/dev/breeze/tests/test_pr_info_files/self_hosted_forced_pr.json
@@ -25,5 +25,5 @@
         }
     },
     "ref_name": "main",
-    "ref": "refs/head/main"
+    "ref": "refs/heads/main"
 }
diff --git a/dev/breeze/tests/test_pr_info_files/simple_pr.json b/dev/breeze/tests/test_pr_info_files/simple_pr.json
index da0fb12bb7..c7a34fbc69 100644
--- a/dev/breeze/tests/test_pr_info_files/simple_pr.json
+++ b/dev/breeze/tests/test_pr_info_files/simple_pr.json
@@ -22,5 +22,5 @@
         }
     },
     "ref_name": "main",
-    "ref": "refs/head/main"
+    "ref": "refs/heads/main"
 }
diff --git a/dev/breeze/tests/test_pr_info_files/simple_pr_different_repo.json b/dev/breeze/tests/test_pr_info_files/simple_pr_different_repo.json
index 8ce2f521ec..2e78021748 100644
--- a/dev/breeze/tests/test_pr_info_files/simple_pr_different_repo.json
+++ b/dev/breeze/tests/test_pr_info_files/simple_pr_different_repo.json
@@ -22,5 +22,5 @@
         }
     },
     "ref_name": "main",
-    "ref": "refs/head/main"
+    "ref": "refs/heads/main"
 }
```
[GitHub] [airflow] potiuk commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
potiuk commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293640759 Would be a great contribution back :)
[GitHub] [airflow] potiuk commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
potiuk commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293640011 Maybe the rotating file handler has another place where it copies files and leaves them behind. Not the end of the world (as you know, this is no harm at all and perfectly normal to happen). Maybe I will take a look soon - or maybe you can, @zachliu: you could see how I've done that, then iterate on it, verify it in your test system, and make a PR after you test it? How about that? Also, there are ways you can check if this might be the cause: just delete the rotated files and see if that causes a drop in the cache memory used.
[GitHub] [airflow] ei-grad commented on a diff in pull request #23720: Fix backfill queued task getting reset to scheduled state.
ei-grad commented on code in PR #23720: URL: https://github.com/apache/airflow/pull/23720#discussion_r1006973733

## airflow/executors/kubernetes_executor.py:

```diff
@@ -464,7 +464,9 @@ def clear_not_launched_queued_tasks(self, session=None) -> None:
         if not self.kube_client:
             raise AirflowException(NOT_STARTED_MESSAGE)
 
-        query = session.query(TaskInstance).filter(TaskInstance.state == State.QUEUED)
+        query = session.query(TaskInstance).filter(
+            TaskInstance.state == State.QUEUED, TaskInstance.queued_by_job_id == self.job_id
+        )
```

Review Comment: Would it be possible to have more than one backfill running then?
[GitHub] [airflow] potiuk closed issue #11085: Airflow Elasticsearch configuration log output does not contain required elements
potiuk closed issue #11085: Airflow Elasticsearch configuration log output does not contain required elements URL: https://github.com/apache/airflow/issues/11085
[GitHub] [airflow] potiuk commented on issue #11085: Airflow Elasticsearch configuration log output does not contain required elements
potiuk commented on issue #11085: URL: https://github.com/apache/airflow/issues/11085#issuecomment-1293633447 Working fine in Airflow 2.4.2.
[GitHub] [airflow] potiuk commented on issue #11085: Airflow Elasticsearch configuration log output does not contain required elements
potiuk commented on issue #11085: URL: https://github.com/apache/airflow/issues/11085#issuecomment-1293632868 Told ya :)
[GitHub] [airflow] potiuk closed issue #26566: Have SLA docs reflect reality
potiuk closed issue #26566: Have SLA docs reflect reality URL: https://github.com/apache/airflow/issues/26566
[GitHub] [airflow] potiuk merged pull request #27111: Update SLA wording to reflect it is relative to Dag Run start
potiuk merged PR #27111: URL: https://github.com/apache/airflow/pull/27111
[airflow] branch main updated: Update SLA wording to reflect it is relative to Dag Run start. (#27111)
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git

The following commit(s) were added to refs/heads/main by this push:
     new 639210a7e0 Update SLA wording to reflect it is relative to Dag Run start. (#27111)

639210a7e0 is described below

commit 639210a7e0bfc3f04f28c7d7278292d2cae7234b
Author: Damian Shaw <111310636+notatallshaw-...@users.noreply.github.com>
AuthorDate: Thu Oct 27 10:34:57 2022 -0400

    Update SLA wording to reflect it is relative to Dag Run start. (#27111)

```
 docs/apache-airflow/concepts/tasks.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
```

```diff
diff --git a/docs/apache-airflow/concepts/tasks.rst b/docs/apache-airflow/concepts/tasks.rst
index 63fbe818e0..c3f9d1de3b 100644
--- a/docs/apache-airflow/concepts/tasks.rst
+++ b/docs/apache-airflow/concepts/tasks.rst
@@ -158,7 +158,7 @@ If you merely want to be notified if a task runs over but still let it run to co
 SLAs
 
-An SLA, or a Service Level Agreement, is an expectation for the maximum time a Task should take. If a task takes longer than this to run, it is then visible in the "SLA Misses" part of the user interface, as well as going out in an email of all tasks that missed their SLA.
+An SLA, or a Service Level Agreement, is an expectation for the maximum time a Task should be completed relative to the Dag Run start time. If a task takes longer than this to run, it is then visible in the "SLA Misses" part of the user interface, as well as going out in an email of all tasks that missed their SLA.
 
 Tasks over their SLA are not cancelled, though - they are allowed to run to completion. If you want to cancel a task after a certain runtime is reached, you want :ref:`concepts:timeouts` instead.
```
[GitHub] [airflow] zachliu commented on issue #27065: Log files are still being cached causing ever-growing memory usage when scheduler is running
zachliu commented on issue #27065: URL: https://github.com/apache/airflow/issues/27065#issuecomment-1293620996 @potiuk the cache memory is still growing :crying_cat_face: ![2022-10-27_10-32](https://user-images.githubusercontent.com/14293802/198315723-c48d12b0-314d-4459-b485-dce1e169940a.png)
[GitHub] [airflow] potiuk commented on pull request #27326: Fix failing coverage info test
potiuk commented on PR #27326: URL: https://github.com/apache/airflow/pull/27326#issuecomment-1293618599 Tests are green - need an approval to unbreak main :pray: (a few lines only)
[airflow] branch main updated (5e6cec849a -> 671029bebc)
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git

    from 5e6cec849a Update google_analytics.html (#27226)
     add 671029bebc Refactor amazon providers tests which use `moto` (#27214)

No new revisions were added by this update.

Summary of changes:

```
 setup.py                                                          |  2 +-
 tests/providers/amazon/aws/hooks/conftest.py                      | 34
 tests/providers/amazon/aws/hooks/test_base_aws.py                 | 16 +---
 .../amazon/aws/hooks/test_cloud_formation.py                      | 13 +--
 tests/providers/amazon/aws/hooks/test_ec2.py                      | 96 --
 tests/providers/amazon/aws/hooks/test_ecs.py                      |  6 --
 tests/providers/amazon/aws/hooks/test_eks.py                      | 15 +---
 tests/providers/amazon/aws/hooks/test_emr.py                      | 13 ++-
 tests/providers/amazon/aws/hooks/test_glue.py                     | 11 +--
 .../amazon/aws/hooks/test_glue_catalog.py                         | 28 +--
 tests/providers/amazon/aws/hooks/test_kinesis.py                  | 11 +--
 .../amazon/aws/hooks/test_lambda_function.py                      | 10 +--
 tests/providers/amazon/aws/hooks/test_logs.py                     | 21 ++---
 .../amazon/aws/hooks/test_redshift_cluster.py                     | 16 +---
 tests/providers/amazon/aws/hooks/test_s3.py                       | 18 ++--
 .../amazon/aws/hooks/test_secrets_manager.py                      | 19 +
 tests/providers/amazon/aws/hooks/test_sns.py                      | 28 ++-
 tests/providers/amazon/aws/hooks/test_sqs.py                      | 10 +--
 .../amazon/aws/hooks/test_step_function.py                        | 14 +---
 .../amazon/aws/log/test_cloudwatch_task_handler.py                | 12 +--
 .../amazon/aws/log/test_s3_task_handler.py                        | 14 +---
 tests/providers/amazon/aws/operators/test_ec2.py                  | 34
 tests/providers/amazon/aws/operators/test_ecs.py                  | 54 +---
 tests/providers/amazon/aws/operators/test_rds.py                  | 18 +---
 .../amazon/aws/sensors/test_cloud_formation.py                    | 21 ++---
 tests/providers/amazon/aws/sensors/test_ec2.py                    | 36
 .../aws/sensors/test_glue_catalog_partition.py                    | 11 +--
 tests/providers/amazon/aws/sensors/test_rds.py                    | 10 +--
 .../amazon/aws/sensors/test_redshift_cluster.py                   | 13 +--
 .../amazon/aws/system/utils/test_helpers.py                       |  7 +-
 .../amazon/aws/transfers/test_gcs_to_s3.py                        | 20 +
 .../amazon/aws/utils/eks_test_constants.py                        |  1 -
 tests/providers/amazon/conftest.py                                | 61 ++
 33 files changed, 246 insertions(+), 447 deletions(-)
 delete mode 100644 tests/providers/amazon/aws/hooks/conftest.py
 create mode 100644 tests/providers/amazon/conftest.py
```
[GitHub] [airflow] potiuk merged pull request #27214: Refactor amazon providers tests which use `moto`
potiuk merged PR #27214: URL: https://github.com/apache/airflow/pull/27214 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #27214: Refactor amazon providers tests which use `moto`
potiuk commented on PR #27214: URL: https://github.com/apache/airflow/pull/27214#issuecomment-1293616490

Yeah - re-running the jobs fixed it :(. Now I re-run #27322 on public runners to see the exclusion working.
[GitHub] [airflow] BobDu commented on a diff in pull request #27316: [docs] best-practices add use variable with template example.
BobDu commented on code in PR #27316: URL: https://github.com/apache/airflow/pull/27316#discussion_r1006950778

## docs/apache-airflow/best-practices.rst:

@@ -213,6 +213,30 @@ or if you need to deserialize a json object from the variable :

     {{ var.json. }}

+Ensure you use the variable with a template in the operator, rather than fetching it in top-level code.
+
+Bad example:
+
+.. code-block:: python
+
+    from airflow.models import Variable
+
+    foo_var = Variable.get("foo")
+    bash_use_variable_bad = BashOperator(
+        task_id="bash_use_variable_bad",
+        bash_command="echo variable foo=${foo_env}",
+        env={"foo_env": foo_var},
+    )
+
+Good example:
+
+.. code-block:: python
+
+    bash_use_variable_good = BashOperator(
+        task_id="bash_use_variable_good",
+        bash_command="echo variable foo=${foo_env}",
+        env={"foo_env": "{{ var.value.get('foo') }}"},
+    )

Review Comment: ? `bash_command="echo variable foo=${Variable.get('foo')}"` is not valid syntax. Do you mean `bash_command=f"echo variable foo={Variable.get('foo')}"`?
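The distinction the review thread is drawing can be shown without Airflow at all. The sketch below is plain Python with a hypothetical `variable_get` stand-in for `airflow.models.Variable.get`: a module-level call runs on every DAG-file parse (one DB round-trip each time), while a Jinja template string such as `"{{ var.value.get('foo') }}"` is just a string at parse time and is only rendered when the task executes.

```python
# Minimal sketch (no Airflow) of why top-level Variable.get is costly.
# `variable_get` is a hypothetical stand-in that counts DB round-trips.
db_calls = 0


def variable_get(key):
    """Pretend database lookup, counting round-trips."""
    global db_calls
    db_calls += 1
    return "bar"


def parse_dag_file():
    # Bad: resolved immediately, every time the scheduler parses the file.
    env_bad = {"foo_env": variable_get("foo")}
    # Good: still a plain string at parse time; Airflow would render this
    # Jinja expression only when the task actually runs.
    env_good = {"foo_env": "{{ var.value.get('foo') }}"}
    return env_bad, env_good


for _ in range(100):  # simulate the scheduler re-parsing the DAG file
    parse_dag_file()

print(db_calls)  # 100 round-trips from the "bad" pattern alone
```

The counts are illustrative, but the mechanism is the real one: the scheduler re-parses DAG files continuously, so any work done at module level is repeated on every parse.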
[GitHub] [airflow] potiuk opened a new pull request, #27326: Fix failing coverage info test
potiuk opened a new pull request, #27326: URL: https://github.com/apache/airflow/pull/27326

The #27304 was merged with failing tests (my bad) after fixing the head -> heads typo. This PR fixes the test source files where the typo has also been corrected.

---

**^ Add meaningful description above**

Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information. In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
[GitHub] [airflow] caupetit-itf commented on issue #11085: Airflow Elasticsearch configuration log output does not contain required elements
caupetit-itf commented on issue #11085: URL: https://github.com/apache/airflow/issues/11085#issuecomment-1293604452

I've just tested with the Docker image 2.4.2-python3.10. No more endless loop, and I can see my end_of_log with a log_id in Elasticsearch! So for me the problem is resolved :) I will update to the latest version first thing next time :)
[GitHub] [airflow] potiuk commented on pull request #27304: Fix coverage upload step
potiuk commented on PR #27304: URL: https://github.com/apache/airflow/pull/27304#issuecomment-1293598403

Ah, I had not seen the test failing. Bad me. Fixing it.
[GitHub] [airflow] potiuk closed issue #27324: Add `held` to possible TaskInstanceState
potiuk closed issue #27324: Add `held` to possible TaskInstanceState URL: https://github.com/apache/airflow/issues/27324
[GitHub] [airflow] cdabella opened a new issue, #27324: Add `held` to possible TaskInstanceState
cdabella opened a new issue, #27324: URL: https://github.com/apache/airflow/issues/27324

### Description

Add `held` as a TaskInstanceState which functions similarly to `failed` but represents a stop in DAG execution that is known and planned.

### Use case/motivation

Many DAGs and pipelines have steps that require human intervention and sign-off before continuing, like manual data validation or manager approval before continuation. This can be functionally achieved today by marking a task as `failed`, but operationally overloading the meaning of `failed` can cause issues with Ops/monitoring/alerting that may not have the complete picture to know whether a task has truly failed or has been marked `failed` by design. Adding an additional TaskInstanceState which represents putting a task on hold improves clarity in DAG design while achieving the functional goal of stopping a DAGRun from continuing.

### Related issues

_No response_

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
[GitHub] [airflow] boring-cyborg[bot] commented on pull request #27323: Handle the todo part and replaced .first() is not None to .scalar()
boring-cyborg[bot] commented on PR #27323: URL: https://github.com/apache/airflow/pull/27323#issuecomment-1293542841

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst). Here are some useful points:

- Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits](https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
- In case of a new feature add useful documentation (in docstrings or in the `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it.
- Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally; it's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
- Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
- Please follow the [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, the Mailing list and Slack.
- Be sure to read the [Airflow Coding style](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).

Apache Airflow is a community-driven project and together we are making it better. In case of doubts contact the developers at: Mailing List: d...@airflow.apache.org Slack: https://s.apache.org/airflow-slack
[GitHub] [airflow] ayushthe1 opened a new pull request, #27323: Handle the todo part and replaced .first() is not None to .scalar()
ayushthe1 opened a new pull request, #27323: URL: https://github.com/apache/airflow/pull/27323

closes #27200: Changed `.first() is not None` to `.scalar()` in the todo section of [file](https://github.com/apache/airflow/blob/d67ac5932dabbf06ae733fc57b48491a8029b8c2/airflow/models/serialized_dag.py#L156-L158)
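The point of the change is that an existence check should ask the database for a single value rather than fetching a whole row and comparing it to `None`. A rough stand-alone sketch of the same idea using `sqlite3` (the real Airflow code uses SQLAlchemy sessions, where the equivalent is `session.query(literal(True)).filter(...).scalar()`; the table and column names below are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE serialized_dag (dag_id TEXT PRIMARY KEY)")
con.execute("INSERT INTO serialized_dag VALUES ('example_dag')")


def dag_exists(dag_id):
    # SELECT EXISTS(...) returns a single 0/1 scalar -- the SQL analogue of
    # SQLAlchemy's .scalar(), which yields True or None. No full row is
    # materialized just to test for presence, unlike a `.first() is not None`
    # check over a row query.
    (flag,) = con.execute(
        "SELECT EXISTS(SELECT 1 FROM serialized_dag WHERE dag_id = ?)",
        (dag_id,),
    ).fetchone()
    return bool(flag)


print(dag_exists("example_dag"))  # True
print(dag_exists("no_such_dag"))  # False
```

Note that in the SQLAlchemy version `.scalar()` returns `None` (not `False`) when no row matches, so callers typically rely on its truthiness rather than comparing to `False`.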
[GitHub] [airflow] ejstembler commented on issue #27300: Scheduler encounters database update error, then gets stuck in endless loop, yet still shows as healthy
ejstembler commented on issue #27300: URL: https://github.com/apache/airflow/issues/27300#issuecomment-1293542042

```
[2022-10-24 22:11:55,940] {scheduler_job.py:768} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 751, in _execute
    self._run_scheduler_loop()
  File "/usr/local/lib/python3.9/site-packages/astronomer/airflow/version_check/plugin.py", line 29, in run_before
    fn(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 839, in _run_scheduler_loop
    num_queued_tis = self._do_scheduling(session)
  File "/usr/local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 911, in _do_scheduling
    self._create_dagruns_for_dags(guard, session)
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/retries.py", line 76, in wrapped_function
    for attempt in run_with_db_retries(max_retries=retries, logger=logger, **retry_kwargs):
  File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __iter__
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 349, in iter
    return fut.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/retries.py", line 85, in wrapped_function
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 979, in _create_dagruns_for_dags
    self._create_dag_runs(query.all(), session)
  File "/usr/local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 1029, in _create_dag_runs
    dag.create_dagrun(
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 68, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/dag.py", line 2384, in create_dagrun
    session.flush()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3255, in flush
    self._flush(objects)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3395, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3355, in _flush
    flush_context.execute()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 453, in execute
    rec.execute(self)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 627, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 234, in save_obj
    _emit_update_statements(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 1032, in _emit_update_statements
    raise orm_exc.StaleDataError(
sqlalchemy.orm.exc.StaleDataError: UPDATE statement on table 'dag' expected to update 1 row(s); 0 were matched.
```
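For readers unfamiliar with the final `StaleDataError` in that traceback: it means SQLAlchemy issued an UPDATE for a `dag` row it had loaded, but the database matched zero rows, i.e. the row was changed or deleted underneath it by a concurrent writer (here, likely another scheduler). A minimal `sqlite3` illustration of the underlying condition (not Airflow code; the schema is invented for the sketch):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER)")
con.execute("INSERT INTO dag VALUES ('my_dag', 0)")

# A second scheduler (or a manual cleanup) removes the row in the meantime.
con.execute("DELETE FROM dag WHERE dag_id = 'my_dag'")

# Our "session" still believes the row exists and tries to update it.
cur = con.execute("UPDATE dag SET is_paused = 1 WHERE dag_id = 'my_dag'")
print(cur.rowcount)  # 0 rows matched -- SQLAlchemy's ORM raises StaleDataError
```

SQLAlchemy's unit of work knows it loaded exactly one row, so when the emitted UPDATE reports a rowcount of 0 it raises rather than silently losing the write.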
[GitHub] [airflow] notatallshaw-gts commented on pull request #27111: Update SLA wording to reflect it is relative to Dag Run start
notatallshaw-gts commented on PR #27111: URL: https://github.com/apache/airflow/pull/27111#issuecomment-1293526817

@potiuk Whenever you get a chance, any objections?
[airflow] branch main updated: Update google_analytics.html (#27226)
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git

The following commit(s) were added to refs/heads/main by this push:
     new 5e6cec849a Update google_analytics.html (#27226)

5e6cec849a is described below

commit 5e6cec849a5fa90967df1447aba9521f1cfff3d0
Author: oleg-ruban <54796035+oleg-ru...@users.noreply.github.com>
AuthorDate: Thu Oct 27 16:25:47 2022 +0300

    Update google_analytics.html (#27226)

    fix bug #27225 - Tracking User Activity Issue: Google Analytics tag version is not up-to-date
    https://github.com/apache/airflow/issues/27225
---
 airflow/www/templates/analytics/google_analytics.html | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/airflow/www/templates/analytics/google_analytics.html b/airflow/www/templates/analytics/google_analytics.html
index ab661a05b6..379f32f930 100644
--- a/airflow/www/templates/analytics/google_analytics.html
+++ b/airflow/www/templates/analytics/google_analytics.html
@@ -17,12 +17,12 @@
 under the License.
 #}
+<script async src="https://www.googletagmanager.com/gtag/js?id={{ analytics_id }}"></script>
 <script>
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
-  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
+  window.dataLayer = window.dataLayer || [];
+  function gtag(){dataLayer.push(arguments);}
+  gtag('js', new Date());

-  ga('create', '{{ analytics_id }}', 'auto');
-  ga('send', 'pageview');
+  gtag('config', '{{ analytics_id }}');