[GitHub] [airflow] potiuk commented on issue #29648: Optionally exclude task deferral time from overall runtime of the task
potiuk commented on issue #29648: URL: https://github.com/apache/airflow/issues/29648#issuecomment-1438000722 I am inclined to close it as "won't do" :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] Taragolis commented on pull request #29616: Refactor docker-compose quick start test
Taragolis commented on PR #29616: URL: https://github.com/apache/airflow/pull/29616#issuecomment-1437988917

Something new 😮

```console
airflow-scheduler_1  |
airflow-scheduler_1  | BACKEND=redis
airflow-scheduler_1  | DB_HOST=redis
airflow-scheduler_1  | DB_PORT=6379
airflow-scheduler_1  |
airflow-scheduler_1  | /home/airflow/.local/lib/python3.7/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to "sqlalchemy<2.0". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
airflow-scheduler_1  |   ____________       _____________
airflow-scheduler_1  |  ____    |__( )_________  __/__  /________      __
airflow-scheduler_1  | ____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
airflow-scheduler_1  | ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
airflow-scheduler_1  |  _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
airflow-scheduler_1  | [2023-02-21T07:28:42.954+] {executor_loader.py:114} INFO - Loaded executor: CeleryExecutor
airflow-scheduler_1  | [2023-02-21T07:28:43.045+] {scheduler_job.py:724} INFO - Starting the scheduler
airflow-scheduler_1  | [2023-02-21T07:28:43.054+] {scheduler_job.py:731} INFO - Processing each file at most -1 times
airflow-scheduler_1  | [2023-02-21T07:28:43.061+] {manager.py:164} INFO - Launched DagFileProcessorManager with pid: 32
airflow-scheduler_1  | [2023-02-21T07:28:43.071+] {scheduler_job.py:1437} INFO - Resetting orphaned tasks for active dag runs
airflow-scheduler_1  | [2023-02-21T07:28:43.080+] {settings.py:61} INFO - Configured default timezone Timezone('UTC')
airflow-scheduler_1  | [2023-02-21T07:28:46.099+] {scheduler_job.py:788} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
airflow-scheduler_1  | Traceback (most recent call last):
airflow-scheduler_1  |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 771, in _execute
airflow-scheduler_1  |     self._run_scheduler_loop()
airflow-scheduler_1  |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 899, in _run_scheduler_loop
airflow-scheduler_1  |     num_queued_tis = self._do_scheduling(session)
airflow-scheduler_1  |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1006, in _do_scheduling
airflow-scheduler_1  |     num_queued_tis = self._critical_section_enqueue_task_instances(session=session)
airflow-scheduler_1  |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 589, in _critical_section_enqueue_task_instances
airflow-scheduler_1  |     queued_tis = self._executable_task_instances_to_queued(max_tis, session=session)
airflow-scheduler_1  |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 280, in _executable_task_instances_to_queued
airflow-scheduler_1  |     pools = Pool.slots_stats(lock_rows=True, session=session)
airflow-scheduler_1  |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/session.py", line 73, in wrapper
airflow-scheduler_1  |     return func(*args, **kwargs)
airflow-scheduler_1  |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/pool.py", line 174, in slots_stats
airflow-scheduler_1  |     .group_by(TaskInstance.pool, TaskInstance.state)
airflow-scheduler_1  |   File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 2773, in all
airflow-scheduler_1  |     return self._iter().all()
airflow-scheduler_1  |   File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/result.py", line 1129, in all
airflow-scheduler_1  |     return self._allrows()
airflow-scheduler_1  |   File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/result.py", line 401, in _allrows
airflow-scheduler_1  |     rows = self._fetchall_impl()
airflow-scheduler_1  |   File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/result.py", line 1813, in _fetchall_impl
airflow-scheduler_1  |     return list(self.iterator)
airflow-scheduler_1  |   File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/loading.py", line 147, in chunks
airflow-scheduler_1  |     fetch = cursor._raw_all_rows()
airflow-scheduler_1  |   File "/home/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/result.py", l
```
[GitHub] [airflow] eladkal commented on issue #29607: Status of testing Providers that were prepared on February 18, 2023
eladkal commented on issue #29607: URL: https://github.com/apache/airflow/issues/29607#issuecomment-1437961577

> #29552 Tested and it works. From my perspective there are 2 issues:
>
> * proper documentation on how to use `drive_folder` and `folder_id` arguments together (as they can be confusing)
> * the log doesn't print the exact path where the file will be located (I believe the log should either not print the path or print it correctly; is it worth making X calls to walk up the folder tree, since Google returns only the parent directory?)
>
> I am happy to address them (in this or the next release) - just let me know your thoughts

Feel free to raise a PR to address this. Since these are not regressions, it will not block this release.
[GitHub] [airflow] Lee-W closed pull request #29649: Livy operator async (fixing failed test in PR 29047)
Lee-W closed pull request #29649: Livy operator async (fixing failed test in PR 29047) URL: https://github.com/apache/airflow/pull/29649
[GitHub] [airflow] Lee-W commented on pull request #29649: Livy operator async (fixing failed test in PR 29047)
Lee-W commented on PR #29649: URL: https://github.com/apache/airflow/pull/29649#issuecomment-1437959795 Thanks @potiuk. I'll close this one and rebase on #29047
[GitHub] [airflow] potiuk commented on pull request #29649: Livy operator async (fixing failed test in PR 29047)
potiuk commented on PR #29649: URL: https://github.com/apache/airflow/pull/29649#issuecomment-1437955076 Please rebase. I fixed a stuck "queue" in our CI builds, and you need to rebase to re-run the tests.
[GitHub] [airflow] potiuk commented on pull request #29644: Remove <2.0.0 limit on google-cloud-bigtable
potiuk commented on PR #29644: URL: https://github.com/apache/airflow/pull/29644#issuecomment-1437952272

> @uranusjr think I got everything. Last test failed due to 2 hour queue timeout.

Yeah - we had some "blocked" build queue - I unblocked it now. Rebasing to re-run.
[GitHub] [airflow] aru-trackunit commented on issue #29607: Status of testing Providers that were prepared on February 18, 2023
aru-trackunit commented on issue #29607: URL: https://github.com/apache/airflow/issues/29607#issuecomment-1437952596

#29552 Tested and it works. From my perspective there are 2 issues:

- proper documentation on how to use `drive_folder` and `folder_id` arguments together (as they can be confusing)
- the log doesn't print the exact path where the file will be located (I believe the log should either not print the path or print it correctly; is it worth making X calls to walk up the folder tree, since Google returns only the parent directory?)

I am happy to address them (in this or the next release) - just let me know your thoughts
[GitHub] [airflow] r-richmond commented on pull request #29644: Remove <2.0.0 limit on google-cloud-bigtable
r-richmond commented on PR #29644: URL: https://github.com/apache/airflow/pull/29644#issuecomment-1437945839 @uranusjr think I got everything. Last test failed due to 2 hour queue timeout.
[airflow-ci-infra] branch main updated: Update docker runner to 302.1
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow-ci-infra.git

The following commit(s) were added to refs/heads/main by this push:
     new 80680f8  Update docker runner to 302.1

80680f8 is described below

commit 80680f83dca6c63981025855142770518e23e714
Author: Jarek Potiuk
AuthorDate: Tue Feb 21 07:39:13 2023 +0100

    Update docker runner to 302.1
---
 github-runner-ami/packer/vars/variables.pkrvars.hcl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/github-runner-ami/packer/vars/variables.pkrvars.hcl b/github-runner-ami/packer/vars/variables.pkrvars.hcl
index 519b4ce..66547db 100644
--- a/github-runner-ami/packer/vars/variables.pkrvars.hcl
+++ b/github-runner-ami/packer/vars/variables.pkrvars.hcl
@@ -19,5 +19,5 @@ vpc_id = "vpc-d73487bd"
 ami_name = "airflow-runner-ami"
 aws_regions = ["eu-central-1", "us-east-2"]
 packer_role_arn = "arn:aws:iam::827901512104:role/packer-role"
-runner_version = "2.301.1-airflow1"
+runner_version = "2.302.1-airflow1"
 session_manager_instance_profile_name = "packer_ssm_instance_profile"
[GitHub] [airflow] potiuk closed issue #29621: Fix adding annotations for dag persistence PVC
potiuk closed issue #29621: Fix adding annotations for dag persistence PVC URL: https://github.com/apache/airflow/issues/29621
[GitHub] [airflow] potiuk closed issue #28745: annotations in logs pvc
potiuk closed issue #28745: annotations in logs pvc URL: https://github.com/apache/airflow/issues/28745
[GitHub] [airflow] potiuk commented on issue #25297: on_failure_callback is not called when task is terminated externally
potiuk commented on issue #25297: URL: https://github.com/apache/airflow/issues/25297#issuecomment-1437923131 @seub - please provide a reproducible case and logs if you would like to report it. It might be something specific to your case, and to be brutally honest, it's much easier for the reporter of one issue to provide the evidence than for maintainers, who look at hundreds of issues and PRs a day, to try to reproduce an issue that someone claims is happening without showing their evidence. Please be empathetic towards maintainers who have a lot of things to handle. If you report "it does not work for me" on an issue that is already closed, you will nearly universally be asked to open a new issue and provide new evidence there.
[GitHub] [airflow] potiuk commented on issue #29648: Optionally exclude task deferral time from overall runtime of the task
potiuk commented on issue #29648: URL: https://github.com/apache/airflow/issues/29648#issuecomment-1437918635

> As an alternative and maybe simpler feature it'd also suit my use case if we could make it so that we are optionally able to "reset" the execution timer when the task resumes from deferral. Does that sound more reasonable?

Not really - because the task can be resumed from deferral for different reasons. Many of the deferred tasks will have several different resumable functions and might go through different stages. For example, they can resume a few times in function a, a different number of times in function b (both usually quick), then run for a long time in function c, and then resume several more times in function d (also usually quick). Each of those resumed runs would have different expectations on how long it can run on its own - some of the checks will be quick, some might take longer. So if you would like to have a separate timeout for each resume, you would have to add a feature for configuring a different timeout per resumed function. And that gets rather complex to configure, manage, and keep control of. I think the cumulative time is the only "reasonable" approach, but as @andrewgodwin confirmed - it's non-trivial and inconsistent with sensors.
[GitHub] [airflow] andrewgodwin commented on issue #29648: Optionally exclude task deferral time from overall runtime of the task
andrewgodwin commented on issue #29648: URL: https://github.com/apache/airflow/issues/29648#issuecomment-1437917495 Yes, the changes required here are non-trivial - you'd need some sort of extra column to track cumulative runtime and then have the workers add values to that whenever they finished running a task section. Even if we could track such timing, I'd hesitate to then make timeouts use it, as that's breaking a core assumption about how those work - would we apply the same logic to rescheduling sensors, which act the same way as deferred operators?
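The bookkeeping described above - accumulating per-segment runtime across deferrals so that time spent deferred is excluded - can be sketched in isolation. This is an illustrative model only, not Airflow's actual schema or API; the class and method names are invented for the example:

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CumulativeRuntime:
    """Track wall-clock time spent actually running, excluding deferral time."""

    total: float = 0.0          # banked runtime from finished task sections
    _started: Optional[float] = None  # start of the currently running section

    def start(self) -> None:
        # A worker picks the task up: the initial run, or a resume after deferral.
        self._started = time.monotonic()

    def defer(self) -> None:
        # The task defers: bank the elapsed section; deferred time is not counted.
        if self._started is not None:
            self.total += time.monotonic() - self._started
            self._started = None

    def exceeded(self, execution_timeout: float) -> bool:
        # Compare cumulative running time (plus the current section) to the timeout.
        running = 0.0 if self._started is None else time.monotonic() - self._started
        return self.total + running > execution_timeout
```

The key property is that `total` only grows while a section is running, so a long deferral adds nothing - which is exactly why a persisted column would be needed: the in-memory counter dies with the worker between sections.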
[GitHub] [airflow] potiuk commented on pull request #29616: Refactor docker-compose quick start test
potiuk commented on PR #29616: URL: https://github.com/apache/airflow/pull/29616#issuecomment-1437911170

> That is more interesting. I've seen before that static checks sometimes failed with this particular version of docker, 20.10.23+azure-2, and haven't seen this happen in docker without azure-X.

Those tests seem to only fail on public runners (that's where azure-* is added to the version) - so I think this is just a coincidence.

> That's all; it seems like the scheduler just hangs, but the service reports that it is healthy. Is this a problem with the recent changes in the health check https://github.com/apache/airflow/pull/29408 or maybe a problem with the simple http server in the scheduler?

I agree. I think the problem was triggered by #29408 - but it would be great to get to the bottom of it.
[GitHub] [airflow] uranusjr commented on a diff in pull request #29616: Refactor docker-compose quick start test
uranusjr commented on code in PR #29616: URL: https://github.com/apache/airflow/pull/29616#discussion_r1112580793

## docker_tests/test_docker_compose_quick_start.py:

```diff
@@ -114,53 +104,60 @@ def wait_for_terminal_dag_state(dag_id, dag_run_id):
             break
 
 
-def test_trigger_dag_and_wait_for_result():
+def test_trigger_dag_and_wait_for_result(tmp_path_factory, monkeypatch):
+    """Simple test which reproduce setup docker-compose environment and trigger example dag."""
+    tmp_dir = tmp_path_factory.mktemp("airflow-quick-start")
+    monkeypatch.chdir(tmp_dir)
+    monkeypatch.setenv("AIRFLOW_IMAGE_NAME", docker_image)
+    monkeypatch.setenv("COMPOSE_PROJECT_NAME", "quick-start")
+
     compose_file_path = (
         SOURCE_ROOT / "docs" / "apache-airflow" / "howto" / "docker-compose" / "docker-compose.yaml"
     )
+    copyfile(compose_file_path, tmp_dir / "docker-compose.yaml")
 
-    with tempfile.TemporaryDirectory() as tmp_dir, tmp_chdir(tmp_dir), mock.patch.dict(
-        "os.environ", AIRFLOW_IMAGE_NAME=docker_image
-    ):
-        copyfile(str(compose_file_path), f"{tmp_dir}/docker-compose.yaml")
-        os.mkdir(f"{tmp_dir}/dags")
-        os.mkdir(f"{tmp_dir}/logs")
-        os.mkdir(f"{tmp_dir}/plugins")
-        (Path(tmp_dir) / ".env").write_text(f"AIRFLOW_UID={subprocess.check_output(['id', '-u']).decode()}\n")
-        print(".emv=", (Path(tmp_dir) / ".env").read_text())
-        copyfile(
-            str(SOURCE_ROOT / "airflow" / "example_dags" / "example_bash_operator.py"),
-            f"{tmp_dir}/dags/example_bash_operator.py",
-        )
+    # Create required directories for docker compose quick start howto
+    for subdir in ("dags", "logs", "plugins"):
+        (tmp_dir / subdir).mkdir()
+
+    dot_env_file = Path(tmp_dir) / ".env"
```

Review Comment:
```suggestion
    dot_env_file = Path(tmp_dir, ".env")
```
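The scaffolding step the refactored test performs - creating the `dags`/`logs`/`plugins` directories and a `.env` file with the current user's UID next to the compose file - can be shown standalone. The function name here is ours, and `os.getuid()` stands in for shelling out to `id -u` (both are POSIX-only):

```python
import os
import tempfile
from pathlib import Path


def scaffold_quick_start(tmp_dir: Path) -> Path:
    """Create the dags/logs/plugins directories and the .env file that the
    quick-start docker-compose.yaml expects to find next to it."""
    for subdir in ("dags", "logs", "plugins"):
        (tmp_dir / subdir).mkdir(exist_ok=True)
    dot_env_file = tmp_dir / ".env"
    # os.getuid() avoids a subprocess call to `id -u`.
    dot_env_file.write_text(f"AIRFLOW_UID={os.getuid()}\n")
    return dot_env_file


tmp_dir = Path(tempfile.mkdtemp(prefix="airflow-quick-start"))
env_file = scaffold_quick_start(tmp_dir)
```

Setting `AIRFLOW_UID` in `.env` is what makes the containers write log and DAG files owned by the host user instead of root.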
[GitHub] [airflow] potiuk commented on pull request #29616: Refactor docker-compose quick start test
potiuk commented on PR #29616: URL: https://github.com/apache/airflow/pull/29616#issuecomment-1437905137 Closed/Reopened to re-run.
[GitHub] [airflow] potiuk closed pull request #29616: Refactor docker-compose quick start test
potiuk closed pull request #29616: Refactor docker-compose quick start test URL: https://github.com/apache/airflow/pull/29616
[GitHub] [airflow] rkarish closed pull request #29034: Add functionality for operators to template all eligible fields (apac…
rkarish closed pull request #29034: Add functionality for operators to template all eligible fields (apac… URL: https://github.com/apache/airflow/pull/29034
[GitHub] [airflow] pshrivastava27 commented on issue #29432: Jinja templating doesn't work with container_resources when using dymanic task mapping with Kubernetes Pod Operator
pshrivastava27 commented on issue #29432: URL: https://github.com/apache/airflow/issues/29432#issuecomment-1437887986

> @jose-lpa using `Variable.get()` in the dag script is not recommended because the `DagFileProcessor` process the script each X minutes, and it loads this variable from the DB. Also this solution works with Variable but not all the other jinja templates.
>
> @pshrivastava27 here is a solution for your need
>
> ```python
> import datetime
> import os
>
> from airflow import models
> from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
>     KubernetesPodOperator,
> )
> from kubernetes.client import models as k8s_models
>
> dvt_image = os.environ.get("DVT_IMAGE", "dev")
>
> default_dag_args = {"start_date": datetime.datetime(2022, 1, 1)}
>
>
> class PatchedResourceRequirements(k8s_models.V1ResourceRequirements):
>     template_fields = ("limits", "requests")
>
>
> def pod_mem():
>     return "4000M"
>
>
> def pod_cpu():
>     return "1000m"
>
>
> with models.DAG(
>     "sample_dag",
>     schedule_interval=None,
>     default_args=default_dag_args,
>     render_template_as_native_obj=True,
>     user_defined_macros={
>         "pod_mem": pod_mem,
>         "pod_cpu": pod_cpu,
>     },
> ) as dag:
>
>     task_1 = KubernetesPodOperator(
>         task_id="task_1",
>         name="task_1",
>         namespace="default",
>         image=dvt_image,
>         cmds=["bash", "-cx"],
>         arguments=["echo hello"],
>         service_account_name="sa-k8s",
>         container_resources=PatchedResourceRequirements(
>             limits={
>                 "memory": "{{ pod_mem() }}",
>                 "cpu": "{{ pod_cpu() }}",
>             }
>         ),
>         startup_timeout_seconds=1800,
>         get_logs=True,
>         image_pull_policy="Always",
>         config_file="/home/airflow/composer_kube_config",
>         dag=dag,
>     )
>
>     task_2 = KubernetesPodOperator.partial(
>         task_id="task_2",
>         name="task_2",
>         namespace="default",
>         image=dvt_image,
>         cmds=["bash", "-cx"],
>         service_account_name="sa-k8s",
>         container_resources=PatchedResourceRequirements(
>             limits={
>                 "memory": "{{ pod_mem() }}",
>                 "cpu": "{{ pod_cpu() }}",
>             }
>         ),
>         startup_timeout_seconds=1800,
>         get_logs=True,
>         image_pull_policy="Always",
>         config_file="/home/airflow/composer_kube_config",
>         dag=dag,
>     ).expand(arguments=[["echo hello"]])
>
>     task_1 >> task_2
> ```
>
> You can do the same for the other classes if needed.

Thanks for the help! @hussein-awala
[GitHub] [airflow] uranusjr commented on a diff in pull request #29625: Aggressively cache entry points in process
uranusjr commented on code in PR #29625: URL: https://github.com/apache/airflow/pull/29625#discussion_r1112560468

## airflow/utils/entry_points.py:

```diff
@@ -28,26 +30,33 @@
 log = logging.getLogger(__name__)
 
+EPnD = Tuple[metadata.EntryPoint, metadata.Distribution]
 
-def entry_points_with_dist(group: str) -> Iterator[tuple[metadata.EntryPoint, metadata.Distribution]]:
-    """Retrieve entry points of the given group.
-
-    This is like the ``entry_points()`` function from importlib.metadata,
-    except it also returns the distribution the entry_point was loaded from.
-
-    :param group: Filter results to only this entrypoint group
-    :return: Generator of (EntryPoint, Distribution) objects for the specified groups
-    """
+@functools.lru_cache(maxsize=None)
+def _get_grouped_entry_points() -> dict[str, list[EPnD]]:
     loaded: set[str] = set()
+    mapping: dict[str, list[EPnD]] = collections.defaultdict(list)
     for dist in metadata.distributions():
         try:
             key = canonicalize_name(dist.metadata["Name"])
```

Review Comment: I checked the callers and currently this is used by loading `airflow.plugins` and `apache_airflow_provider`. Both of these implement their own deduplication logic, so I think it is safe to remove this entirely. Although this would not actually help the latter case, which still accesses `metadata` anyway…
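The core idea of the diff above - scan installed distributions once, build a group-to-entry-points mapping, and memoize it with `functools.lru_cache` so later lookups skip the expensive `importlib.metadata` scan - can be sketched in a simplified form. This version deliberately omits the distribution-name deduplication the review comment discusses removing:

```python
import collections
import functools
from importlib import metadata


@functools.lru_cache(maxsize=None)
def grouped_entry_points():
    """Scan installed distributions once, grouping (EntryPoint, Distribution)
    pairs by entry-point group; repeated calls return the cached mapping."""
    mapping = collections.defaultdict(list)
    for dist in metadata.distributions():
        for ep in dist.entry_points:
            mapping[ep.group].append((ep, dist))
    return dict(mapping)


# The second and later calls are served from the cache - no re-scan of site-packages.
groups = grouped_entry_points()
```

Because the no-argument function is cached, every caller shares one mapping object built on first use, which is the in-process "aggressive caching" the PR title refers to.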
[GitHub] [airflow] amoghrajesh commented on issue #29322: DAG list, sorting lost when switching page
amoghrajesh commented on issue #29322: URL: https://github.com/apache/airflow/issues/29322#issuecomment-1437874203 @bbovenzi I need some help understanding how to fix this issue. Please refer to my last comment
[GitHub] [airflow] amoghrajesh commented on issue #29621: Fix adding annotations for dag persistence PVC
amoghrajesh commented on issue #29621: URL: https://github.com/apache/airflow/issues/29621#issuecomment-1437873701 @potiuk we can close this issue. The PR is merged
[GitHub] [airflow] amoghrajesh commented on issue #28745: annotations in logs pvc
amoghrajesh commented on issue #28745: URL: https://github.com/apache/airflow/issues/28745#issuecomment-1437873467 @potiuk we can close this. It has been merged
[GitHub] [airflow] Lee-W opened a new pull request, #29649: Livy operator async (fixing failed test in PR 29047)
Lee-W opened a new pull request, #29649: URL: https://github.com/apache/airflow/pull/29649 --- **^ Add meaningful description above** Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information. In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
[GitHub] [airflow] seub commented on issue #25297: on_failure_callback is not called when task is terminated externally
seub commented on issue #25297: URL: https://github.com/apache/airflow/issues/25297#issuecomment-1437822040 @potiuk It's the same issue, and it's easy to reproduce: just interrupt a task manually (in the UI) and observe that `on_failure_callback` is not called. Please try and let me know if it's working for you or not.
[GitHub] [airflow] josh-fell commented on issue #29607: Status of testing Providers that were prepared on February 18, 2023
josh-fell commented on issue #29607: URL: https://github.com/apache/airflow/issues/29607#issuecomment-1437817901 #29565 looks good. Thanks for organizing @eladkal!
[GitHub] [airflow] joshuaghezzi commented on issue #29648: Optionally exclude task deferral time from overall runtime of the task
joshuaghezzi commented on issue #29648: URL: https://github.com/apache/airflow/issues/29648#issuecomment-1437787575 As an alternative and maybe simpler feature it'd also suit my use case if we could make it so that we are optionally able to "reset" the execution timer when the task resumes from deferral. Does that sound more reasonable?
[GitHub] [airflow] hussein-awala commented on pull request #29498: add missing read for K8S config file from conn in deferred `KubernetesPodOperator`
hussein-awala commented on PR #29498: URL: https://github.com/apache/airflow/pull/29498#issuecomment-1437703231

> Hi! May I ask in which format you will pass the config file to the trigger? So it will be just a file passed as a parameter to the trigger? Or how?

@VladaZakharova - Yes, I pass the file path and let the triggerer load it. Can you check my last commit? BTW, I am not sure whether loading the config file from the env var `KUBECONFIG` is a good idea, because it's difficult to decide when we need to load it and when we don't.
[GitHub] [airflow] pgagnon commented on a diff in pull request #29623: Implement file credentials provider for AWS hook AssumeRoleWithWebIdentity
pgagnon commented on code in PR #29623: URL: https://github.com/apache/airflow/pull/29623#discussion_r1112410526

## airflow/providers/amazon/aws/hooks/base_aws.py:

```diff
@@ -312,19 +312,35 @@ def _get_web_identity_credential_fetcher(
         base_session = self.basic_session._session or botocore.session.get_session()
         client_creator = base_session.create_client
         federation = self.extra_config.get("assume_role_with_web_identity_federation")
-        if federation == "google":
-            web_identity_token_loader = self._get_google_identity_token_loader()
-        else:
-            raise AirflowException(
-                f'Unsupported federation: {federation}. Currently "google" only are supported.'
-            )
+
+        web_identity_token_loader = (
+            {
+                "file": self._get_file_token_loader,
+                "google": self._get_google_identity_token_loader,
+            }.get(federation)()
+            if type(federation) == str
+            else None
+        )
```

Review Comment: `botocore` is a low-level API, but it is not private. I'd be _very_ happy to work with you toward addressing your concerns about the implementation (really).
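A side note on the dispatch-dict pattern in the hunk above: `{…}.get(federation)()` returns `None` for an unknown key and then fails with a bare `TypeError: 'NoneType' object is not callable`, whereas an explicit lookup can preserve a descriptive error like the one the old `AirflowException` gave. A minimal sketch of the difference; the loader functions and `get_token` are placeholders, not the provider's real methods:

```python
def _load_file_token():
    return "token-from-file"      # placeholder for the file token loader


def _load_google_token():
    return "token-from-google"    # placeholder for the Google identity token loader


_LOADERS = {"file": _load_file_token, "google": _load_google_token}


def get_token(federation):
    loader = _LOADERS.get(federation)
    if loader is None:
        # explicit, descriptive failure instead of "'NoneType' object is not callable"
        raise ValueError(
            f"Unsupported federation: {federation!r}. Supported: {sorted(_LOADERS)}"
        )
    return loader()
```

The dict still centralizes the dispatch; only the error path changes.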
[GitHub] [airflow] s0neq commented on a diff in pull request #29635: YandexCloud provider: support Yandex SDK feature "endpoint"
s0neq commented on code in PR #29635: URL: https://github.com/apache/airflow/pull/29635#discussion_r1112407914

## airflow/providers/yandex/hooks/yandex.py:

```diff
@@ -122,7 +127,8 @@ def __init__(
         self.connection = self.get_connection(self.connection_id)
         self.extras = self.connection.extra_dejson
         credentials = self._get_credentials()
-        self.sdk = yandexcloud.SDK(user_agent=self.provider_user_agent(), **credentials)
+        endpoint = self._get_field("endpoint", "api.cloud.yandex.net")
+        self.sdk = yandexcloud.SDK(user_agent=self.provider_user_agent(), endpoint=endpoint, **credentials)
```

Review Comment: Some documentation on authorization is in the README: https://github.com/yandex-cloud/python-sdk :)
[GitHub] [airflow] Taragolis commented on a diff in pull request #29623: Implement file credentials provider for AWS hook AssumeRoleWithWebIdentity
Taragolis commented on code in PR #29623: URL: https://github.com/apache/airflow/pull/29623#discussion_r1112407307

## airflow/providers/amazon/aws/hooks/base_aws.py: @@ -312,19 +312,35 @@ (same hunk as quoted earlier in this thread)

Review Comment: Use non-documented features of `botocore` and private components 🙄 What could possibly go wrong? I know at least `aiobotocore` and `PynamoDB` had these problems in the past (or even nowadays), and I wonder why Airflow does not yet.
[GitHub] [airflow] pgagnon commented on a diff in pull request #29623: Implement file credentials provider for AWS hook AssumeRoleWithWebIdentity
pgagnon commented on code in PR #29623: URL: https://github.com/apache/airflow/pull/29623#discussion_r1112403111

## airflow/providers/amazon/aws/hooks/base_aws.py: @@ -312,19 +312,35 @@ (same hunk as quoted earlier in this thread)

Review Comment: It still doesn't address the setting where you can't set external configs. The objective is to allow configuring connections that use `AssumeRoleWithWebIdentity` without relying on environment variables and AWS's configuration/shared credentials file.

edit: (btw thank you for clarifying, I did misread your code snippet earlier.)
[GitHub] [airflow] Taragolis commented on a diff in pull request #29635: YandexCloud provider: support Yandex SDK feature "endpoint"
Taragolis commented on code in PR #29635: URL: https://github.com/apache/airflow/pull/29635#discussion_r1112404383

## airflow/providers/yandex/hooks/yandex.py:

```diff
@@ -122,7 +127,8 @@ def __init__(
         self.connection = self.get_connection(self.connection_id)
         self.extras = self.connection.extra_dejson
         credentials = self._get_credentials()
-        self.sdk = yandexcloud.SDK(user_agent=self.provider_user_agent(), **credentials)
+        endpoint = self._get_field("endpoint", "api.cloud.yandex.net")
+        self.sdk = yandexcloud.SDK(user_agent=self.provider_user_agent(), endpoint=endpoint, **credentials)
```

Review Comment:

```suggestion
        sdk_config = {}
        endpoint = self._get_field("endpoint", None)
        if endpoint:
            sdk_config["endpoint"] = endpoint
        self.sdk = yandexcloud.SDK(user_agent=self.provider_user_agent(), **sdk_config, **credentials)
```

I would avoid specifying the default value in the Airflow provider; if it changes in the Yandex Cloud SDK, this would break in the future. BTW, is that all their [documentation](https://cloud.yandex.com/en/docs/functions/lang/python/sdk) for the Python SDK? 🙄
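The suggestion above relies on a common pattern: only pass a keyword argument when the user actually set it, so the library's own default applies otherwise. A standalone sketch of that pattern; `FakeSDK` is a stand-in for `yandexcloud.SDK`, and its default endpoint here is only assumed for illustration:

```python
class FakeSDK:
    """Stand-in for yandexcloud.SDK; records the endpoint it was constructed with."""

    def __init__(self, endpoint="api.cloud.yandex.net", **kwargs):
        self.endpoint = endpoint


def build_sdk(user_endpoint=None):
    # collect only the kwargs the user explicitly configured
    sdk_config = {}
    if user_endpoint:
        sdk_config["endpoint"] = user_endpoint
    # when sdk_config is empty, FakeSDK's own default endpoint applies,
    # so the caller never has to hard-code (and risk drifting from) it
    return FakeSDK(**sdk_config)
```

This is why the review prefers an empty `sdk_config` over passing a provider-side default: the SDK remains the single source of truth for its endpoint.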
[GitHub] [airflow] s0neq commented on a diff in pull request #29635: YandexCloud provider: support Yandex SDK feature "endpoint"
s0neq commented on code in PR #29635: URL: https://github.com/apache/airflow/pull/29635#discussion_r1112403729

## airflow/providers/yandex/hooks/yandex.py:

```diff
@@ -122,7 +127,8 @@ def __init__(
         self.connection = self.get_connection(self.connection_id)
         self.extras = self.connection.extra_dejson
         credentials = self._get_credentials()
-        self.sdk = yandexcloud.SDK(user_agent=self.provider_user_agent(), **credentials)
+        endpoint = self._get_field("endpoint", False)
+        self.sdk = yandexcloud.SDK(user_agent=self.provider_user_agent(), endpoint=endpoint, **credentials)
```

Review Comment:

> But it is not missing in our case; we would provide `False`, and it would evaluate to this wrong value:
>
> ```python
> kwargs_no_endpoint = {"foo": "bar"}
> print(kwargs_no_endpoint.get("endpoint", "api.cloud.yandex.net"))
>
> kwargs_none_endpoint = {"foo": "bar", "endpoint": None}
> print(kwargs_none_endpoint.get("endpoint", "api.cloud.yandex.net"))
>
> kwargs_false_endpoint = {"foo": "bar", "endpoint": False}
> print(kwargs_false_endpoint.get("endpoint", "api.cloud.yandex.net"))
> ```

Very true, thanks. Force-pushed the change; the default endpoint is now set in the provider.
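The quoted snippet hinges on `dict.get`'s semantics: the default is used only when the key is *absent*, not when it maps to a falsy value like `None` or `False`. A self-contained restatement of the three cases (the helper name `resolve_endpoint` is invented for illustration):

```python
DEFAULT = "api.cloud.yandex.net"


def resolve_endpoint(extras: dict):
    # dict.get falls back to DEFAULT only when "endpoint" is not a key at all
    return extras.get("endpoint", DEFAULT)


# key missing: the default applies
assert resolve_endpoint({"foo": "bar"}) == DEFAULT
# key present but None or False: the default does NOT apply
assert resolve_endpoint({"foo": "bar", "endpoint": None}) is None
assert resolve_endpoint({"foo": "bar", "endpoint": False}) is False
```

Passing `False` as the hook-level default therefore silently suppresses the intended fallback, which is exactly the bug the review caught.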
[GitHub] [airflow] github-actions[bot] commented on issue #20267: can't delete airflow connection from CLI when it has bad fernet key
github-actions[bot] commented on issue #20267: URL: https://github.com/apache/airflow/issues/20267#issuecomment-1437689864 This issue has been automatically marked as stale because it has been open for 365 days without any activity. There have been several Airflow releases since the last activity on this issue. We kindly ask you to recheck the report against the latest Airflow version and let us know if the issue is reproducible. The issue will be closed in the next 30 days if no further activity occurs from the issue author.
[GitHub] [airflow] github-actions[bot] commented on issue #20588: Health check path is not configured properly if AIRFLOW__WEBSERVER__BASE_URL is set via ENV variable
github-actions[bot] commented on issue #20588: URL: https://github.com/apache/airflow/issues/20588#issuecomment-1437689838 This issue has been automatically marked as stale because it has been open for 365 days without any activity. There have been several Airflow releases since the last activity on this issue. We kindly ask you to recheck the report against the latest Airflow version and let us know if the issue is reproducible. The issue will be closed in the next 30 days if no further activity occurs from the issue author.
[GitHub] [airflow] github-actions[bot] commented on issue #20821: Airflow can not schedule task in new subDAG
github-actions[bot] commented on issue #20821: URL: https://github.com/apache/airflow/issues/20821#issuecomment-1437689819 This issue has been automatically marked as stale because it has been open for 365 days without any activity. There have been several Airflow releases since the last activity on this issue. We kindly ask you to recheck the report against the latest Airflow version and let us know if the issue is reproducible. The issue will be closed in the next 30 days if no further activity occurs from the issue author.
[GitHub] [airflow] github-actions[bot] commented on issue #20949: Spark Submit Operator reporting job failure. Not able to poll the successful completion of Spark Job.
github-actions[bot] commented on issue #20949: URL: https://github.com/apache/airflow/issues/20949#issuecomment-1437689787 This issue has been automatically marked as stale because it has been open for 365 days without any activity. There have been several Airflow releases since the last activity on this issue. We kindly ask you to recheck the report against the latest Airflow version and let us know if the issue is reproducible. The issue will be closed in the next 30 days if no further activity occurs from the issue author.
[GitHub] [airflow] Taragolis commented on a diff in pull request #29623: Implement file credentials provider for AWS hook AssumeRoleWithWebIdentity
Taragolis commented on code in PR #29623: URL: https://github.com/apache/airflow/pull/29623#discussion_r1112397450

## airflow/providers/amazon/aws/hooks/base_aws.py: @@ -312,19 +312,35 @@ (same hunk as quoted earlier in this thread)

Review Comment:

> Unfortunately, this doesn't work out of the box for the vast majority of operators. Furthermore, this doesn't address the use case; there are many ways to obtain temporary credentials, but none currently allow configuring AssumeRoleWithWebIdentity without relying on external configs.

It would work with all operators that require an AWS Connection; see the DAG I provided as a sample. In prod, all you need is:

1. Set `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` in your environment.
2. Allow `AWS_ROLE_ARN` to assume another role. This can be set up through AWS IAM, which does not require changing anything in the Airflow environment.
3. Set up your connection with `{"role_arn": "your-required-role-here"}` in extra.

Repeat steps 2 and 3 for each new role you require.
[GitHub] [airflow] pgagnon commented on a diff in pull request #29623: Implement file credentials provider for AWS hook AssumeRoleWithWebIdentity
pgagnon commented on code in PR #29623: URL: https://github.com/apache/airflow/pull/29623#discussion_r1112391709

## airflow/providers/amazon/aws/hooks/base_aws.py: @@ -312,19 +312,35 @@ (same hunk as quoted earlier in this thread)

Review Comment:

> Seems like it provide ability to set different web identity token file, isn't it? 🤔

Yes; I don't see a reason not to allow parameterization here. In fact, I think not doing so would be inconsistent and counter to the goal of allowing the feature to be fully configured through the connections subsystem.

> I guess this option exists for allow create own method auth to AWS without overwrite everything in provider. If it complicated better to think how it make easier to end users.

Custom classes will always only be used by our more sophisticated users. I do believe that they have their place in Airflow, but they shouldn't be considered a replacement for more accessible APIs, even if they functionally allow the same implementation.

> Airflow [`AssumeRole`](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html) in two steps
>
> 1. Obtain initial session by: either direct credentials or use `boto3` default strategy for obtain credentials (Environment Variables, EC2 IAM Profile, ECS Execution Role and etc)
> 2. Use session obtained in Step 1 for assume role thought STS API [`AssumeRole`](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html).

Unfortunately, this doesn't work out of the box for the vast majority of operators. Furthermore, this doesn't address the use case; there are many ways to obtain temporary credentials, but none currently allow configuring `AssumeRoleWithWebIdentity` without relying on external configs.
[GitHub] [airflow] hussein-awala commented on issue #29621: Fix adding annotations for dag persistence PVC
hussein-awala commented on issue #29621: URL: https://github.com/apache/airflow/issues/29621#issuecomment-1437660008 @potiuk we can close it since #29622 is merged.
[airflow] branch revert-29408-docker-compose-change-example updated (9da3771744 -> 37aef76496)
This is an automated email from the ASF dual-hosted git repository.

taragolis pushed a change to branch revert-29408-docker-compose-change-example in repository https://gitbox.apache.org/repos/asf/airflow.git

    discard 9da3771744 Revert "Improve health checks in example docker-compose and clarify usage (#29408)"
    add     6ef5ba9104 Refactor Dataproc Trigger (#29364)
    add     5835b08e8b Adding possibility for annotations in logs pvc (#29270)
    add     901774718c Fix adding annotations for dag persistence PVC (#29622)
    add     3dbcf99d20 Update google cloud dlp package and adjust hook and operators (#29234)
    add     f51742d20b Don't push secret in XCOM in BigQueryCreateDataTransferOperator (#29348)
    add     37aef76496 Revert "Improve health checks in example docker-compose and clarify usage (#29408)"

This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this:

    * -- * -- B -- O -- O -- O   (9da3771744)
               \
                N -- N -- N   refs/heads/revert-29408-docker-compose-change-example (37aef76496)

You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update.
Summary of changes:

    airflow/providers/google/cloud/hooks/dlp.py        | 373 ++---
    .../google/cloud/operators/bigquery_dts.py         |   3 +
    .../providers/google/cloud/operators/dataproc.py   |   4 +-
    airflow/providers/google/cloud/operators/dlp.py    |  63 ++--
    .../providers/google/cloud/triggers/dataproc.py    | 191 ---
    airflow/providers/google/provider.yaml             |   4 +-
    chart/templates/dags-persistent-volume-claim.yaml  |   6 +-
    chart/templates/logs-persistent-volume-claim.yaml  |   4 +
    chart/values.schema.json                           |   8 +
    chart/values.yaml                                  |   4 +
    generated/provider_dependencies.json               |   2 +-
    tests/providers/google/cloud/hooks/test_dlp.py     | 327 --
    .../google/cloud/operators/test_bigquery_dts.py    |  10 +-
    tests/providers/google/cloud/operators/test_dlp.py |  41 ++-
    .../google/cloud/triggers/test_dataproc.py         |   4 +-
    15 files changed, 638 insertions(+), 406 deletions(-)
[GitHub] [airflow] Taragolis commented on a diff in pull request #29623: Implement file credentials provider for AWS hook AssumeRoleWithWebIdentity
Taragolis commented on code in PR #29623: URL: https://github.com/apache/airflow/pull/29623#discussion_r1112384223

## airflow/providers/amazon/aws/hooks/base_aws.py: @@ -312,19 +312,35 @@ (same hunk as quoted earlier in this thread)

Review Comment:

> The objective is more about using multiple role_arn than multiple tokens. It is widespread in practice for users not to have the ability to override config files (for instance, in corporate environments or on PaaS), so defining profiles in the AWS config or shared credentials file is not always practical.

Seems like it provides the ability to set a different web identity token file, doesn't it? 🤔

```python
token_file = self.extra_config.get("assume_role_with_web_identity_token_file") or os.getenv(
    AssumeRoleWithWebIdentityProvider._CONFIG_TO_ENV_VAR["web_identity_token_file"]
)
```

> This shares the same concern as point 1 regarding the inability to overwrite config files. Also, I think this option would be dismissed as too complex and burdensome by 99% of Airflow users.

I guess this option exists to allow creating your own AWS auth method without overwriting everything in the provider. If it is complicated, better to think about how to make it easier for end users.

> This does not permit configuration through the Airflow connections subsystem without relying on external configs, which are (1) not necessarily mutable and (2) more-or-less static.

Airflow performs [`AssumeRole`](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html) in two steps:

1. Obtain an initial session, either from direct credentials or via `boto3`'s default credential-resolution strategy (environment variables, EC2 IAM profile, ECS execution role, etc.).
2. Use the session obtained in step 1 to assume a role through the STS API [`AssumeRole`](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html).

So I can't see any problem with obtaining credentials via EKS IRSA or another SA (the `AWS_ROLE_ARN`, `AWS_WEB_IDENTITY_TOKEN_FILE`, and `AWS_ROLE_SESSION_NAME` environment variables) and then assuming the required role. A simple snippet DAG to check:

```python
import os

import pendulum

from airflow import DAG
from airflow.decorators import task
from airflow.models import Connection

with DAG(
    dag_id="test-irsa-role",
    start_date=pendulum.datetime(2023, 2, 1, tz="UTC"),
    schedule=None,
    catchup=False,
    tags=["aws", "assume-role", "29623"],
):

    @task
    def check_connection():
        os.environ["AWS_ROLE_ARN"] = "arn:aws:iam:::role/spam-egg-role"
        os.environ["AWS_WEB_IDENTITY_TOKEN_FILE"] = "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"

        conn_id = "fake-conn-for-test"
        region_name = "eu-west-1"
        assumed_role = "arn:aws:iam:::role/foo-bar-role"
        assert assumed_role != os.environ["AWS_ROLE_ARN"]

        conn = Connection(
            conn_id=conn_id,
            conn_type="aws",
            extra={
                "role_arn": assumed_role,
                "region_name": region_name,
            },
        )
        env_key = f"AIRFLOW_CONN_{conn.conn_id.upper()}"
        os.environ[env_key] = conn.get_uri()
        result, message = conn.test_connection()
        if result:
            return message
        raise PermissionError(message)

    check_connection()
```

> This would be possible if boto3 allowed passing web_identity_token_file, role_arn, or role_session_name as parameters to clients or sessions, but sadly it does not at this point.

This would be nice to raise with the [boto team](https://github.com/boto); maybe they could share why it is not implemented in their SDK. I guess some reason exists for that.
[airflow] branch main updated: Don't push secret in XCOM in BigQueryCreateDataTransferOperator (#29348)
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git

The following commit(s) were added to refs/heads/main by this push:
     new f51742d20b  Don't push secret in XCOM in BigQueryCreateDataTransferOperator (#29348)
f51742d20b is described below

commit f51742d20b2e53bcd90a19db21e4e12d2a287677
Author: Pankaj Singh <98807258+pankajas...@users.noreply.github.com>
AuthorDate: Tue Feb 21 04:36:50 2023 +0530

    Don't push secret in XCOM in BigQueryCreateDataTransferOperator (#29348)

    * Don't push secret in xcom in BigQueryCreateDataTransferOperator
---
 airflow/providers/google/cloud/operators/bigquery_dts.py    |  3 +++
 tests/providers/google/cloud/operators/test_bigquery_dts.py | 10 ++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/airflow/providers/google/cloud/operators/bigquery_dts.py b/airflow/providers/google/cloud/operators/bigquery_dts.py
index ee1a3b548b..d786c903e7 100644
--- a/airflow/providers/google/cloud/operators/bigquery_dts.py
+++ b/airflow/providers/google/cloud/operators/bigquery_dts.py
@@ -138,6 +138,9 @@ class BigQueryCreateDataTransferOperator(BaseOperator):
         result = TransferConfig.to_dict(response)
         self.log.info("Created DTS transfer config %s", get_object_id(result))
         self.xcom_push(context, key="transfer_config_id", value=get_object_id(result))
+        # don't push AWS secret in XCOM
+        result.get("params", {}).pop("secret_access_key", None)
+        result.get("params", {}).pop("access_key_id", None)
         return result

diff --git a/tests/providers/google/cloud/operators/test_bigquery_dts.py b/tests/providers/google/cloud/operators/test_bigquery_dts.py
index 78c92d52ed..aa52169a77 100644
--- a/tests/providers/google/cloud/operators/test_bigquery_dts.py
+++ b/tests/providers/google/cloud/operators/test_bigquery_dts.py
@@ -46,12 +46,15 @@
 TRANSFER_CONFIG_ID = "id1234"
 TRANSFER_CONFIG_NAME = "projects/123abc/locations/321cba/transferConfig/1a2b3c"
 RUN_NAME = "projects/123abc/locations/321cba/transferConfig/1a2b3c/runs/123"
+transfer_config = TransferConfig(
+    name=TRANSFER_CONFIG_NAME, params={"secret_access_key": "AIRFLOW_KEY", "access_key_id": "AIRFLOW_KEY_ID"}
+)


 class BigQueryCreateDataTransferOperatorTestCase(unittest.TestCase):
     @mock.patch(
         "airflow.providers.google.cloud.operators.bigquery_dts.BiqQueryDataTransferServiceHook",
-        **{"return_value.create_transfer_config.return_value": TransferConfig(name=TRANSFER_CONFIG_NAME)},
+        **{"return_value.create_transfer_config.return_value": transfer_config},
     )
     def test_execute(self, mock_hook):
         op = BigQueryCreateDataTransferOperator(
@@ -59,7 +62,7 @@ class BigQueryCreateDataTransferOperatorTestCase(unittest.TestCase):
         )

         ti = mock.MagicMock()
-        op.execute({"ti": ti})
+        return_value = op.execute({"ti": ti})

         mock_hook.return_value.create_transfer_config.assert_called_once_with(
             authorization_code=None,
@@ -71,6 +74,9 @@ class BigQueryCreateDataTransferOperatorTestCase(unittest.TestCase):
         )

         ti.xcom_push.assert_called_with(execution_date=None, key="transfer_config_id", value="1a2b3c")
+        assert "secret_access_key" not in return_value.get("params", {})
+        assert "access_key_id" not in return_value.get("params", {})
+

 class BigQueryDeleteDataTransferConfigOperatorTestCase(unittest.TestCase):
     @mock.patch("airflow.providers.google.cloud.operators.bigquery_dts.BiqQueryDataTransferServiceHook")
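The sanitization pattern in the commit above can be sketched in isolation. This is a hedged, minimal example (the helper name `scrub_transfer_config` is mine, not the operator's real code): `.pop(key, None)` is a safe no-op whenever the key, or the whole `"params"` section, is absent, so no KeyError can leak through.

```python
def scrub_transfer_config(result: dict) -> dict:
    """Remove AWS credentials before the dict is returned (and hence stored
    as the task's return-value XCom). Hypothetical helper mirroring the
    pattern from the commit above."""
    params = result.get("params", {})
    params.pop("secret_access_key", None)  # no-op if key is missing
    params.pop("access_key_id", None)
    return result


cfg = {"name": "t1", "params": {"secret_access_key": "KEY", "access_key_id": "ID", "bucket": "b"}}
scrubbed = scrub_transfer_config(cfg)
assert "secret_access_key" not in scrubbed["params"]
assert scrubbed["params"]["bucket"] == "b"  # non-secret params survive
# also safe when "params" is missing entirely:
assert scrub_transfer_config({"name": "t2"}) == {"name": "t2"}
```

Note that the scrub happens in place on the same dict that was already used for the `transfer_config_id` push, which is why only the secret keys disappear from the returned value.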
[GitHub] [airflow] potiuk merged pull request #29348: Don't push secret in XCOM in BigQueryCreateDataTransferOperator
potiuk merged PR #29348: URL: https://github.com/apache/airflow/pull/29348 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk closed issue #29209: BigQueryCreateDataTransferOperator will log AWS credentials when transferring from S3
potiuk closed issue #29209: BigQueryCreateDataTransferOperator will log AWS credentials when transferring from S3
URL: https://github.com/apache/airflow/issues/29209
[GitHub] [airflow] potiuk commented on issue #29648: Optionally exclude task deferral time from overall runtime of the task
potiuk commented on issue #29648:
URL: https://github.com/apache/airflow/issues/29648#issuecomment-1437654137

   Maybe, but this would not be accurate: there are lots of nuances about when the task starts running and when it is already deferred (both deferring tasks and bringing them back take time). Plus it would require DB changes to keep the total "execution" time. CC: @andrewgodwin WDYT?
[airflow] branch main updated: Update google cloud dlp package and adjust hook and operators (#29234)
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git

The following commit(s) were added to refs/heads/main by this push:
     new 3dbcf99d20  Update google cloud dlp package and adjust hook and operators (#29234)
3dbcf99d20 is described below

commit 3dbcf99d20d47cde0debdd5faf9bd9b2ebde1718
Author: Łukasz Wyszomirski
AuthorDate: Tue Feb 21 00:01:13 2023 +0100

    Update google cloud dlp package and adjust hook and operators (#29234)
---
 airflow/providers/google/cloud/hooks/dlp.py        | 373 ++---
 airflow/providers/google/cloud/operators/dlp.py    |  63 ++--
 airflow/providers/google/provider.yaml             |   4 +-
 generated/provider_dependencies.json               |   2 +-
 tests/providers/google/cloud/hooks/test_dlp.py     | 327 --
 tests/providers/google/cloud/operators/test_dlp.py |  41 ++-
 6 files changed, 534 insertions(+), 276 deletions(-)

diff --git a/airflow/providers/google/cloud/hooks/dlp.py b/airflow/providers/google/cloud/hooks/dlp.py
index cf3883516e..b7a71f5dd1 100644
--- a/airflow/providers/google/cloud/hooks/dlp.py
+++ b/airflow/providers/google/cloud/hooks/dlp.py
@@ -33,7 +33,7 @@ from typing import Sequence

 from google.api_core.gapic_v1.method import DEFAULT, _MethodDefault
 from google.api_core.retry import Retry
-from google.cloud.dlp_v2 import DlpServiceClient
+from google.cloud.dlp import DlpServiceClient
 from google.cloud.dlp_v2.types import (
     ByteContentItem,
     ContentItem,
@@ -41,7 +41,6 @@ from google.cloud.dlp_v2.types import (
     DeidentifyContentResponse,
     DeidentifyTemplate,
     DlpJob,
-    FieldMask,
     InspectConfig,
     InspectContentResponse,
     InspectJobConfig,
@@ -55,6 +54,7 @@ from google.cloud.dlp_v2.types import (
     StoredInfoType,
     StoredInfoTypeConfig,
 )
+from google.protobuf.field_mask_pb2 import FieldMask

 from airflow.exceptions import AirflowException
 from airflow.providers.google.common.consts import CLIENT_INFO
@@ -101,7 +101,7 @@ class CloudDLPHook(GoogleBaseHook):
             delegate_to=delegate_to,
             impersonation_chain=impersonation_chain,
         )
-        self._client = None
+        self._client: DlpServiceClient | None = None

     def get_conn(self) -> DlpServiceClient:
         """
@@ -113,6 +113,15 @@ class CloudDLPHook(GoogleBaseHook):
             self._client = DlpServiceClient(credentials=self.get_credentials(), client_info=CLIENT_INFO)
         return self._client

+    def _project_deidentify_template_path(self, project_id, template_id):
+        return f"{DlpServiceClient.common_project_path(project_id)}/deidentifyTemplates/{template_id}"
+
+    def _project_stored_info_type_path(self, project_id, info_type_id):
+        return f"{DlpServiceClient.common_project_path(project_id)}/storedInfoTypes/{info_type_id}"
+
+    def _project_inspect_template_path(self, project_id, inspect_template_id):
+        return f"{DlpServiceClient.common_project_path(project_id)}/inspectTemplates/{inspect_template_id}"
+
     @GoogleBaseHook.fallback_to_default_project_id
     def cancel_dlp_job(
         self,
@@ -142,7 +151,14 @@ class CloudDLPHook(GoogleBaseHook):
             raise AirflowException("Please provide the ID of the DLP job resource to be cancelled.")

         name = DlpServiceClient.dlp_job_path(project_id, dlp_job_id)
-        client.cancel_dlp_job(name=name, retry=retry, timeout=timeout, metadata=metadata)
+        client.cancel_dlp_job(
+            request=dict(
+                name=name,
+            ),
+            retry=retry,
+            timeout=timeout,
+            metadata=metadata,
+        )

     def create_deidentify_template(
         self,
@@ -177,16 +193,18 @@ class CloudDLPHook(GoogleBaseHook):
         project_id = project_id or self.project_id
         if organization_id:
-            parent = DlpServiceClient.organization_path(organization_id)
+            parent = DlpServiceClient.common_organization_path(organization_id)
         elif project_id:
-            parent = DlpServiceClient.project_path(project_id)
+            parent = DlpServiceClient.common_project_path(project_id)
         else:
             raise AirflowException("Please provide either organization_id or project_id.")

         return client.create_deidentify_template(
-            parent=parent,
-            deidentify_template=deidentify_template,
-            template_id=template_id,
+            request=dict(
+                parent=parent,
+                deidentify_template=deidentify_template,
+                template_id=template_id,
+            ),
             retry=retry,
             timeout=timeout,
             metadata=metadata,
@@ -227,12 +245,14 @@ class CloudDLPHook(GoogleBaseHook):
         """
         client = self.get_conn()

-        parent = DlpServiceClient.project_path(project_id)
+        parent = Dlp
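The new `_project_*_path` helpers in the diff above just concatenate a resource suffix onto the client's common project path. The string-building can be sketched without the real client. Note the stand-in `common_project_path` below is an assumption: it mimics the GAPIC convention of returning `projects/{project_id}`, which should be verified against the actual `DlpServiceClient`.

```python
# Stand-in for DlpServiceClient.common_project_path (GAPIC clients
# conventionally return "projects/{project}"; assumption, verify
# against your installed google-cloud-dlp version).
def common_project_path(project_id: str) -> str:
    return f"projects/{project_id}"


def project_deidentify_template_path(project_id: str, template_id: str) -> str:
    # Same shape as CloudDLPHook._project_deidentify_template_path above.
    return f"{common_project_path(project_id)}/deidentifyTemplates/{template_id}"


def project_inspect_template_path(project_id: str, inspect_template_id: str) -> str:
    return f"{common_project_path(project_id)}/inspectTemplates/{inspect_template_id}"


assert project_deidentify_template_path("my-proj", "tmpl1") == "projects/my-proj/deidentifyTemplates/tmpl1"
assert project_inspect_template_path("my-proj", "insp1") == "projects/my-proj/inspectTemplates/insp1"
```

The hook needs these helpers because the newer client library dropped the old per-resource `project_path`-style methods in favor of `common_project_path`.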
[GitHub] [airflow] potiuk merged pull request #29234: Update google cloud dlp package and adjust hook and operators
potiuk merged PR #29234:
URL: https://github.com/apache/airflow/pull/29234
[GitHub] [airflow] potiuk commented on issue #29643: Add google-cloud-run python client to GCP requirements
potiuk commented on issue #29643:
URL: https://github.com/apache/airflow/issues/29643#issuecomment-1437649040

   Feel free to attempt it
[airflow] branch main updated (5835b08e8b -> 901774718c)
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git

    from 5835b08e8b  Adding possibility for annotations in logs pvc (#29270)
     add 901774718c  Fix adding annotations for dag persistence PVC (#29622)

No new revisions were added by this update.

Summary of changes:
 chart/templates/dags-persistent-volume-claim.yaml | 6 +++---
 chart/values.yaml                                 | 2 ++
 2 files changed, 5 insertions(+), 3 deletions(-)
[GitHub] [airflow] potiuk merged pull request #29622: Fix adding annotations for dag persistence PVC
potiuk merged PR #29622:
URL: https://github.com/apache/airflow/pull/29622
[airflow] branch main updated: Adding possibility for annotations in logs pvc (#29270)
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git

The following commit(s) were added to refs/heads/main by this push:
     new 5835b08e8b  Adding possibility for annotations in logs pvc (#29270)
5835b08e8b is described below

commit 5835b08e8bc3e11f4f98745266d10bbae510b258
Author: Amogh Desai
AuthorDate: Tue Feb 21 04:27:35 2023 +0530

    Adding possibility for annotations in logs pvc (#29270)

    * Adding possibility for annotations in logs pvc

    Co-authored-by: Amogh
---
 chart/templates/logs-persistent-volume-claim.yaml | 4
 chart/values.schema.json                          | 8
 chart/values.yaml                                 | 2 ++
 3 files changed, 14 insertions(+)

diff --git a/chart/templates/logs-persistent-volume-claim.yaml b/chart/templates/logs-persistent-volume-claim.yaml
index deeaa2ef97..07374c1fd5 100644
--- a/chart/templates/logs-persistent-volume-claim.yaml
+++ b/chart/templates/logs-persistent-volume-claim.yaml
@@ -29,6 +29,10 @@ metadata:
   {{- with .Values.labels }}
     {{- toYaml . | nindent 4 }}
   {{- end }}
+  {{- with .Values.logs.persistence.annotations }}
+  annotations:
+    {{- toYaml . | nindent 4 }}
+  {{- end }}
 spec:
   accessModes: ["ReadWriteMany"]
   resources:
diff --git a/chart/values.schema.json b/chart/values.schema.json
index db6d3b3fa9..12ab8a99b0 100644
--- a/chart/values.schema.json
+++ b/chart/values.schema.json
@@ -5491,6 +5491,14 @@
                 ],
                 "default": null
             },
+            "annotations": {
+                "description": "Annotations to add to logs PVC",
+                "type": "object",
+                "default": {},
+                "additionalProperties": {
+                    "type": "string"
+                }
+            },
             "existingClaim": {
                 "description": "The name of an existing PVC to use.",
                 "type": [
diff --git a/chart/values.yaml b/chart/values.yaml
index ef59660773..b0c06bc086 100644
--- a/chart/values.yaml
+++ b/chart/values.yaml
@@ -1944,6 +1944,8 @@ logs:
     enabled: false
     # Volume size for logs
     size: 100Gi
+    # Annotations for the logs PVC
+    annotations: {}
     # If using a custom storageClass, pass name here
     storageClassName:
     ## the name of an existing PVC to use
[GitHub] [airflow] potiuk merged pull request #29270: Adding possibility for annotations in logs pvc
potiuk merged PR #29270:
URL: https://github.com/apache/airflow/pull/29270
[GitHub] [airflow] potiuk commented on issue #28647: Intermitent log on deferrable operator
potiuk commented on issue #28647:
URL: https://github.com/apache/airflow/issues/28647#issuecomment-1437642318

   cc: @andrewgodwin maybe that rings a bell for you?
[GitHub] [airflow] potiuk commented on pull request #29094: Add `max_active_tis_per_dagrun` for Dynamic Task Mapping
potiuk commented on PR #29094:
URL: https://github.com/apache/airflow/pull/29094#issuecomment-1437640955

   I think this one is great - though I do not understand all the details of mapped tasks and concurrency there. @uranusjr @ashb - maybe you can take a look, sounds like a great feature of Dynamic Task Mapping.
[airflow] branch main updated (ee0a56a2ca -> 6ef5ba9104)
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git

    from ee0a56a2ca  Trigger Class migration to internal API (#29099)
     add 6ef5ba9104  Refactor Dataproc Trigger (#29364)

No new revisions were added by this update.

Summary of changes:
 .../providers/google/cloud/operators/dataproc.py | 4 +-
 .../providers/google/cloud/triggers/dataproc.py  | 191 -
 .../google/cloud/triggers/test_dataproc.py       | 4 +-
 3 files changed, 74 insertions(+), 125 deletions(-)
[GitHub] [airflow] potiuk merged pull request #29364: Refactor Dataproc Trigger
potiuk merged PR #29364:
URL: https://github.com/apache/airflow/pull/29364
[GitHub] [airflow] potiuk commented on issue #25060: Airflow scheduler crashes due to 'duplicate key value violates unique constraint "task_instance_pkey"'
potiuk commented on issue #25060:
URL: https://github.com/apache/airflow/issues/25060#issuecomment-1437625953

   @uranusjr - wasn't the "null" mapped task instance fixed since 2.4.3? I cannot find it easily.
[GitHub] [airflow] pankajastro commented on pull request #29348: Don't push secret in XCOM in BigQueryCreateDataTransferOperator
pankajastro commented on PR #29348:
URL: https://github.com/apache/airflow/pull/29348#issuecomment-1437626057

   > > And I guess we need to add tests for that, to avoid a situation where it somehow returns in the future.
   >
   > Agree

   Added here: 4eade3237e25f92095d74b76cd1f16f99945c842
[GitHub] [airflow] boring-cyborg[bot] commented on issue #29648: Optionally exclude task deferral time from overall runtime of the task
boring-cyborg[bot] commented on issue #29648:
URL: https://github.com/apache/airflow/issues/29648#issuecomment-1437605958

   Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] joshuaghezzi opened a new issue, #29648: Optionally exclude task deferral time from overall runtime of the task
joshuaghezzi opened a new issue, #29648:
URL: https://github.com/apache/airflow/issues/29648

   ### Description

   Ability to optionally exclude the time that the task was deferred from the overall execution time of the task.

   ### Use case/motivation

   I've created a custom operator which defers execution for a specified amount of time when a specific exception occurs. The task that uses the operator has a timeout on it which is less than the duration of the deferral, so once the deferral resumes, the task will time out. Instead of raising the timeout to account for the time the task could be deferred, which changes the intent of the timeout, I feel it'd be useful to optionally exclude the time the task was deferred. The execution time would essentially be paused while the task is deferred and would resume from where it was paused once the task resumes.

   ### Related issues

   #19382 - brought expected behaviour in line with documentation.

   ### Are you willing to submit a PR?

   - [X] Yes I am willing to submit a PR!

   ### Code of Conduct

   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
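The arithmetic the issue proposes can be sketched with plain `datetime` objects. This is purely illustrative: `effective_runtime` and the `deferred_intervals` list are hypothetical names (Airflow does not track per-deferral intervals today, which is part of why the maintainers note DB changes would be required).

```python
from datetime import datetime, timedelta


def effective_runtime(start, end, deferred_intervals):
    """Wall-clock runtime minus the time spent deferred.

    Hypothetical helper illustrating the feature request: each entry in
    deferred_intervals is a (deferred_at, resumed_at) pair.
    """
    total = end - start
    deferred = sum(
        (resumed - deferred_at for deferred_at, resumed in deferred_intervals),
        timedelta(),
    )
    return total - deferred


start = datetime(2023, 2, 21, 8, 0)
end = datetime(2023, 2, 21, 9, 0)
# task was deferred for 30 minutes in the middle of its 1-hour wall-clock run
deferred = [(datetime(2023, 2, 21, 8, 10), datetime(2023, 2, 21, 8, 40))]
assert effective_runtime(start, end, deferred) == timedelta(minutes=30)
```

A timeout comparing against this "effective" runtime would not fire merely because a deferral outlasted it, which is the behaviour the issue asks for.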
[GitHub] [airflow] pgagnon commented on a diff in pull request #29623: Implement file credentials provider for AWS hook AssumeRoleWithWebIdentity
pgagnon commented on code in PR #29623:
URL: https://github.com/apache/airflow/pull/29623#discussion_r1112360505

   ## airflow/providers/amazon/aws/hooks/base_aws.py:

   @@ -312,19 +312,35 @@
            base_session = self.basic_session._session or botocore.session.get_session()
            client_creator = base_session.create_client
            federation = self.extra_config.get("assume_role_with_web_identity_federation")
   -        if federation == "google":
   -            web_identity_token_loader = self._get_google_identity_token_loader()
   -        else:
   -            raise AirflowException(
   -                f'Unsupported federation: {federation}. Currently "google" only are supported.'
   -            )
   +
   +        web_identity_token_loader = (
   +            {
   +                "file": self._get_file_token_loader,
   +                "google": self._get_google_identity_token_loader,
   +            }.get(federation)()
   +            if type(federation) == str
   +            else None
   +        )

   Review Comment:
   @Taragolis I think two areas are being raised here.

   On the implementation, I am well aware that it can be improved, and I fully expected to go through a few rounds to get this in shape. If you have specific review comments, I would happily work with you on this. 🙂

   Regarding the alternatives that you have outlined, I have considered them previously, and I do not think they address the objective. Specifically:

   1. The objective is more about using multiple `role_arn` than multiple tokens. It is widespread in practice for users not to have the ability to override config files (for instance, in corporate environments or on PaaS), so defining profiles in the AWS config or shared credentials file is not always practical.
   2. This shares the same concern as point 1 regarding the inability to overwrite config files. Also, I think this option would be dismissed as too complex and burdensome by 99% of Airflow users.
   3. This does not permit configuration through the Airflow connections subsystem without relying on external configs, which are (1) not necessarily mutable and (2) more-or-less static.

   This would be possible if `boto3` allowed passing `web_identity_token_file`, `role_arn`, or `role_session_name` as parameters to clients or sessions, but sadly it does not at this point.

   This being said, I share your concern about the lack of system tests for connections, and this is something worthwhile to look at but probably out of the scope of this discussion.
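The dict-based dispatch in the diff under review has two rough edges: `.get(federation)()` raises a bare `TypeError` ("NoneType is not callable") for unsupported values, and `type(federation) == str` silently maps non-strings to `None`. A sketch of a friendlier version of the same pattern (the loader factories here are stand-in lambdas, not the hook's real `_get_*_token_loader` methods):

```python
def get_token_loader(federation):
    """Dispatch a federation type to its token-loader factory.

    Sketch of the pattern discussed above; the factory return values are
    hypothetical placeholders, not real botocore token loaders.
    """
    loaders = {
        "file": lambda: "file-token-loader",
        "google": lambda: "google-identity-token-loader",
    }
    try:
        factory = loaders[federation]
    except (KeyError, TypeError):  # unknown value, None, or unhashable type
        supported = ", ".join(sorted(loaders))
        raise ValueError(f"Unsupported federation: {federation!r}. Supported: {supported}.")
    return factory()


assert get_token_loader("file") == "file-token-loader"
assert get_token_loader("google") == "google-identity-token-loader"
```

Indexing the dict directly and translating the failure into one explicit error keeps the table-driven style while restoring the clear "unsupported federation" message from the original `if`/`else`.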
[GitHub] [airflow] potiuk commented on issue #29432: Jinja templating doesn't work with container_resources when using dymanic task mapping with Kubernetes Pod Operator
potiuk commented on issue #29432:
URL: https://github.com/apache/airflow/issues/29432#issuecomment-1437596856

   Done
[GitHub] [airflow] potiuk commented on issue #29621: Fix adding annotations for dag persistence PVC
potiuk commented on issue #29621:
URL: https://github.com/apache/airflow/issues/29621#issuecomment-1437596195

   Feel free to fix them, @amoghrajesh. Assigned to you.
[GitHub] [airflow] potiuk commented on issue #29393: S3TaskHandler continuously returns "*** Falling back to local log" even if log_pos is provided when log not in s3
potiuk commented on issue #29393:
URL: https://github.com/apache/airflow/issues/29393#issuecomment-1437595099

   cc: @dstandish - maybe you can comment on how it changes in 2.6 (and whether the problem is going to be fixed there).
[GitHub] [airflow] potiuk commented on pull request #29580: Allow to specify which connection, variable or config are being looked up in the backend using *_lookup_pattern parameters
potiuk commented on PR #29580:
URL: https://github.com/apache/airflow/pull/29580#issuecomment-1437592624

   Yep. I thought about it, and this seems like a non-obvious but great way of solving the "cost" problem connected with secret backends.
[GitHub] [airflow] potiuk commented on pull request #29406: Add Fail Fast feature for DAGs
potiuk commented on PR #29406:
URL: https://github.com/apache/airflow/pull/29406#issuecomment-1437591152

   @ashb @uranusjr - can you think of some hidden side-effects? This implementation of "fail-fast" seems simple and might just work - unless I missed something.
[GitHub] [airflow] Taragolis commented on a diff in pull request #29195: Fixing Task Duration view in case of manual DAG runs only (#22015)
Taragolis commented on code in PR #29195:
URL: https://github.com/apache/airflow/pull/29195#discussion_r1112342301

   ## airflow/models/dag.py:

   @@ -112,6 +113,8 @@
    log = logging.getLogger(__name__)

   +_T = TypeVar("_T")
   +

   Review Comment:
   ```suggestion
   ```

   ## airflow/models/dag.py:

   @@ -1534,6 +1540,8 @@ def get_task_instances(
                start_date = (timezone.utcnow() - timedelta(30)).replace(
                    hour=0, minute=0, second=0, microsecond=0
                )
   +
   +            print("=S=", start_date)

   Review Comment:
   ```suggestion
   ```

   ## airflow/models/dag.py:

   @@ -43,6 +43,7 @@
        Iterator,
        List,
        Sequence,
   +    TypeVar,

   Review Comment:
   ```suggestion
   ```
[GitHub] [airflow] potiuk commented on a diff in pull request #29406: Add Fail Fast feature for DAGs
potiuk commented on code in PR #29406:
URL: https://github.com/apache/airflow/pull/29406#discussion_r1112342672

   ## airflow/models/taskinstance.py:

   @@ -172,6 +172,19 @@ def set_current_context(context: Context) -> Generator[Context, None, None]:
        )

   +def stop_all_tasks_in_dag(tis: list[TaskInstance], session: Session, task_id_to_ignore: int):
   +    for ti in tis:
   +        if ti.task_id == task_id_to_ignore or ti.state in (
   +            TaskInstanceState.SUCCESS,
   +            TaskInstanceState.FAILED,
   +        ):
   +            continue
   +        if ti.state == TaskInstanceState.RUNNING:
   +            ti.error(session)

   Review Comment:
   Otherwise it will be quite magical.
[GitHub] [airflow] potiuk commented on a diff in pull request #29406: Add Fail Fast feature for DAGs
potiuk commented on code in PR #29406:
URL: https://github.com/apache/airflow/pull/29406#discussion_r1112342504

   ## airflow/models/taskinstance.py:

   @@ -172,6 +172,19 @@ def set_current_context(context: Context) -> Generator[Context, None, None]:
        )

   +def stop_all_tasks_in_dag(tis: list[TaskInstance], session: Session, task_id_to_ignore: int):
   +    for ti in tis:
   +        if ti.task_id == task_id_to_ignore or ti.state in (
   +            TaskInstanceState.SUCCESS,
   +            TaskInstanceState.FAILED,
   +        ):
   +            continue
   +        if ti.state == TaskInstanceState.RUNNING:
   +            ti.error(session)

   Review Comment:
   We should add some logging telling that we are doing it.
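A sketch of how the reviewer's logging suggestion could look. This is a simplified stand-in, not Airflow's real implementation: the session argument is dropped, states are plain strings, and the `TI` demo class only imitates the task-instance methods the loop touches.

```python
import logging

log = logging.getLogger(__name__)


def stop_all_tasks_in_dag(tis, task_id_to_ignore):
    """Fail running tasks and skip pending ones, skipping terminal states.

    Simplified sketch of the helper under review, with the logging the
    reviewer asks for so fail-fast behaviour is visible in the task logs.
    """
    for ti in tis:
        if ti.task_id == task_id_to_ignore or ti.state in ("success", "failed"):
            continue
        if ti.state == "running":
            log.warning("Stopping running task %s because the DAG is set to fail fast", ti.task_id)
            ti.error()
        else:
            log.warning("Skipping task %s because the DAG is set to fail fast", ti.task_id)
            ti.set_state("skipped")


class TI:
    """Minimal stand-in for a task instance (demo only)."""

    def __init__(self, task_id, state):
        self.task_id = task_id
        self.state = state

    def error(self):
        self.state = "failed"

    def set_state(self, state):
        self.state = state


tis = [TI("a", "running"), TI("b", "queued"), TI("c", "success")]
stop_all_tasks_in_dag(tis, task_id_to_ignore="z")
assert [ti.state for ti in tis] == ["failed", "skipped", "success"]
```

With the log lines in place, a user reading a skipped or failed task's log can tell the state change came from fail-fast rather than something "magical", which is exactly the reviewer's point.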
[GitHub] [airflow] potiuk commented on pull request #29518: Google Providers - Fix _MethodDefault deepcopy failure
potiuk commented on PR #29518:
URL: https://github.com/apache/airflow/pull/29518#issuecomment-1437581881

   @IKholopov - can you please split this one into two PRs: 1) introduce the base class, 2) add deepcopy? That will make the history much easier to follow - it's quite hard to find the deepcopy fix among all the changed base operators.
[GitHub] [airflow] potiuk commented on a diff in pull request #29518: Google Providers - Fix _MethodDefault deepcopy failure
potiuk commented on code in PR #29518:
URL: https://github.com/apache/airflow/pull/29518#discussion_r1112340761

   ## airflow/providers/google/cloud/operators/cloud_base.py:

   @@ -0,0 +1,34 @@
   +#
   +# Licensed to the Apache Software Foundation (ASF) under one
   +# or more contributor license agreements.  See the NOTICE file
   +# distributed with this work for additional information
   +# regarding copyright ownership.  The ASF licenses this file
   +# to you under the Apache License, Version 2.0 (the
   +# "License"); you may not use this file except in compliance
   +# with the License.  You may obtain a copy of the License at
   +#
   +#   http://www.apache.org/licenses/LICENSE-2.0
   +#
   +# Unless required by applicable law or agreed to in writing,
   +# software distributed under the License is distributed on an
   +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   +# KIND, either express or implied.  See the License for the
   +# specific language governing permissions and limitations
   +# under the License.
   +"""This module contains a Google API base operator."""
   +from __future__ import annotations
   +
   +from google.api_core.gapic_v1.method import DEFAULT
   +
   +from airflow.models import BaseOperator
   +
   +
   +class GoogleCloudBaseOperator(BaseOperator):
   +    """
   +    Abstract base class that takes care of common specifics of the operators built
   +    on top of Google API client libraries.
   +    """
   +
   +    def __deepcopy__(self, memo):
   +        memo[id(DEFAULT)] = DEFAULT

   Review Comment:
   (Link to some description/issue explaining it, because otherwise it is pretty magical.)
[GitHub] [airflow] potiuk commented on a diff in pull request #29518: Google Providers - Fix _MethodDefault deepcopy failure
potiuk commented on code in PR #29518:
URL: https://github.com/apache/airflow/pull/29518#discussion_r1112340285

   ## airflow/providers/google/cloud/operators/cloud_base.py:

   @@ -0,0 +1,34 @@
   +#
   +# Licensed to the Apache Software Foundation (ASF) under one
   +# or more contributor license agreements.  See the NOTICE file
   +# distributed with this work for additional information
   +# regarding copyright ownership.  The ASF licenses this file
   +# to you under the Apache License, Version 2.0 (the
   +# "License"); you may not use this file except in compliance
   +# with the License.  You may obtain a copy of the License at
   +#
   +#   http://www.apache.org/licenses/LICENSE-2.0
   +#
   +# Unless required by applicable law or agreed to in writing,
   +# software distributed under the License is distributed on an
   +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   +# KIND, either express or implied.  See the License for the
   +# specific language governing permissions and limitations
   +# under the License.
   +"""This module contains a Google API base operator."""
   +from __future__ import annotations
   +
   +from google.api_core.gapic_v1.method import DEFAULT
   +
   +from airflow.models import BaseOperator
   +
   +
   +class GoogleCloudBaseOperator(BaseOperator):
   +    """
   +    Abstract base class that takes care of common specifics of the operators built
   +    on top of Google API client libraries.
   +    """
   +
   +    def __deepcopy__(self, memo):
   +        memo[id(DEFAULT)] = DEFAULT

   Review Comment:
   Would be nice to add an explanation of why we are doing it.
[GitHub] [airflow] EKami commented on issue #13311: pendulum.tz.zoneinfo.exceptions.InvalidTimezone
EKami commented on issue #13311: URL: https://github.com/apache/airflow/issues/13311#issuecomment-1437579039 Nope, sorry guys, I gave up on Airflow -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] Taragolis opened a new pull request, #29647: Revert "Improve health checks in example docker-compose and clarify usage"
Taragolis opened a new pull request, #29647: URL: https://github.com/apache/airflow/pull/29647 Just in case, to check whether the same error happens as described in #29616 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[airflow] 01/01: Revert "Improve health checks in example docker-compose and clarify usage (#29408)"
This is an automated email from the ASF dual-hosted git repository. taragolis pushed a commit to branch revert-29408-docker-compose-change-example in repository https://gitbox.apache.org/repos/asf/airflow.git commit 9da37717449f5267b4ef8c4962cb89f6fa5f102c Author: Andrey Anshin AuthorDate: Tue Feb 21 01:33:42 2023 +0400 Revert "Improve health checks in example docker-compose and clarify usage (#29408)" This reverts commit 08bec93e5cb651c5180af13bad1c43d763c7583d. --- .../howto/docker-compose/docker-compose.yaml | 40 ++ 1 file changed, 11 insertions(+), 29 deletions(-) diff --git a/docs/apache-airflow/howto/docker-compose/docker-compose.yaml b/docs/apache-airflow/howto/docker-compose/docker-compose.yaml index 719fffe8ab..38e8e8b471 100644 --- a/docs/apache-airflow/howto/docker-compose/docker-compose.yaml +++ b/docs/apache-airflow/howto/docker-compose/docker-compose.yaml @@ -36,15 +36,11 @@ # _AIRFLOW_WWW_USER_PASSWORD - Password for the administrator account (if requested). #Default: airflow # _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers. -#Use this option ONLY for quick checks. Installing requirements at container -#startup is done EVERY TIME the service is started. -#A better way is to build a custom image or extend the official image -#as described in https://airflow.apache.org/docs/docker-stack/build.html. #Default: '' # # Feel free to modify this file to suit your needs. --- -version: '3.8' +version: '3' x-airflow-common: &airflow-common # In order to add custom dependencies or upgrade provider packages you can use your extended image. 
@@ -64,13 +60,6 @@ x-airflow-common: AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true' AIRFLOW__CORE__LOAD_EXAMPLES: 'true' AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session' -# yamllint disable rule:line-length -# Use simple http server on scheduler for health checks -# See https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/check-health.html#scheduler-health-check-server -# yamllint enable rule:line-length -AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true' -# WARNING: Use _PIP_ADDITIONAL_REQUIREMENTS option ONLY for a quick checks -# for other purpose (development, test and especially production usage) build/extend Airflow image. _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-} volumes: - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags @@ -95,9 +84,8 @@ services: - postgres-db-volume:/var/lib/postgresql/data healthcheck: test: ["CMD", "pg_isready", "-U", "airflow"] - interval: 10s + interval: 5s retries: 5 - start_period: 5s restart: always redis: @@ -106,23 +94,21 @@ services: - 6379 healthcheck: test: ["CMD", "redis-cli", "ping"] - interval: 10s + interval: 5s timeout: 30s retries: 50 - start_period: 30s restart: always airflow-webserver: <<: *airflow-common command: webserver ports: - - "8080:8080" + - 8080:8080 healthcheck: test: ["CMD", "curl", "--fail", "http://localhost:8080/health";] - interval: 30s + interval: 10s timeout: 10s retries: 5 - start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on @@ -133,11 +119,10 @@ services: <<: *airflow-common command: scheduler healthcheck: - test: ["CMD", "curl", "--fail", "http://localhost:8974/health";] - interval: 30s + test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"'] + interval: 10s timeout: 10s retries: 5 - start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on @@ -151,10 +136,9 @@ services: test: - "CMD-SHELL" - 
'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"' - interval: 30s + interval: 10s timeout: 10s retries: 5 - start_period: 30s environment: <<: *airflow-common-env # Required to handle warm shutdown of the celery workers properly @@ -171,10 +155,9 @@ services: command: triggerer healthcheck: test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"'] - interval: 30s + interval: 10s timeout: 10s retries: 5 - start_period: 30s restart: always depends_on: <<: *airflow-common-depends-on @@ -281,13 +264,12 @@ services: profiles: - flower ports: - - ":" + -
[airflow] branch revert-29408-docker-compose-change-example created (now 9da3771744)
This is an automated email from the ASF dual-hosted git repository. taragolis pushed a change to branch revert-29408-docker-compose-change-example in repository https://gitbox.apache.org/repos/asf/airflow.git at 9da3771744 Revert "Improve health checks in example docker-compose and clarify usage (#29408)" This branch includes the following new commits: new 9da3771744 Revert "Improve health checks in example docker-compose and clarify usage (#29408)" The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[GitHub] [airflow] raphaelauv commented on issue #29432: Jinja templating doesn't work with container_resources when using dymanic task mapping with Kubernetes Pod Operator
raphaelauv commented on issue #29432: URL: https://github.com/apache/airflow/issues/29432#issuecomment-1437573499 @potiuk could we add this one to https://github.com/apache/airflow/milestone/68 ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] hussein-awala commented on issue #29432: Jinja templating doesn't work with container_resources when using dymanic task mapping with Kubernetes Pod Operator
hussein-awala commented on issue #29432: URL: https://github.com/apache/airflow/issues/29432#issuecomment-1437570393 @jose-lpa using `Variable.get()` in the dag script is not recommended, because the `DagFileProcessor` processes the script every X minutes and loads this variable from the DB each time. Also, that solution works with Variable but not with all the other jinja templates. @pshrivastava27 here is a solution for your need:

```python
import datetime
import os

from airflow import models
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)
from kubernetes.client import models as k8s_models

dvt_image = os.environ.get("DVT_IMAGE", "dev")
default_dag_args = {"start_date": datetime.datetime(2022, 1, 1)}


class PatchedResourceRequirements(k8s_models.V1ResourceRequirements):
    template_fields = ("limits", "requests")


def pod_mem():
    return "4000M"


def pod_cpu():
    return "1000m"


with models.DAG(
    "sample_dag",
    schedule_interval=None,
    default_args=default_dag_args,
    render_template_as_native_obj=True,
    user_defined_macros={
        "pod_mem": pod_mem,
        "pod_cpu": pod_cpu,
    },
) as dag:
    task_1 = KubernetesPodOperator(
        task_id="task_1",
        name="task_1",
        namespace="default",
        image=dvt_image,
        cmds=["bash", "-cx"],
        arguments=["echo hello"],
        service_account_name="sa-k8s",
        container_resources=PatchedResourceRequirements(
            limits={
                "memory": "{{ pod_mem() }}",
                "cpu": "{{ pod_cpu() }}",
            }
        ),
        startup_timeout_seconds=1800,
        get_logs=True,
        image_pull_policy="Always",
        config_file="/home/airflow/composer_kube_config",
        dag=dag,
    )

    task_2 = KubernetesPodOperator.partial(
        task_id="task_2",
        name="task_2",
        namespace="default",
        image=dvt_image,
        cmds=["bash", "-cx"],
        service_account_name="sa-k8s",
        container_resources=PatchedResourceRequirements(
            limits={
                "memory": "{{ pod_mem() }}",
                "cpu": "{{ pod_cpu() }}",
            }
        ),
        startup_timeout_seconds=1800,
        get_logs=True,
        image_pull_policy="Always",
        config_file="/home/airflow/composer_kube_config",
        dag=dag,
    ).expand(arguments=[["echo hello"]])

    task_1 >> task_2
```

You can do the same for the other classes if needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
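The patch above works because Airflow renders every attribute listed in a class's `template_fields` through Jinja before execution, with the DAG's `user_defined_macros` in scope. A rough standalone sketch of that mechanism (the helper names are illustrative, this is not Airflow's actual rendering code, and it assumes the `jinja2` package is installed):

```python
import jinja2


class PatchedResourceRequirements:
    # Declaring template_fields is what opts these attributes into rendering.
    template_fields = ("limits", "requests")

    def __init__(self, limits=None, requests=None):
        self.limits = limits or {}
        self.requests = requests or {}


def render_template_fields(obj, context):
    # Render every string value inside each templated dict attribute,
    # roughly the way Airflow walks nested templated structures.
    env = jinja2.Environment()
    for field in getattr(obj, "template_fields", ()):
        value = getattr(obj, field)
        if isinstance(value, dict):
            rendered = {
                key: env.from_string(tmpl).render(**context)
                for key, tmpl in value.items()
            }
            setattr(obj, field, rendered)


resources = PatchedResourceRequirements(
    limits={"memory": "{{ pod_mem() }}", "cpu": "{{ pod_cpu() }}"}
)
render_template_fields(
    resources, {"pod_mem": lambda: "4000M", "pod_cpu": lambda: "1000m"}
)
print(resources.limits)  # {'memory': '4000M', 'cpu': '1000m'}
```

Because the stock `V1ResourceRequirements` declares no `template_fields`, the walk stops there, which is exactly why the `PatchedResourceRequirements` subclass is needed.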
[GitHub] [airflow] potiuk commented on issue #29315: AIP-44 Migrate BaseJob.heartbeat to InternalAPI
potiuk commented on issue #29315: URL: https://github.com/apache/airflow/issues/29315#issuecomment-1437565241 Assigned you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[airflow] branch main updated (69babdcf74 -> ee0a56a2ca)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git from 69babdcf74 Use provided `USE_AIRFLOW_VERSION` env in verify providers step (#29081) add ee0a56a2ca Trigger Class migration to internal API (#29099) No new revisions were added by this update. Summary of changes: airflow/api_internal/endpoints/rpc_api_endpoint.py | 9 - airflow/models/trigger.py | 10 +- 2 files changed, 17 insertions(+), 2 deletions(-)
[GitHub] [airflow] potiuk closed issue #28613: AIP-44 Migrate Trigger class to Internal API
potiuk closed issue #28613: AIP-44 Migrate Trigger class to Internal API URL: https://github.com/apache/airflow/issues/28613 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk merged pull request #29099: Trigger Class migration to internal API
potiuk merged PR #29099: URL: https://github.com/apache/airflow/pull/29099 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #29348: Don't push secret in XCOM in BigQueryCreateDataTransferOperator
potiuk commented on PR #29348: URL: https://github.com/apache/airflow/pull/29348#issuecomment-1437563814 > And I guess we need to add tests for that, to avoid a situation where it somehow returns in the future. Agree -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[airflow] branch main updated (54c4d590ca -> 69babdcf74)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git from 54c4d590ca fix clear dag run openapi spec responses by adding additional return type (#29600) add 69babdcf74 Use provided `USE_AIRFLOW_VERSION` env in verify providers step (#29081) No new revisions were added by this update. Summary of changes: scripts/in_container/verify_providers.py | 30 -- 1 file changed, 16 insertions(+), 14 deletions(-)
[GitHub] [airflow] potiuk merged pull request #29081: Use provided `USE_AIRFLOW_VERSION` env in verify providers step
potiuk merged PR #29081: URL: https://github.com/apache/airflow/pull/29081 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] Taragolis commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
Taragolis commented on issue #29640: URL: https://github.com/apache/airflow/issues/29640#issuecomment-1437551366 Just for confirmation: do you have the same problem, with the same versions of `boto3` and `botocore`, if you call [S3.Client.download_file](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.download_file) or [S3.Client.download_fileobj](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.download_fileobj) on this file? Could you try to run this instead of your code?

```python
def download_from_s3_native(key: str, bucket_name: str, local_path: str) -> str:
    hook = S3Hook(aws_conn_id="s3_conn")
    s3_client = hook.conn
    with open(local_path, "wb") as data:
        # boto3's client.download_fileobj takes (Bucket, Key, Fileobj), in that order
        s3_client.download_fileobj(bucket_name, key, data)
    return local_path
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on a diff in pull request #29495: Change permissions of config/password files created by airflow
potiuk commented on code in PR #29495: URL: https://github.com/apache/airflow/pull/29495#discussion_r1112322251

## airflow/configuration.py: @@ -1545,6 +1548,12 @@ def initialize_config() -> AirflowConfigParser: return local_conf

```python
def make_group_other_inaccessible(file_path: str):
    with suppress(Exception):
        permissions = os.stat(file_path)
        os.chmod(file_path, permissions.st_mode & (stat.S_IRUSR | stat.S_IWUSR))
```

Review Comment: Changed.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on a diff in pull request #29495: Change permissions of config/password files created by airflow
potiuk commented on code in PR #29495: URL: https://github.com/apache/airflow/pull/29495#discussion_r1112320830

## airflow/configuration.py: @@ -1545,6 +1548,12 @@ def initialize_config() -> AirflowConfigParser: return local_conf

```python
def make_group_other_inaccessible(file_path: str):
    with suppress(Exception):
        permissions = os.stat(file_path)
        os.chmod(file_path, permissions.st_mode & (stat.S_IRUSR | stat.S_IWUSR))
```

Review Comment: I'd say log an error, not raise it. There might be various reasons why changing permissions is not possible, for example when the filesystem the config file gets generated on happens to be NFS. Failing hard at this point is kind of strange, as the file has already been generated (so the next time we run airflow it will succeed, because the whole branch is skipped in this case).

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
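The helper under review can be exercised standalone; here is a temp-file-based sketch of its effect on POSIX systems (the `suppress(Exception)` mirrors the "don't fail hard" stance from the review, though the discussion suggests logging instead):

```python
import os
import stat
import tempfile
from contextlib import suppress


def make_group_other_inaccessible(file_path: str) -> None:
    # Mask the current mode down to owner read/write only (at most 0o600).
    with suppress(Exception):
        permissions = os.stat(file_path)
        os.chmod(file_path, permissions.st_mode & (stat.S_IRUSR | stat.S_IWUSR))


fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o664)  # group/other readable before the fix
make_group_other_inaccessible(path)
final_mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(final_mode))  # 0o600 on POSIX systems
os.remove(path)
```

Because `&` only ever clears bits, a file that is already `0o600` or stricter is left unchanged.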
[GitHub] [airflow] Taragolis commented on issue #29640: NoBoundaryInMultipartDefect raised using S3Hook
Taragolis commented on issue #29640: URL: https://github.com/apache/airflow/issues/29640#issuecomment-1437543247 > shouldn't be such an exception :) `¯\_(ツ)_/¯` https://github.com/boto/botocore/issues/2608 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] steren commented on a diff in pull request #28525: Add CloudRunExecuteJobOperator
steren commented on code in PR #28525: URL: https://github.com/apache/airflow/pull/28525#discussion_r1112314074 ## airflow/providers/google/cloud/operators/cloud_run.py: ## @@ -0,0 +1,115 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +"""This module contains Google Cloud Run Jobs operators.""" +from __future__ import annotations + +from typing import TYPE_CHECKING, Sequence + +from airflow.models import BaseOperator +from airflow.providers.google.cloud.hooks.cloud_run import CloudRunJobHook +from airflow.providers.google.cloud.links.cloud_run import CloudRunJobExecutionLink + +if TYPE_CHECKING: +from airflow.utils.context import Context + + +class CloudRunExecuteJobOperator(BaseOperator): +""" +Executes an existing Cloud Run job. + +.. seealso:: +For more information on how to use this operator, take a look at the guide: +:ref:`howto/operator:CloudRunExecuteJobOperator` + +:param job_name: The name of the cloud run job to execute +:param region: The region of the Cloud Run job (for example europe-west1) +:param project_id: The ID of the GCP project that owns the job. +If set to ``None`` or missing, the default project_id +from the GCP connection is used. 
+:param gcp_conn_id: The connection ID to use to connect to Google Cloud. +:param delegate_to: The account to impersonate, if any. +For this to work, the service account making the request +must have domain-wide delegation enabled. +:param wait_until_finished: If True, wait for the end of job execution +before exiting. If False (default), only submits job. +:param impersonation_chain: Optional service account to impersonate +using short-term credentials, or chained list of accounts required +to get the access_token of the last account in the list, +which will be impersonated in the request. If set as a string, the +account must grant the originating account the Service Account Token +Creator IAM role. If set as a sequence, the identities from the list +must grant Service Account Token Creator IAM role to the directly +preceding identity, with first account from the list granting this +role to the originating account (templated). +""" + +template_fields: Sequence[str] = ("job_name", "region", "project_id", "gcp_conn_id") +operator_extra_links = (CloudRunJobExecutionLink(),) + +def __init__( +self, +job_name: str, Review Comment: Maybe this could be made optional? I assume Airflow users don't necessarily want to name the job created but could be fine with an auto-generated job name. I'm not sure how Airflow works for other operators. It's up to you. ## airflow/providers/google/cloud/hooks/cloud_run.py: ## @@ -0,0 +1,245 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +"""This module contains a Google Cloud Run Hook.""" +from __future__ import annotations + +import json +import time +from typing import Any, Callable, Dict, List, Sequence, Union, cast + +from google.api_core.client_options import ClientOptions +from googleapiclient.discovery import build + +from airflow.providers.google.common.hooks.base_google import GoogleBaseHook +from airflow.utils.log.logging_mixin import LoggingMixin + +DEFAULT_CLOUD_RUN_REGION = "us-central1" + + +class CloudRunJobSteps: +"
[GitHub] [airflow] steren commented on a diff in pull request #28525: Add CloudRunExecuteJobOperator
steren commented on code in PR #28525: URL: https://github.com/apache/airflow/pull/28525#discussion_r1112314074 ## airflow/providers/google/cloud/operators/cloud_run.py: ## @@ -0,0 +1,115 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +"""This module contains Google Cloud Run Jobs operators.""" +from __future__ import annotations + +from typing import TYPE_CHECKING, Sequence + +from airflow.models import BaseOperator +from airflow.providers.google.cloud.hooks.cloud_run import CloudRunJobHook +from airflow.providers.google.cloud.links.cloud_run import CloudRunJobExecutionLink + +if TYPE_CHECKING: +from airflow.utils.context import Context + + +class CloudRunExecuteJobOperator(BaseOperator): +""" +Executes an existing Cloud Run job. + +.. seealso:: +For more information on how to use this operator, take a look at the guide: +:ref:`howto/operator:CloudRunExecuteJobOperator` + +:param job_name: The name of the cloud run job to execute +:param region: The region of the Cloud Run job (for example europe-west1) +:param project_id: The ID of the GCP project that owns the job. +If set to ``None`` or missing, the default project_id +from the GCP connection is used. 
+:param gcp_conn_id: The connection ID to use to connect to Google Cloud. +:param delegate_to: The account to impersonate, if any. +For this to work, the service account making the request +must have domain-wide delegation enabled. +:param wait_until_finished: If True, wait for the end of job execution +before exiting. If False (default), only submits job. +:param impersonation_chain: Optional service account to impersonate +using short-term credentials, or chained list of accounts required +to get the access_token of the last account in the list, +which will be impersonated in the request. If set as a string, the +account must grant the originating account the Service Account Token +Creator IAM role. If set as a sequence, the identities from the list +must grant Service Account Token Creator IAM role to the directly +preceding identity, with first account from the list granting this +role to the originating account (templated). +""" + +template_fields: Sequence[str] = ("job_name", "region", "project_id", "gcp_conn_id") +operator_extra_links = (CloudRunJobExecutionLink(),) + +def __init__( +self, +job_name: str, Review Comment: Maybe name could be optional? I assume Airflow users don't necessarily want to name the job created but could be fine with an auto-generated job name. I'm not sure how Airflow works for other operators. It's up to you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] Taragolis commented on pull request #29489: Fix failing multiple-output-inference tests for Python 3.8+
Taragolis commented on PR #29489: URL: https://github.com/apache/airflow/pull/29489#issuecomment-1437535260 > Then we will fix it there :) That's why I mentioned this PR; it might save some time when the time comes for Python 3.11. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[airflow] branch main updated: fix clear dag run openapi spec responses by adding additional return type (#29600)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/main by this push: new 54c4d590ca fix clear dag run openapi spec responses by adding additional return type (#29600) 54c4d590ca is described below commit 54c4d590caa9399f9fb331811531e0ea8d56aa41 Author: Alexander Liotta AuthorDate: Mon Feb 20 15:53:17 2023 -0500 fix clear dag run openapi spec responses by adding additional return type (#29600) --- airflow/api_connexion/openapi/v1.yaml | 4 +++- airflow/www/static/js/types/api-generated.ts | 3 ++- 2 files changed, 5 insertions(+), 2 deletions(-)

```diff
diff --git a/airflow/api_connexion/openapi/v1.yaml b/airflow/api_connexion/openapi/v1.yaml
index f64cd7c8c8..505d2b6439 100644
--- a/airflow/api_connexion/openapi/v1.yaml
+++ b/airflow/api_connexion/openapi/v1.yaml
@@ -890,7 +890,9 @@ paths:
         content:
           application/json:
             schema:
-              $ref: '#/components/schemas/DAGRun'
+              anyOf:
+                - $ref: '#/components/schemas/DAGRun'
+                - $ref: '#/components/schemas/TaskInstanceCollection'
       '400':
         $ref: '#/components/responses/BadRequest'
       '401':
diff --git a/airflow/www/static/js/types/api-generated.ts b/airflow/www/static/js/types/api-generated.ts
index 7228d5fdf9..e206bc1ba4 100644
--- a/airflow/www/static/js/types/api-generated.ts
+++ b/airflow/www/static/js/types/api-generated.ts
@@ -3002,7 +3002,8 @@ export interface operations {
       /** Success. */
       200: {
         content: {
-          "application/json": components["schemas"]["DAGRun"];
+          "application/json": Partial<components["schemas"]["DAGRun"]> &
+            Partial<components["schemas"]["TaskInstanceCollection"]>;
         };
       };
       400: components["responses"]["BadRequest"];
```
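One practical consequence of the `anyOf` response above is that API clients now have to discriminate between the two payload shapes themselves. A sketch of one way to do that (`parse_clear_response` is a hypothetical helper; the field names are taken from the Airflow REST API's `DAGRun` and `TaskInstanceCollection` schemas):

```python
def parse_clear_response(payload: dict) -> str:
    # A TaskInstanceCollection payload carries a "task_instances" list
    # (the shape returned for a dry-run clear); otherwise we got a DAGRun.
    if "task_instances" in payload:
        return "TaskInstanceCollection"
    return "DAGRun"


print(parse_clear_response({"task_instances": [], "total_entries": 0}))
print(parse_clear_response({"dag_run_id": "run_1", "state": "queued"}))
```

Probing for a distinguishing key is the usual workaround when an `anyOf` union has no explicit discriminator property.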