[GitHub] [airflow] uranusjr opened a new pull request, #28982: Resolve all variables in pickled XCom iterator

2023-01-16 Thread GitBox


uranusjr opened a new pull request, #28982:
URL: https://github.com/apache/airflow/pull/28982

   This is needed for cross-process communication.
   
   See discussion starting here: 
https://github.com/apache/airflow/pull/28191#issuecomment-1384931149
   
   One thing I’m not sure about is the test; testing against a raw SQL string 
feels very brittle and can break from engine to engine, or SQLAlchemy version 
to version. But I can’t think of a better way.
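   For context, the pickling hooks discussed in #28191 boil down to resolving 
any process-bound state before serialization. A minimal stdlib-only sketch of 
that idea (class and attribute names here are illustrative, not Airflow’s):

   ```python
   import pickle


   class LazyResult:
       """Illustrative stand-in for a lazily evaluated container."""

       def __init__(self, compute):
           self._compute = compute  # bound to this process, unsafe to pickle
           self._value = None

       def __getstate__(self):
           # Resolve everything now so only plain data crosses the process
           # boundary; the unpicklable callable is dropped.
           return {"_value": self._compute(), "_compute": None}

       def __setstate__(self, state):
           self.__dict__.update(state)


   restored = pickle.loads(pickle.dumps(LazyResult(lambda: [1, 2, 3])))
   ```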


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #28191: Add custom pickling hooks to LazyXComAccess

2023-01-16 Thread GitBox


uranusjr commented on PR #28191:
URL: https://github.com/apache/airflow/pull/28191#issuecomment-1384977262

   Only when you want to pass results of a dynamic task into a subprocess (most 
commonly virtualenv, Docker, and Kubernetes, I think).





[GitHub] [airflow] potiuk commented on pull request #28191: Add custom pickling hooks to LazyXComAccess

2023-01-16 Thread GitBox


potiuk commented on PR #28191:
URL: https://github.com/apache/airflow/pull/28191#issuecomment-1384945654

   > Edit: Found it. We need literal_binds. 
https://docs.sqlalchemy.org/en/14/faq/sqlexpressions.html#rendering-bound-parameters-inline
   
   Yeah. That's cool. Just wondering - does the current failure happen often? 
Or only when dynamic tasks are used in a specific way (it was not obvious to 
me from the description/error)? I think the answer to that question might 
determine whether 2.5.1rc1 is good to release or not.
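   The `literal_binds` flag linked above can be seen in a small standalone 
sketch (assuming SQLAlchemy 1.4; the table here is a simplified stand-in for 
Airflow's xcom table, not its real schema):

   ```python
   from sqlalchemy import Column, Integer, MetaData, String, Table, select

   metadata = MetaData()
   xcom = Table(
       "xcom",
       metadata,
       Column("key", String),
       Column("value", String),
       Column("map_index", Integer),
   )

   stmt = select(xcom.c.value).where(xcom.c.key == "return_value")

   # Default compilation leaves named placeholders in the SQL text...
   with_placeholders = str(stmt.compile())

   # ...while literal_binds inlines the values, so the statement can later
   # be executed with zero bound parameters.
   inlined = str(stmt.compile(compile_kwargs={"literal_binds": True}))
   ```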





[GitHub] [airflow] uranusjr commented on pull request #28191: Add custom pickling hooks to LazyXComAccess

2023-01-16 Thread GitBox


uranusjr commented on PR #28191:
URL: https://github.com/apache/airflow/pull/28191#issuecomment-1384938029

   Yes it is. The SQLAlchemy documentation says that should give us fully 
compiled SQL, but apparently it does not. I don’t know why (or _how_ we can 
produce fully compiled SQL).





[GitHub] [airflow] potiuk commented on pull request #28191: Add custom pickling hooks to LazyXComAccess

2023-01-16 Thread GitBox


potiuk commented on PR #28191:
URL: https://github.com/apache/airflow/pull/28191#issuecomment-1384934338

   Isn't that SQL automatically generated by `query.statement.compile` from an 
ORM mapping? At least that was my impression of what happens here.





[GitHub] [airflow] uranusjr commented on pull request #28191: Add custom pickling hooks to LazyXComAccess

2023-01-16 Thread GitBox


uranusjr commented on PR #28191:
URL: https://github.com/apache/airflow/pull/28191#issuecomment-1384931149

   Actually I think this is something simpler and unrelated. With SQLite I get
   
   ```
   INFO - sqlalchemy.exc.ProgrammingError: (sqlite3.ProgrammingError) Incorrect 
number of bindings supplied. The current statement uses 4, and there are 0 
supplied.
   INFO - [SQL: SELECT xcom.value
   INFO - FROM xcom JOIN dag_run ON xcom.dag_run_id = dag_run.id
   INFO - WHERE xcom."key" = ? AND xcom.task_id = ? AND xcom.dag_id = ? AND 
xcom.run_id = ? ORDER BY xcom.map_index ASC]
   ```
   
   Looks like a simple SQL error. Please open a new issue with a better 
description.
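   The failure mode in the log above can be reproduced with the stdlib 
sqlite3 module alone: executing a statement that contains placeholders 
without supplying any parameters raises exactly this ProgrammingError.

   ```python
   import sqlite3

   conn = sqlite3.connect(":memory:")
   conn.execute("CREATE TABLE xcom (key TEXT, value TEXT, task_id TEXT, dag_id TEXT)")

   message = ""
   try:
       # Four ? placeholders, zero parameters supplied -> same error as the log.
       conn.execute(
           "SELECT value FROM xcom "
           "WHERE key = ? AND task_id = ? AND dag_id = ? AND value = ?"
       )
   except sqlite3.ProgrammingError as exc:
       message = str(exc)
   ```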





[GitHub] [airflow] potiuk commented on pull request #28191: Add custom pickling hooks to LazyXComAccess

2023-01-16 Thread GitBox


potiuk commented on PR #28191:
URL: https://github.com/apache/airflow/pull/28191#issuecomment-1384921406

   cc: @pierrejeambrun @ephraimbuddy -> potential blocker for 2.5.1 





[GitHub] [airflow] potiuk commented on pull request #28191: Add custom pickling hooks to LazyXComAccess

2023-01-16 Thread GitBox


potiuk commented on PR #28191:
URL: https://github.com/apache/airflow/pull/28191#issuecomment-1384921032

   cc: @uranusjr -> seems that MySQL hits us back again. I am not 100% sure 
whether this is a blocker for 2.5.1 - but it might be, if it turns out it is 
going to affect all MySQL users of dynamic task mapping (it certainly has the 
potential to impact them).





[GitHub] [airflow] hterik commented on a diff in pull request #26639: Multi-threads support for processing diff queues in Kubernetes Executor

2023-01-16 Thread GitBox


hterik commented on code in PR #26639:
URL: https://github.com/apache/airflow/pull/26639#discussion_r1071805534


##
airflow/executors/kubernetes_executor.py:
##
@@ -634,13 +690,37 @@ def sync(self) -> None:
 self.log.debug("self.queued: %s", self.queued_tasks)
 self.kube_scheduler.sync()
 
-last_resource_version: dict[str, str] = defaultdict(lambda: "0")
-while True:
+"""processing result queue"""
+self.last_resource_version = defaultdict(lambda: "0")
+multi_threads_queue_process(

Review Comment:
   multi_threads_queue_process will start and stop multiple threads on each 
sync() tick. I'm a bit out of touch with the real-world overhead of this, but 
I've always been taught that starting a thread comes with a lot of overhead. 
   
   Could one use a fixed ThreadPool instead? That would also make the batching 
and queueing logic a lot easier, as dstandish suggested 
[above](https://github.com/apache/airflow/pull/26639/files#r1036345333) 
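   For comparison, a fixed pool created once amortizes thread startup across 
ticks; `process_item` and `sync_tick` below are hypothetical stand-ins, not 
the executor's real API:

   ```python
   from concurrent.futures import ThreadPoolExecutor


   def process_item(item: int) -> int:
       # Hypothetical stand-in for draining one work item from a queue.
       return item * 2


   # Created once (e.g. at executor start) and reused on every sync() tick,
   # so no per-tick thread creation/teardown cost is paid.
   pool = ThreadPoolExecutor(max_workers=4)


   def sync_tick(batch: list[int]) -> list[int]:
       return list(pool.map(process_item, batch))
   ```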






[GitHub] [airflow] hterik commented on a diff in pull request #26639: Multi-threads support for processing diff queues in Kubernetes Executor

2023-01-16 Thread GitBox


hterik commented on code in PR #26639:
URL: https://github.com/apache/airflow/pull/26639#discussion_r1071796755


##
airflow/executors/kubernetes_executor.py:
##
@@ -599,8 +655,31 @@ def sync(self) -> None:
 raise AirflowException(NOT_STARTED_MESSAGE)
 self.kube_scheduler.sync()
 
-last_resource_version = None
-while True:
+"""processing result queue"""
+multi_threads_queue_process(
+queue_size=self.result_queue.qsize(),
+queue_type='result',
+process_method=self.process_result_queue,

Review Comment:
   Today they might only use the `[]` and `in` operators, which in isolation 
are thread-safe, but tomorrow someone might add a for-loop or other 
side-effects, giving you a "changed size during iteration" error. It is a lot 
of complexity to keep in mind if this can be modified from several threads at 
the same time.
   
   I'm not 100% sure, but I believe this piece would potentially break: 
https://github.com/apache/airflow/blob/65010fda091242870a410c65478eae362899763b/airflow/executors/base_executor.py#L334-L336,
 if some thread modifies event_buffer while another thread is iterating the 
loop, because it takes a snapshot of the keys before starting to pop them off.
   
   My suggestion is to isolate the multithreaded operations into more pure 
functions, then return their results on a queue that is consumed in the same 
main thread as handles it today. Otherwise this could benefit from more 
documentation on which member properties must be protected by locks, or what 
types of operations are allowed in each function.
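   The suggested isolation can be sketched with the stdlib: worker threads 
stay pure and hand results back on a queue, and only the main thread touches 
shared state (names below are illustrative, not the executor's real code):

   ```python
   import queue
   import threading

   results: "queue.Queue[int]" = queue.Queue()


   def worker(item: int) -> None:
       # Pure computation only; no mutation of shared structures such as
       # event_buffer from inside the thread.
       results.put(item * item)


   threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
   for t in threads:
       t.start()
   for t in threads:
       t.join()

   # The main thread alone consumes the queue and updates shared state.
   collected = sorted(results.get() for _ in range(4))
   ```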






[GitHub] [airflow] potiuk commented on a diff in pull request #28979: Fix rendering parameters in PapermillOperator

2023-01-16 Thread GitBox


potiuk commented on code in PR #28979:
URL: https://github.com/apache/airflow/pull/28979#discussion_r1071795648


##
airflow/providers/papermill/operators/papermill.py:
##
@@ -57,36 +59,43 @@ class PapermillOperator(BaseOperator):
 def __init__(
 self,
 *,
-input_nb: str | None = None,
-output_nb: str | None = None,
+input_nb: str | NoteBook | None = None,
+output_nb: str | NoteBook | None = None,

Review Comment:
   I think it's OK to check their existence in the constructor.






[airflow] 01/01: Read logs from all containers in KPO

2023-01-16 Thread dstandish
This is an automated email from the ASF dual-hosted git repository.

dstandish pushed a commit to branch mlnsharma/main
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 292c23283e3f97b8d281fbcd581c2f4d7abfa98e
Author: Narasimha Sharma 
AuthorDate: Mon Jan 16 22:10:55 2023 -0800

Read logs from all containers in KPO
---
 .../cncf/kubernetes/operators/kubernetes_pod.py| 13 +++-
 .../providers/cncf/kubernetes/utils/pod_manager.py | 87 --
 kubernetes_tests/test_kubernetes_pod_operator.py   |  2 +-
 .../cncf/kubernetes/utils/test_pod_manager.py  | 23 ++
 4 files changed, 117 insertions(+), 8 deletions(-)

diff --git a/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py 
b/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
index c34fc5c02b..96fb98fb53 100644
--- a/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
+++ b/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
@@ -168,6 +168,9 @@ class KubernetesPodOperator(BaseOperator):
 :param labels: labels to apply to the Pod. (templated)
 :param startup_timeout_seconds: timeout in seconds to startup the pod.
 :param get_logs: get the stdout of the container as logs of the tasks.
+:param log_containers: list of container names or bool value to collect 
logs.
+If bool value is True, all container logs are collected,
+if False, only 'base' container logs are collected.
 :param image_pull_policy: Specify a policy to cache or always pull an 
image.
 :param annotations: non-identifying metadata you can attach to the Pod.
 Can be a large range of data, and can include characters
@@ -248,6 +251,7 @@ class KubernetesPodOperator(BaseOperator):
 reattach_on_restart: bool = True,
 startup_timeout_seconds: int = 120,
 get_logs: bool = True,
+log_containers: list[str] | bool = False,
 image_pull_policy: str | None = None,
 annotations: dict | None = None,
 container_resources: k8s.V1ResourceRequirements | None = None,
@@ -311,6 +315,7 @@ class KubernetesPodOperator(BaseOperator):
 self.cluster_context = cluster_context
 self.reattach_on_restart = reattach_on_restart
 self.get_logs = get_logs
+self.log_containers = log_containers
 self.image_pull_policy = image_pull_policy
 self.node_selector = node_selector or {}
 self.annotations = annotations or {}
@@ -496,9 +501,13 @@ class KubernetesPodOperator(BaseOperator):
 self.await_pod_start(pod=self.pod)
 
 if self.get_logs:
-self.pod_manager.fetch_container_logs(
+# if log_containers is False, fetch logs from base container, 
otherwise
+# fetch logs for all containers or for the specified input 
list of container names
+self.pod_manager.fetch_input_container_logs(
 pod=self.pod,
-container_name=self.BASE_CONTAINER_NAME,
+log_containers=(
+self.log_containers if self.log_containers else 
[self.BASE_CONTAINER_NAME]
+),
 follow=True,
 )
 else:
diff --git a/airflow/providers/cncf/kubernetes/utils/pod_manager.py 
b/airflow/providers/cncf/kubernetes/utils/pod_manager.py
index 56ef95eef0..1d1b2f06a6 100644
--- a/airflow/providers/cncf/kubernetes/utils/pod_manager.py
+++ b/airflow/providers/cncf/kubernetes/utils/pod_manager.py
@@ -84,6 +84,20 @@ def container_is_running(pod: V1Pod, container_name: str) -> 
bool:
 return container_status.state.running is not None
 
 
+def container_is_completed(pod: V1Pod, container_name: str) -> bool:
+"""
+Examines V1Pod ``pod`` to determine whether ``container_name`` has completed.
+If that container is present and completed, returns True.  Returns False 
otherwise.
+"""
+container_statuses = pod.status.container_statuses if pod and pod.status 
else None
+if not container_statuses:
+return False
+container_status = next((status for status in container_statuses if 
status.name == container_name), None)
+if not container_status:
+return False
+return container_status.state.terminated is not None
+
+
 def get_container_termination_message(pod: V1Pod, container_name: str):
 with suppress(AttributeError, TypeError):
 container_statuses = pod.status.container_statuses
@@ -264,15 +278,68 @@ class PodManager(LoggingMixin):
 )
 time.sleep(1)
 
-def await_container_completion(self, pod: V1Pod, container_name: str) -> 
None:
+def fetch_input_container_logs(
+self, pod: V1Pod, log_containers: list[str] | bool, follow=False
+) -> list[PodLoggingStatus]:
+"""
+Follow the logs of containers in the pod specified by input parameter 
and stream to airflow logging.
+Returns when all the contain

[airflow] branch mlnsharma/main created (now 292c23283e)

2023-01-16 Thread dstandish
This is an automated email from the ASF dual-hosted git repository.

dstandish pushed a change to branch mlnsharma/main
in repository https://gitbox.apache.org/repos/asf/airflow.git


  at 292c23283e Read logs from all containers in KPO

This branch includes the following new commits:

 new 292c23283e Read logs from all containers in KPO

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[GitHub] [airflow] mlnsharma opened a new pull request, #28981: Implemented log_containers option to read from all containers in KubernetesPodOperator

2023-01-16 Thread GitBox


mlnsharma opened a new pull request, #28981:
URL: https://github.com/apache/airflow/pull/28981

   Added a new 'log_containers' param to KubernetesPodOperator. This param can 
take a list of container names or a boolean as input, with the default being 
False. 
   
   If False (the default), only 'base' container logs are displayed, which is 
backward compatible. If True, all container logs (except for the 
airflow-xcom-sidecar container) are displayed. If a list of container names is 
provided, only those containers' logs are displayed.
   
   related: #27282





[GitHub] [airflow] mlnsharma closed pull request #27802: Read logs from all containers in pod for KubernetesPodOperator

2023-01-16 Thread GitBox


mlnsharma closed pull request #27802: Read logs from all containers in pod for 
KubernetesPodOperator
URL: https://github.com/apache/airflow/pull/27802





[GitHub] [airflow] AndrewTsao commented on pull request #28191: Add custom pickling hooks to LazyXComAccess

2023-01-16 Thread GitBox


AndrewTsao commented on PR #28191:
URL: https://github.com/apache/airflow/pull/28191#issuecomment-1384749135

   Hi, I tested #28146 and got another exception, shown below. MySQL version 
5.7, Python 3.8, OS is CentOS 7.9.
   
   python modules list:
   
   ```
   apache-airflow   2.5.1rc1
   apache-airflow-providers-celery  3.0.0
   apache-airflow-providers-common-sql  1.0.0
   apache-airflow-providers-ftp 3.0.0
   apache-airflow-providers-http3.0.0
   apache-airflow-providers-imap3.0.0
   apache-airflow-providers-microsoft-mssql 3.1.0
   apache-airflow-providers-microsoft-psrp  2.1.0
   apache-airflow-providers-microsoft-winrm 3.0.0
   apache-airflow-providers-mysql   3.0.0
   apache-airflow-providers-redis   3.0.0
   apache-airflow-providers-samba   4.0.0
   apache-airflow-providers-sftp3.0.0
   apache-airflow-providers-sqlite  3.0.0
   apache-airflow-providers-ssh 3.0.0
   
   Flask-SQLAlchemy 2.5.1
   marshmallow-sqlalchemy   0.26.1
   mysql-connector-python   8.0.29
   mysqlclient  2.1.1
   pymssql  2.2.5
   SQLAlchemy   1.4.46
   SQLAlchemy-JSONField 1.0.0
   SQLAlchemy-Utils 0.38.3
   ```
   ```sh
   [2023-01-17, 09:38:01 CST] {process_utils.py:179} INFO - Executing cmd: 
/home/andi/airflow/venv38/bin/python -m virtualenv /tmp/venvtb4apfe2 
--system-site-packages
   [2023-01-17, 09:38:01 CST] {process_utils.py:183} INFO - Output:
   [2023-01-17, 09:38:02 CST] {process_utils.py:187} INFO - created virtual 
environment CPython3.8.0.final.0-64 in 236ms
   [2023-01-17, 09:38:02 CST] {process_utils.py:187} INFO -   creator 
CPython3Posix(dest=/tmp/venvtb4apfe2, clear=False, no_vcs_ignore=False, 
global=True)
   [2023-01-17, 09:38:02 CST] {process_utils.py:187} INFO -   seeder 
FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, 
via=copy, app_data_dir=/home/andi/.local/share/virtualenv)
   [2023-01-17, 09:38:02 CST] {process_utils.py:187} INFO - added seed 
packages: pip==22.3.1, setuptools==65.6.3, wheel==0.38.4
   [2023-01-17, 09:38:02 CST] {process_utils.py:187} INFO -   activators 
BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
   [2023-01-17, 09:38:02 CST] {process_utils.py:179} INFO - Executing cmd: 
/tmp/venvtb4apfe2/bin/pip install -r /tmp/venvtb4apfe2/requirements.txt
   [2023-01-17, 09:38:02 CST] {process_utils.py:183} INFO - Output:
   [2023-01-17, 09:38:03 CST] {process_utils.py:187} INFO - Looking in indexes: 
http://pypi:8081
   [2023-01-17, 09:38:08 CST] {process_utils.py:179} INFO - Executing cmd: 
/tmp/venvtb4apfe2/bin/python /tmp/venvtb4apfe2/script.py 
/tmp/venvtb4apfe2/script.in /tmp/venvtb4apfe2/script.out 
/tmp/venvtb4apfe2/string_args.txt
   [2023-01-17, 09:38:08 CST] {process_utils.py:183} INFO - Output:
   [2023-01-17, 09:38:10 CST] {process_utils.py:187} INFO - Traceback (most 
recent call last):
   [2023-01-17, 09:38:10 CST] {process_utils.py:187} INFO -   File 
"/home/andi/airflow/venv38/lib/python3.8/site-packages/sqlalchemy/engine/base.py",
 line 1900, in _execute_context
   [2023-01-17, 09:38:10 CST] {process_utils.py:187} INFO - 
self.dialect.do_execute(
   [2023-01-17, 09:38:10 CST] {process_utils.py:187} INFO -   File 
"/home/andi/airflow/venv38/lib/python3.8/site-packages/sqlalchemy/engine/default.py",
 line 736, in do_execute
   [2023-01-17, 09:38:10 CST] {process_utils.py:187} INFO - 
cursor.execute(statement, parameters)
   [2023-01-17, 09:38:10 CST] {process_utils.py:187} INFO -   File 
"/home/andi/airflow/venv38/lib/python3.8/site-packages/MySQLdb/cursors.py", 
line 206, in execute
   [2023-01-17, 09:38:10 CST] {process_utils.py:187} INFO - res = 
self._query(query)
   [2023-01-17, 09:38:10 CST] {process_utils.py:187} INFO -   File 
"/home/andi/airflow/venv38/lib/python3.8/site-packages/MySQLdb/cursors.py", 
line 319, in _query
   [2023-01-17, 09:38:10 CST] {process_utils.py:187} INFO - db.query(q)
   [2023-01-17, 09:38:10 CST] {process_utils.py:187} INFO -   File 
"/home/andi/airflow/venv38/lib/python3.8/site-packages/MySQLdb/connections.py", 
line 254, in query
   [2023-01-17, 09:38:10 CST] {process_utils.py:187} INFO - 
_mysql.connection.query(self, query)
   [2023-01-17, 09:38:10 CST] {process_utils.py:187} INFO - 
MySQLdb.ProgrammingError: (1064, "You have an error in your SQL syntax; check 
the manual that corresponds to your MySQL server version for the right syntax 
to use near '%s AND xcom.task_id = %s AND xcom.dag_id = %s AND xcom.run_id = %s 
ORDER BY xcom' 

[GitHub] [airflow] uranusjr commented on a diff in pull request #28979: Fix rendering parameters in PapermillOperator

2023-01-16 Thread GitBox


uranusjr commented on code in PR #28979:
URL: https://github.com/apache/airflow/pull/28979#discussion_r1071672038


##
airflow/providers/papermill/operators/papermill.py:
##
@@ -33,6 +33,8 @@
 class NoteBook(File):
 """Jupyter notebook"""
 
+template_fields: ClassVar = {*File.template_fields, "parameters"}

Review Comment:
   ```suggestion
   template_fields: ClassVar[str] = {*File.template_fields, "parameters"}
   ```
   
   (I don’t think the annotation is actually needed at all though, it should 
inherit the type from `File`.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] uranusjr opened a new pull request, #28980: Ensure non-JSON types go through encoder

2023-01-16 Thread GitBox


uranusjr opened a new pull request, #28980:
URL: https://github.com/apache/airflow/pull/28980

   Additional encoding logic is added for frozenset and set so the output is 
JSON-compatible.
   
   Also removed some unnecessary error handling. The default encoder simply 
raises TypeError, so we don't need to catch TypeError from serialize().
   
   Fix #28741.
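   The idea can be sketched with a minimal custom encoder (a simplified 
illustration, not Airflow's actual serializer): json only calls default() for 
objects it cannot serialize natively, so set/frozenset handling lives there 
and anything unknown falls through to the base class, which raises TypeError.

   ```python
   import json


   class SetFriendlyEncoder(json.JSONEncoder):
       def default(self, o):
           if isinstance(o, (set, frozenset)):
               # Sets are not JSON types; emit a sorted list instead.
               return sorted(o)
           # The base implementation raises TypeError for anything unknown,
           # so no extra error handling is needed around it.
           return super().default(o)


   encoded = json.dumps({"tags": frozenset({"b", "a"})}, cls=SetFriendlyEncoder)
   ```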





[GitHub] [airflow] Taragolis commented on a diff in pull request #28979: Fix rendering parameters in PapermillOperator

2023-01-16 Thread GitBox


Taragolis commented on code in PR #28979:
URL: https://github.com/apache/airflow/pull/28979#discussion_r1071666872


##
airflow/providers/papermill/operators/papermill.py:
##
@@ -57,36 +59,43 @@ class PapermillOperator(BaseOperator):
 def __init__(
 self,
 *,
-input_nb: str | None = None,
-output_nb: str | None = None,
+input_nb: str | NoteBook | None = None,
+output_nb: str | NoteBook | None = None,

Review Comment:
   Maybe we should also make these fields mandatory.






[GitHub] [airflow] Taragolis commented on pull request #28979: Fix rendering parameters in PapermillOperator

2023-01-16 Thread GitBox


Taragolis commented on PR #28979:
URL: https://github.com/apache/airflow/pull/28979#issuecomment-1384729239

   cc: @TPapajCin @marvinfretly @nicnguyen3103





[GitHub] [airflow] Taragolis opened a new pull request, #28979: Fix rendering parameters in PapermillOperator

2023-01-16 Thread GitBox


Taragolis opened a new pull request, #28979:
URL: https://github.com/apache/airflow/pull/28979

   Add missing "parameters" into `NoteBook`'s templated fields and remove 
redundant loop on inlets/outlets
   
   related: #28977





[GitHub] [airflow] uranusjr opened a new pull request, #28978: Minor improvements to serde helpers

2023-01-16 Thread GitBox


uranusjr opened a new pull request, #28978:
URL: https://github.com/apache/airflow/pull/28978

   No functional changes, just some cleanups and docstring formatting. 
Extracted from #28742





[GitHub] [airflow] uranusjr closed pull request #28742: Fix serialize() usages in custom JSON encoders

2023-01-16 Thread GitBox


uranusjr closed pull request #28742: Fix serialize() usages in custom JSON 
encoders
URL: https://github.com/apache/airflow/pull/28742





[GitHub] [airflow] uranusjr commented on pull request #28742: Fix serialize() usages in custom JSON encoders

2023-01-16 Thread GitBox


uranusjr commented on PR #28742:
URL: https://github.com/apache/airflow/pull/28742#issuecomment-1384707382

   Rethinking this, I _think_ this can simply be achieved by a better 
implemented `default`. I’ll split out that main change and other minor 
improvements here into two PRs so things are easier to manage.





[GitHub] [airflow] shyft-mike commented on a diff in pull request #28846: Updated app to support configuring the caching hash method for FIPS

2023-01-16 Thread GitBox


shyft-mike commented on code in PR #28846:
URL: https://github.com/apache/airflow/pull/28846#discussion_r1071642334


##
airflow/www/app.py:
##
@@ -131,7 +132,29 @@ def create_app(config=None, testing=False):
 
 init_robots(flask_app)
 
+# Configure caching
+webserver_caching_hash_method = conf.get(section="webserver", 
key="CACHING_HASH_METHOD", fallback=None)
 cache_config = {"CACHE_TYPE": "flask_caching.backends.filesystem", 
"CACHE_DIR": gettempdir()}

Review Comment:
   Ah, great, thanks! Addressing the other comments now 👍 






[GitHub] [airflow] github-actions[bot] commented on issue #28399: Dataproc Generator should allow creating spot instances

2023-01-16 Thread GitBox


github-actions[bot] commented on issue #28399:
URL: https://github.com/apache/airflow/issues/28399#issuecomment-1384687968

   This issue has been automatically marked as stale because it has been open 
for 30 days with no response from the author. It will be closed in next 7 days 
if no further activity occurs from the issue author.





[GitHub] [airflow] ferruzzi commented on a diff in pull request #28816: introduce a method to convert dictionaries to boto-style key-value lists

2023-01-16 Thread GitBox


ferruzzi commented on code in PR #28816:
URL: https://github.com/apache/airflow/pull/28816#discussion_r1071626275


##
airflow/providers/amazon/aws/hooks/s3.py:
##
@@ -42,6 +42,7 @@
 from airflow.exceptions import AirflowException
 from airflow.providers.amazon.aws.exceptions import S3HookUriParseFailure
 from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
+from airflow.providers.amazon.aws.utils.aws_api import format_tags

Review Comment:
   I could get behind that.
   






[GitHub] [airflow] potiuk closed issue #26931: Templating (like {{ ds }} ) stopped working in papermill after upgrade from 2.3.x to 2.4.x

2023-01-16 Thread GitBox


potiuk closed issue #26931: Templating (like {{ ds }} ) stopped working in 
papermill after upgrade from 2.3.x to 2.4.x
URL: https://github.com/apache/airflow/issues/26931





[GitHub] [airflow] Dinghang commented on a diff in pull request #26639: Multi-threads support for processing diff queues in Kubernetes Executor

2023-01-16 Thread GitBox


Dinghang commented on code in PR #26639:
URL: https://github.com/apache/airflow/pull/26639#discussion_r1071622999


##
airflow/executors/kubernetes_executor.py:
##
@@ -62,6 +64,50 @@
 KubernetesWatchType = Tuple[str, str, Optional[str], Dict[str, str], str]
 
 
+def multi_threads_queue_process(
+queue_size: int,
+queue_type: str,
+process_method: Callable,
+max_threads: int,
+log: Logger,
+batch_size: Optional[int] = None,
+) -> None:
+"""
+Helper method to enable multi-threads for processing queues used with kubernetes executor
+:param queue_size: the size of the queue getting processed
+:param queue_type: the type of the queue
+:param process_method: the real method processing the queue
+:param max_threads: the max num of threads to be used
+:param log: log
+:param batch_size: the max num of items we want to process in this round.
+   If it's not set, the current queue size will be used.
+"""
+if queue_size == 0:
+log.info(f'There is no item to process in the {queue_type} queue.')
+return
+
+start_time = time.time()
+log.info(f'Start processing {queue_type} queue with at most {max_threads} threads.')
+
+batch_size = min(batch_size or queue_size, queue_size)
+max_threads = min(max_threads, queue_size)
+
+threads = []
+quotient, remainder = divmod(batch_size, max_threads)
+for i in range(max_threads):
+sub_batch_size = quotient + 1 if i < remainder else quotient
+t = Thread(target=process_method, args=[sub_batch_size])
+threads.append(t)
+t.start()
+for t in threads:
+t.join()

Review Comment:
   Hi @dstandish, thanks for the info. I would most likely keep the current implementation. Regarding "one thread may finish more quickly than the others": this could happen, but it's not a problem, since the focus is on improving the extreme case with hundreds or thousands of items in the queue. Overall, the threads make it more efficient.






[GitHub] [airflow] potiuk commented on pull request #28548: Move comment to the right position

2023-01-16 Thread GitBox


potiuk commented on PR #28548:
URL: https://github.com/apache/airflow/pull/28548#issuecomment-1384656059

   Closing the change - seems not justified.





[GitHub] [airflow] potiuk closed pull request #28548: Move comment to the right position

2023-01-16 Thread GitBox


potiuk closed pull request #28548: Move comment to the right position
URL: https://github.com/apache/airflow/pull/28548





[GitHub] [airflow] hussein-awala commented on a diff in pull request #28961: Emit DataDog statsd metrics with metadata tags

2023-01-16 Thread GitBox


hussein-awala commented on code in PR #28961:
URL: https://github.com/apache/airflow/pull/28961#discussion_r1071620930


##
airflow/config_templates/config.yml:
##
@@ -851,6 +851,13 @@ metrics:
   type: string
   example: ~
   default: ""
+statsd_datadog_metrics_tags:
+  description: |
If set to True, Airflow will add metadata as tags for some of the emitted metrics
+  version_added: 2.6.0
+  type: boolean
+  example: ~
+  default: "False"

Review Comment:
   Nothing else needs to be changed; I'll change the default value to `True`.






[airflow] branch main updated (3adaa9b34e -> 65010fda09)

2023-01-16 Thread pierrejeambrun
This is an automated email from the ASF dual-hosted git repository.

pierrejeambrun pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


from 3adaa9b34e AWS system tests selective log purge (#28819)
 add 65010fda09 Fix for LocalKubernetesExecutor scheduler is not serving 
logs (#28638)

No new revisions were added by this update.

Summary of changes:
 airflow/cli/commands/scheduler_command.py  | 2 +-
 airflow/executors/base_executor.py | 2 ++
 airflow/executors/celery_kubernetes_executor.py| 2 ++
 airflow/executors/local_executor.py| 2 ++
 airflow/executors/local_kubernetes_executor.py | 2 ++
 airflow/executors/sequential_executor.py   | 2 ++
 tests/cli/commands/test_scheduler_command.py   | 1 +
 tests/executors/test_base_executor.py  | 4 
 tests/executors/test_celery_kubernetes_executor.py | 3 +++
 tests/executors/test_local_executor.py | 3 +++
 tests/executors/test_local_kubernetes_executor.py  | 3 +++
 tests/executors/test_sequential_executor.py| 3 +++
 12 files changed, 28 insertions(+), 1 deletion(-)



[GitHub] [airflow] hussein-awala commented on a diff in pull request #28961: Emit DataDog statsd metrics with metadata tags

2023-01-16 Thread GitBox


hussein-awala commented on code in PR #28961:
URL: https://github.com/apache/airflow/pull/28961#discussion_r1071620409


##
airflow/dag_processing/processor.py:
##
@@ -484,7 +486,13 @@ def manage_slas(self, dag: DAG, session: Session = None) -> None:
  callback(dag, task_list, blocking_task_list, slas, blocking_tis)
 notification_sent = True
 except Exception:
-Stats.incr("sla_callback_notification_failure")
+Stats.incr(
+"sla_callback_notification_failure",
+tags={
+"dag_id": dag.dag_id,
+"func_name": callback.func_name,  # type: 
ignore[attr-defined]

Review Comment:
   Actually, this is how you got the callback name in Python 2, but it was removed in Python 3 and replaced by `callback.__name__`, so we need to fix this. Would you prefer I fix it here or in a separate PR?
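   The Python 3 behaviour is easy to check in isolation; a standalone sketch (not PR code, `sla_callback` is just an example function):

```python
def sla_callback(dag, task_list, blocking_task_list, slas, blocking_tis):
    pass


# Python 3 exposes a function's name via __name__; the Python 2 era
# attribute func_name no longer exists, so accessing it would raise
# AttributeError.
name = sla_callback.__name__
has_py2_attr = hasattr(sla_callback, "func_name")
```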






[GitHub] [airflow] pierrejeambrun merged pull request #28638: Fix for LocalKubernetesExecutor scheduler is not serving logs

2023-01-16 Thread GitBox


pierrejeambrun merged PR #28638:
URL: https://github.com/apache/airflow/pull/28638





[GitHub] [airflow] Dinghang commented on a diff in pull request #26639: Multi-threads support for processing diff queues in Kubernetes Executor

2023-01-16 Thread GitBox


Dinghang commented on code in PR #26639:
URL: https://github.com/apache/airflow/pull/26639#discussion_r1052930708


##
airflow/executors/kubernetes_executor.py:
##
@@ -370,7 +416,17 @@ def sync(self) -> None:
 """
 self.log.debug("Syncing KubernetesExecutor")
 self._health_check_kube_watcher()
-while True:
+
+multi_threads_queue_process(
+queue_size=self.watcher_queue.qsize(),
+queue_type='watcher',
+process_method=self.process_watcher_queue,

Review Comment:
   Hi @dstandish, I'm not sure what you meant by network calls here. Here, process_watcher_queue is just a method that processes items in the watcher queue.



##
airflow/executors/kubernetes_executor.py:
##
@@ -599,8 +655,31 @@ def sync(self) -> None:
 raise AirflowException(NOT_STARTED_MESSAGE)
 self.kube_scheduler.sync()
 
-last_resource_version = None
-while True:
+"""processing result queue"""
+multi_threads_queue_process(
+queue_size=self.result_queue.qsize(),
+queue_type='result',
+process_method=self.process_result_queue,

Review Comment:
   Hi @hterik, that's a good point! Both self.running and self.event_buffer perform atomic operations; AFAIK they are thread-safe.



##
airflow/executors/kubernetes_executor.py:
##
@@ -370,7 +416,17 @@ def sync(self) -> None:
 """
 self.log.debug("Syncing KubernetesExecutor")
 self._health_check_kube_watcher()
-while True:
+
+multi_threads_queue_process(
+queue_size=self.watcher_queue.qsize(),
+queue_type='watcher',
+process_method=self.process_watcher_queue,

Review Comment:
   Hi @dstandish, sorry for the late reply. I was AFK due to some personal issues and just got back. I will address the comments accordingly and ping you again once it's ready.






[GitHub] [airflow] ashb commented on a diff in pull request #28816: introduce a method to convert dictionaries to boto-style key-value lists

2023-01-16 Thread GitBox


ashb commented on code in PR #28816:
URL: https://github.com/apache/airflow/pull/28816#discussion_r1071617889


##
airflow/providers/amazon/aws/hooks/s3.py:
##
@@ -42,6 +42,7 @@
 from airflow.exceptions import AirflowException
 from airflow.providers.amazon.aws.exceptions import S3HookUriParseFailure
 from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
+from airflow.providers.amazon.aws.utils.aws_api import format_tags

Review Comment:
   `airflow/providers/amazon/aws/utils/tags.py`? 
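   Whatever module the helper ends up in, the function under discussion converts a plain dictionary into a boto-style key/value list. A minimal sketch of what it might look like (the exact signature in the PR may differ):

```python
def format_tags(source, *, key_label: str = "Key", value_label: str = "Value"):
    # Convert {"env": "prod"} into boto-style [{"Key": "env", "Value": "prod"}].
    # None becomes an empty list; anything that is not a dict (e.g. an
    # already formatted list) passes through unchanged.
    if source is None:
        return []
    if isinstance(source, dict):
        return [{key_label: k, value_label: v} for k, v in source.items()]
    return source
```

   The `key_label`/`value_label` parameters cover AWS APIs that use non-default labels such as `TagKey`/`TagValue`.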






[GitHub] [airflow] Dinghang commented on pull request #26639: Multi-threads support for processing diff queues in Kubernetes Executor

2023-01-16 Thread GitBox


Dinghang commented on PR #26639:
URL: https://github.com/apache/airflow/pull/26639#issuecomment-1384642241

   Hi team, I'm going to spend some time addressing comments this week; expect a reply next week.





[GitHub] [airflow] snjypl commented on pull request #28638: Fix for LocalKubernetesExecutor scheduler is not serving logs

2023-01-16 Thread GitBox


snjypl commented on PR #28638:
URL: https://github.com/apache/airflow/pull/28638#issuecomment-1384637725

   > Could use a rebase before merging, it's 20+ commits behind master
   
   Done.





[GitHub] [airflow] snjypl opened a new pull request, #28976: Migrate DagFileProcessorManager.clear_nonexistent_import_errors to internal API

2023-01-16 Thread GitBox


snjypl opened a new pull request, #28976:
URL: https://github.com/apache/airflow/pull/28976

   Fixes: #28785 
   
   
   





[GitHub] [airflow] potiuk closed pull request #28540: Using pypi proxy for speeding up development

2023-01-16 Thread GitBox


potiuk closed pull request #28540: Using pypi proxy for speeding up development
URL: https://github.com/apache/airflow/pull/28540





[GitHub] [airflow] potiuk commented on pull request #28540: Using pypi proxy for speeding up development

2023-01-16 Thread GitBox


potiuk commented on PR #28540:
URL: https://github.com/apache/airflow/pull/28540#issuecomment-1384632819

   Now that I think of it, this is not really needed; it seems like a really niche case.





[GitHub] [airflow] JGoldman110 commented on issue #28947: Status of testing of Apache Airflow 2.5.1rc1

2023-01-16 Thread GitBox


JGoldman110 commented on issue #28947:
URL: https://github.com/apache/airflow/issues/28947#issuecomment-1384632106

   #28788 - Swagger UI looks good





[GitHub] [airflow] potiuk commented on pull request #28457: Add deferrable mode to DataprocCreateBatchOperator

2023-01-16 Thread GitBox


potiuk commented on PR #28457:
URL: https://github.com/apache/airflow/pull/28457#issuecomment-1384630289

   Needs conflict resolution.





[GitHub] [airflow] potiuk commented on pull request #28520: Set pip install opts in _PIP_ADDITIONAL_REQUIREMENTS

2023-01-16 Thread GitBox


potiuk commented on PR #28520:
URL: https://github.com/apache/airflow/pull/28520#issuecomment-1384614065

   Closing this one, as it seems achievable via PIP_* variables.





[GitHub] [airflow] potiuk closed pull request #28520: Set pip install opts in _PIP_ADDITIONAL_REQUIREMENTS

2023-01-16 Thread GitBox


potiuk closed pull request #28520: Set pip install opts in 
_PIP_ADDITIONAL_REQUIREMENTS
URL: https://github.com/apache/airflow/pull/28520





[GitHub] [airflow] vincbeck commented on a diff in pull request #28964: Add transfer operator S3 to (generic) SQL

2023-01-16 Thread GitBox


vincbeck commented on code in PR #28964:
URL: https://github.com/apache/airflow/pull/28964#discussion_r1071602052


##
airflow/providers/amazon/aws/transfers/s3_to_sql.py:
##
@@ -0,0 +1,158 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+import os
+from typing import TYPE_CHECKING, Any, Sequence
+
+from airflow.exceptions import AirflowException
+from airflow.hooks.base import BaseHook
+from airflow.models import BaseOperator
+from airflow.providers.amazon.aws.hooks.s3 import S3Hook
+from airflow.providers.common.sql.hooks.sql import DbApiHook
+
+if TYPE_CHECKING:
+from airflow.utils.context import Context
+
+from typing_extensions import Literal
+
+try:
+import csv as csv
+except ImportError as e:
+from airflow.exceptions import AirflowOptionalProviderFeatureException
+
+raise AirflowOptionalProviderFeatureException from e
+
+
+class S3ToSqlOperator(BaseOperator):
+"""
+Loads Data from S3 into a SQL Database.
+Data should be readable as CSV.
+
+This operator downloads a file from an S3, reads it via `csv.reader`
+and inserts the data into a SQL database using `insert_rows` method.
+All SQL hooks are supported, as long as it is of type DbApiHook

Review Comment:
   I don't think it makes sense as a default, but I do think that providing at least one parser in code or documentation makes sense. A CSV parser would be excellent!






[GitHub] [airflow] potiuk commented on issue #27012: External python operator logging

2023-01-16 Thread GitBox


potiuk commented on issue #27012:
URL: https://github.com/apache/airflow/issues/27012#issuecomment-1384609843

   Are there any actions from that?





[GitHub] [airflow] potiuk commented on pull request #27802: Read logs from all containers in pod for KubernetesPodOperator

2023-01-16 Thread GitBox


potiuk commented on PR #27802:
URL: https://github.com/apache/airflow/pull/27802#issuecomment-1384606835

   Can you resolve the conflicts please, @mlnsharma, so that others can take a look?





[GitHub] [airflow] potiuk commented on issue #28452: TaskInstances do not succeed when using enable_logging=True option in DockerSwarmOperator

2023-01-16 Thread GitBox


potiuk commented on issue #28452:
URL: https://github.com/apache/airflow/issues/28452#issuecomment-1384606025

   Marked it as a good first issue; hopefully someone will take a look at this one.





[GitHub] [airflow] harishkrao commented on a diff in pull request #28950: Sensor for Databricks partition and table changes

2023-01-16 Thread GitBox


harishkrao commented on code in PR #28950:
URL: https://github.com/apache/airflow/pull/28950#discussion_r1071586211


##
airflow/providers/databricks/sensors/databricks.py:
##
@@ -0,0 +1,231 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+"""This module contains Databricks sensors."""
+
+from __future__ import annotations
+
+from datetime import datetime, timedelta
+from typing import Any, Callable, Sequence
+
+from airflow.exceptions import AirflowException
+from airflow.providers.common.sql.hooks.sql import fetch_all_handler
+from airflow.providers.databricks.hooks.databricks_sql import DatabricksSqlHook
+from airflow.sensors.base import BaseSensorOperator
+from airflow.utils.context import Context
+
+
+class DatabricksSqlSensor(BaseSensorOperator):
+"""
+Generic Databricks SQL sensor.

Review Comment:
   Thank you, good points about keeping the SQL sensor generic. I am working on modifying the branch accordingly and will push the changes soon.






[airflow] branch main updated (cd637c223f -> 3adaa9b34e)

2023-01-16 Thread onikolas
This is an automated email from the ASF dual-hosted git repository.

onikolas pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


 from cd637c223f Fix label name for `reauth` field in Docker Connection (#28974)
 add 3adaa9b34e AWS system tests selective log purge (#28819)

No new revisions were added by this update.

Summary of changes:
 tests/system/providers/amazon/aws/example_batch.py | 21 +--
 tests/system/providers/amazon/aws/example_glue.py  | 30 +++-
 .../system/providers/amazon/aws/example_lambda.py  | 22 ++--
 .../providers/amazon/aws/example_sagemaker.py  | 27 +++---
 .../amazon/aws/example_sagemaker_endpoint.py   | 32 +
 .../system/providers/amazon/aws/utils/__init__.py  | 42 --
 6 files changed, 103 insertions(+), 71 deletions(-)



[GitHub] [airflow] o-nikolas merged pull request #28819: AWS system tests selective log purge

2023-01-16 Thread GitBox


o-nikolas merged PR #28819:
URL: https://github.com/apache/airflow/pull/28819





[GitHub] [airflow] pierrejeambrun commented on pull request #28638: Fix for LocalKubernetesExecutor scheduler is not serving logs

2023-01-16 Thread GitBox


pierrejeambrun commented on PR #28638:
URL: https://github.com/apache/airflow/pull/28638#issuecomment-1384545009

   Could use a rebase before merging, it's 20+ commits behind master 





[GitHub] [airflow] ferruzzi commented on a diff in pull request #28816: introduce a method to convert dictionaries to boto-style key-value lists

2023-01-16 Thread GitBox


ferruzzi commented on code in PR #28816:
URL: https://github.com/apache/airflow/pull/28816#discussion_r1071576124


##
airflow/providers/amazon/aws/hooks/s3.py:
##
@@ -42,6 +42,7 @@
 from airflow.exceptions import AirflowException
 from airflow.providers.amazon.aws.exceptions import S3HookUriParseFailure
 from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
+from airflow.providers.amazon.aws.utils.aws_api import format_tags

Review Comment:
   Pretty sure Jarek had mentioned not liking stuff in the __init__ files, so 
I've been trying to avoid it when I can think of a better name.  Sometimes it's 
easier than others.  Naming is hard.
   
   In this case I think aws_api.py is a sound option.  @eladkal - do you 
definitely want it changed or were you just making sure I was good with it? 






[airflow] branch constraints-main updated: Updating constraints. Build id:

2023-01-16 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch constraints-main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/constraints-main by this push:
 new 66a94cc546 Updating constraints. Build id:
66a94cc546 is described below

commit 66a94cc546000f646d09fe7bb6e04fe181cf2797
Author: Automated GitHub Actions commit 
AuthorDate: Mon Jan 16 20:55:37 2023 +

Updating constraints. Build id:

This update in constraints is automatically committed by the CI 
'constraints-push' step based on
HEAD of '' in ''
with commit sha .

All tests passed in this build so we determined we can push the updated 
constraints.

See https://github.com/apache/airflow/blob/main/README.md#installing-from-pypi for details.
---
 constraints-3.10.txt  | 9 -
 constraints-3.7.txt   | 8 
 constraints-3.8.txt   | 8 
 constraints-3.9.txt   | 9 -
 constraints-no-providers-3.10.txt | 2 +-
 constraints-no-providers-3.7.txt  | 2 +-
 constraints-no-providers-3.8.txt  | 2 +-
 constraints-no-providers-3.9.txt  | 2 +-
 constraints-source-providers-3.10.txt | 9 -
 constraints-source-providers-3.7.txt  | 8 
 constraints-source-providers-3.8.txt  | 8 
 constraints-source-providers-3.9.txt  | 9 -
 12 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/constraints-3.10.txt b/constraints-3.10.txt
index 378869f182..191fb8b770 100644
--- a/constraints-3.10.txt
+++ b/constraints-3.10.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2023-01-16T00:27:32Z
+# This constraints file was automatically generated on 2023-01-16T20:54:58Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install uses the HEAD of the branch version for 
'apache-airflow' but installs
 # the providers from PIP-released packages at the moment of the constraint 
generation.
@@ -323,7 +323,6 @@ idna==3.4
 ijson==3.2.0.post0
 imagesize==1.4.1
 importlib-metadata==6.0.0
-importlib-resources==5.10.2
 impyla==0.18.0
 incremental==22.10.0
 inflection==0.5.1
@@ -400,8 +399,8 @@ ntlm-auth==1.5.0
 numpy==1.22.4
 oauthlib==3.2.2
 objsize==0.6.1
-openapi-schema-validator==0.3.4
-openapi-spec-validator==0.5.1
+openapi-schema-validator==0.4.0
+openapi-spec-validator==0.5.2
 opsgenie-sdk==2.1.5
 oracledb==1.2.1
 orjson==3.8.5
@@ -513,7 +512,7 @@ rfc3986==1.5.0
 rich-click==1.6.0
 rich==13.1.0
 rsa==4.9
-ruff==0.0.222
+ruff==0.0.223
 s3transfer==0.6.0
 sarif-om==1.0.4
 sasl==0.3.1
diff --git a/constraints-3.7.txt b/constraints-3.7.txt
index ea660f9657..d6eba00248 100644
--- a/constraints-3.7.txt
+++ b/constraints-3.7.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2023-01-16T00:28:07Z
+# This constraints file was automatically generated on 2023-01-16T20:55:34Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install uses the HEAD of the branch version for 
'apache-airflow' but installs
 # the providers from PIP-released packages at the moment of the constraint 
generation.
@@ -400,8 +400,8 @@ ntlm-auth==1.5.0
 numpy==1.21.1
 oauthlib==3.2.2
 objsize==0.6.1
-openapi-schema-validator==0.3.4
-openapi-spec-validator==0.5.1
+openapi-schema-validator==0.4.0
+openapi-spec-validator==0.5.2
 opsgenie-sdk==2.1.5
 oracledb==1.2.1
 orjson==3.8.5
@@ -513,7 +513,7 @@ rfc3986==1.5.0
 rich-click==1.6.0
 rich==13.1.0
 rsa==4.9
-ruff==0.0.222
+ruff==0.0.223
 s3transfer==0.6.0
 sarif-om==1.0.4
 sasl==0.3.1
diff --git a/constraints-3.8.txt b/constraints-3.8.txt
index 55d1ca9c01..a6ec001096 100644
--- a/constraints-3.8.txt
+++ b/constraints-3.8.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2023-01-16T00:27:59Z
+# This constraints file was automatically generated on 2023-01-16T20:55:28Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install uses the HEAD of the branch version for 
'apache-airflow' but installs
 # the providers from PIP-released packages at the moment of the constraint 
generation.
@@ -401,8 +401,8 @@ ntlm-auth==1.5.0
 numpy==1.22.4
 oauthlib==3.2.2
 objsize==0.6.1
-openapi-schema-validator==0.3.4
-openapi-spec-validator==0.5.1
+openapi-schema-validator==0.4.0
+openapi-spec-validator==0.5.2
 opsgenie-sdk==2.1.5
 oracledb==1.2.1
 orjson==3.8.5
@@ -515,7 +515,7 @@ rfc3986==1.5.0
 rich-click==1.6.0
 rich==13.1.0
 rsa==4.9
-ruff==0.0.222
+ruff==0.0.223
 s3transfer==0.6.0
 sarif-om==1.0.4
 sasl==0.3.1
diff --git a/constraints-3.9.txt b/constraints-3.9.txt
index 4de6daf953..e2bad7f3e6 100644
--- a/constraints-3.9.txt
+++ b/constraints-3.9.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2023-01-16T00:27:57Z
+# This

[GitHub] [airflow] ferruzzi commented on a diff in pull request #28816: introduce a method to convert dictionaries to boto-style key-value lists

2023-01-16 Thread GitBox


ferruzzi commented on code in PR #28816:
URL: https://github.com/apache/airflow/pull/28816#discussion_r1071472110


##
airflow/providers/amazon/aws/hooks/s3.py:
##
@@ -42,6 +42,7 @@
 from airflow.exceptions import AirflowException
 from airflow.providers.amazon.aws.exceptions import S3HookUriParseFailure
 from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
+from airflow.providers.amazon.aws.utils.aws_api import format_tags

Review Comment:
   I'm not sure what else you'd name the module here that would make more 
sense? It could be `boto_api` or `boto_utils` maybe, but I'm not sure either of 
those is any better, really?






[GitHub] [airflow] maggesssss commented on a diff in pull request #28964: Add transfer operator S3 to (generic) SQL

2023-01-16 Thread GitBox


magges commented on code in PR #28964:
URL: https://github.com/apache/airflow/pull/28964#discussion_r1071573009


##
airflow/providers/amazon/aws/transfers/s3_to_sql.py:
##
@@ -0,0 +1,158 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+import os
+from typing import TYPE_CHECKING, Any, Sequence
+
+from airflow.exceptions import AirflowException
+from airflow.hooks.base import BaseHook
+from airflow.models import BaseOperator
+from airflow.providers.amazon.aws.hooks.s3 import S3Hook
+from airflow.providers.common.sql.hooks.sql import DbApiHook
+
+if TYPE_CHECKING:
+from airflow.utils.context import Context
+
+from typing_extensions import Literal
+
+try:
+import csv as csv
+except ImportError as e:
+from airflow.exceptions import AirflowOptionalProviderFeatureException
+
+raise AirflowOptionalProviderFeatureException from e
+
+
+class S3ToSqlOperator(BaseOperator):
+"""
+Loads Data from S3 into a SQL Database.
+Data should be readable as CSV.
+
+This operator downloads a file from an S3, reads it via `csv.reader`
+and inserts the data into a SQL database using `insert_rows` method.
+All SQL hooks are supported, as long as it is of type DbApiHook

Review Comment:
   @vincbeck I agree. I like the idea of using a `parser` parameter. Do you 
think it makes sense to use a default parser?
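   A hypothetical `parser` callable for such an operator could be as thin as a 
wrapper around `csv.reader`. The function name and signature below are 
illustrative assumptions, not part of the proposed operator:

   ```python
   import csv
   import io

   def csv_parser(file_obj):
       # Hypothetical default parser: takes an open text file object and
       # returns its rows as lists of strings, one list per CSV line.
       return list(csv.reader(file_obj))

   # Usage with an in-memory file standing in for the downloaded S3 object:
   rows = csv_parser(io.StringIO("id,name\n1,alice\n"))
   ```

   Accepting any callable with this shape would let users plug in JSON, 
parquet, or custom parsers without the operator hard-coding CSV.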






[GitHub] [airflow] ferruzzi commented on a diff in pull request #28819: AWS system tests selective log purge

2023-01-16 Thread GitBox


ferruzzi commented on code in PR #28819:
URL: https://github.com/apache/airflow/pull/28819#discussion_r1071571584


##
tests/system/providers/amazon/aws/utils/__init__.py:
##
@@ -249,7 +251,43 @@ def set_env_id() -> str:
 return env_id
 
 
-def purge_logs(
+def all_tasks_passed(ti) -> bool:
+task_runs = ti.get_dagrun().get_task_instances()
+return all([_task.state != State.FAILED for _task in task_runs])
+
+
+@task(trigger_rule=TriggerRule.ALL_DONE)
+def prune_logs(
+logs: list[tuple[str, str | None]],
+force_delete: bool = False,
+retry: bool = False,
+retry_times: int = 3,
+ti=None,
+):
+"""
+If all tasks in this dagrun have succeeded, then delete the associated 
logs.
+Otherwise, append the logs with a retention policy.  This allows the logs
+to be used for troubleshooting but assures they won't build up 
indefinitely.
+
+:param logs: A list of log_group/stream_prefix tuples to delete.
+:param force_delete: Whether to check log streams within the log group 
before
+removal. If True, removes the log group and all its log streams inside 
it.
+:param retry: Whether to retry if the log group/stream was not found. In 
some
+cases, the log group/stream is created seconds after the main resource 
has
+been created. By default, it retries for 3 times with a 5s waiting 
period.
+:param retry_times: Number of retries.
+:param ti: Used to check the status of the tasks. This gets pulled from the
+DAG's context and does not need to be passed manually.
+"""
+if all_tasks_passed(ti):
+_purge_logs(logs, force_delete, retry, retry_times)
+else:
+client: BaseClient = boto3.client("logs")
+for group, _ in logs:
+client.put_retention_policy(logGroupName=group, retentionInDays=30)

Review Comment:
   This is a pretty small blast-radius and a very easy thing to change later.  
Let's merge it with 30 and if Ash (or anyone else, for that matter) has any 
strong feelings on it, it's a super easy change to make.






[GitHub] [airflow] o-nikolas commented on pull request #28638: Fix for LocalKubernetesExecutor scheduler is not serving logs

2023-01-16 Thread GitBox


o-nikolas commented on PR #28638:
URL: https://github.com/apache/airflow/pull/28638#issuecomment-1384527196

   Anyone have some time to give this a second review/approval? Would be nice 
to get this merged for @snjypl
   Maybe @potiuk, @dstandish or @pierrejeambrun?





[GitHub] [airflow] o-nikolas commented on pull request #28161: AIP-51 - Executor Coupling in Logging

2023-01-16 Thread GitBox


o-nikolas commented on PR #28161:
URL: https://github.com/apache/airflow/pull/28161#issuecomment-1384526101

   Anyone have some time to give this a second review/approval? Would be nice 
to get this merged for @snjypl
   Maybe @potiuk, @eladkal or @pierrejeambrun?





[GitHub] [airflow] o-nikolas commented on a diff in pull request #28925: Add log for AWS Glue Job Console URL

2023-01-16 Thread GitBox


o-nikolas commented on code in PR #28925:
URL: https://github.com/apache/airflow/pull/28925#discussion_r1071561782


##
airflow/providers/amazon/aws/operators/glue.py:
##
@@ -144,6 +149,20 @@ def execute(self, context: Context):
 self.wait_for_completion,
 )
 glue_job_run = glue_job.initialize_job(self.script_args, 
self.run_job_kwargs)
+glue_job_run_url = (
+f"{BASE_AWS_CONSOLE_LINK}/gluestudio/home?"
++ 
f"region={glue_job.conn_region_name}#/job/{urllib.parse.quote(self.job_name, 
safe='')}/run/"
++ glue_job_run["JobRunId"]
+)

Review Comment:
   Thanks for the PR!
   
   Could you not reuse the format string from the Link class so as to not 
duplicate it here? The two could easily get out of sync. Something like (NOTE: 
untested):
   
   ```suggestion
   glue_job_run_url = GlueJobRunDetailsLink.format_str.format(
   region_name=glue_job.conn_region_name,
   job_name=urllib.parse.quote(self.job_name, safe=''),
   job_run_id=glue_job_run["JobRunId"])
   ```
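   The deduplication idea can be sketched outside Airflow like this. The 
template constant and helper name here are hypothetical stand-ins; the actual 
`GlueJobRunDetailsLink` attribute in the provider may be named differently:

   ```python
   import urllib.parse

   # Hypothetical single source of truth for the console URL, so the
   # operator and the extra-link class cannot drift out of sync.
   GLUE_JOB_RUN_URL_TEMPLATE = (
       "https://console.aws.amazon.com/gluestudio/home?"
       "region={region_name}#/job/{job_name}/run/{job_run_id}"
   )

   def build_glue_job_run_url(region_name, job_name, job_run_id):
       # URL-encode the job name since Glue job names may contain spaces.
       return GLUE_JOB_RUN_URL_TEMPLATE.format(
           region_name=region_name,
           job_name=urllib.parse.quote(job_name, safe=""),
           job_run_id=job_run_id,
       )

   url = build_glue_job_run_url("us-east-1", "my job", "jr_123")
   ```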






[GitHub] [airflow-site] potiuk commented on pull request #623: Remove Bash from the process of building the site

2023-01-16 Thread GitBox


potiuk commented on PR #623:
URL: https://github.com/apache/airflow-site/pull/623#issuecomment-1384518413

   There are a number of problems with dependencies in the current approach 
(and it caused us some problems in the recent past), so I think this is 
something we have to do anyway soon.





[GitHub] [airflow-site] potiuk commented on pull request #623: Remove Bash from the process of building the site

2023-01-16 Thread GitBox


potiuk commented on PR #623:
URL: https://github.com/apache/airflow-site/pull/623#issuecomment-1384517156

   Hmm, I honestly think we should not invest too much in it. We are discussing 
switching to Pelican, which is going to change the process completely anyway.





[GitHub] [airflow] eladkal commented on issue #28938: Status of testing Providers that were prepared on January 14, 2023

2023-01-16 Thread GitBox


eladkal commented on issue #28938:
URL: https://github.com/apache/airflow/issues/28938#issuecomment-1384512167

   Docker provider will be excluded from this release and I will cut RC2 for it 
after this wave





[GitHub] [airflow] vincbeck commented on a diff in pull request #28900: [WIP] Convert DagFileProcessor.execute_callbacks to Internal API

2023-01-16 Thread GitBox


vincbeck commented on code in PR #28900:
URL: https://github.com/apache/airflow/pull/28900#discussion_r1071556627


##
airflow/api_internal/actions/dag.py:
##
@@ -0,0 +1,42 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from sqlalchemy import exc
+from sqlalchemy.orm import Session
+
+from airflow.api_internal.internal_api_call import internal_api_call
+from airflow.exceptions import TaskNotFound
+from airflow.models import Operator
+from airflow.utils.session import NEW_SESSION, provide_session
+
+
+class InternalApiDagActions:
+@staticmethod
+@internal_api_call
+@provide_session
+def get_serialized_dag(dag_id: str, task_id: str, session: Session = 
NEW_SESSION) -> Operator | None:

Review Comment:
   We can, I was just scared of introducing some cyclic dependencies but we can 
deal with them if they show up I guess






[GitHub] [airflow] o-nikolas commented on a diff in pull request #28934: Remove hard-coded executor-database coupling

2023-01-16 Thread GitBox


o-nikolas commented on code in PR #28934:
URL: https://github.com/apache/airflow/pull/28934#discussion_r1071551207


##
airflow/cli/commands/standalone_command.py:
##
@@ -159,14 +159,28 @@ def calculate_env(self):
 # Make sure we're using a local executor flavour
 executor_class, _ = ExecutorLoader.import_default_executor_cls()
 if not executor_class.is_local:
-if "sqlite" in conf.get("database", "sql_alchemy_conn"):
-self.print_output("standalone", "Forcing executor to 
SequentialExecutor")
-env["AIRFLOW__CORE__EXECUTOR"] = 
executor_constants.SEQUENTIAL_EXECUTOR
-else:
-self.print_output("standalone", "Forcing executor to 
LocalExecutor")
-env["AIRFLOW__CORE__EXECUTOR"] = 
executor_constants.LOCAL_EXECUTOR
+env["AIRFLOW__CORE__EXECUTOR"] = self.get_local_executor(
+database=conf.get("database", "sql_alchemy_conn")
+)
+self.print_output("standalone", f"Forcing executor to 
{env['AIRFLOW__CORE__EXECUTOR']}")

Review Comment:
   The more I reflect on this, the more I think the original code is actually 
fine. This logic is only executed if the executor selected by the user (or 
provided by the user via plugin) is not local, in which case we select a 
working default for them. However, they are completely able to implement/select 
a compatible local executor themselves if they wish to be opinionated about 
which executor to use.
   
   I think this logic and the helper method below actually might complicate 
things a bit more than it helps. WDYT?
   
   Also CC: @pierrejeambrun  






[GitHub] [airflow] potiuk closed issue #28975: Additional info about Segmentation Violation in LocalTaskJob (#27381): @Taragolis

2023-01-16 Thread GitBox


potiuk closed issue #28975: Additional info about Segmentation Violation in 
LocalTaskJob (#27381): @Taragolis
URL: https://github.com/apache/airflow/issues/28975





[GitHub] [airflow] potiuk commented on issue #28975: Additional info about Segmentation Violation in LocalTaskJob (#27381): @Taragolis

2023-01-16 Thread GitBox


potiuk commented on issue #28975:
URL: https://github.com/apache/airflow/issues/28975#issuecomment-1384495920

   Wrong click I guess.





[GitHub] [airflow] potiuk commented on issue #28941: Tasks are in the queued status after restarting the redis container

2023-01-16 Thread GitBox


potiuk commented on issue #28941:
URL: https://github.com/apache/airflow/issues/28941#issuecomment-1384493990

   Then it needs to be looked at by someone who knows Celery and Redis more 
than I do. I do not know Redis that much - I guess there are delays in 
processing and saving the stored data. If you kill Redis abruptly, by kill -9 
for example, then (obviously, like any other software) it might lose some data 
that it keeps in memory and absolutely nothing can be done about it. There will 
be hanging tasks in this case which you will have to clear. That's the usual 
recovery mechanism from catastrophic failures. No system in the world can be 
made resilient to it unless you accept a lot of operational overhead and 
redundancy (and if you would like to do that, then it is more of a deployment 
issue).
   
   I think you should make sure that you are stopping Redis in the "gentle" 
way that gives it a chance to flush everything to the disk, and make sure that 
it is actually restoring the data from there.
   
   Please then open a new issue with the Helm chart, ideally showing all the 
logs (including debug logs showing Redis storing and restoring the data - to 
make sure that it actually happens). If you can reproduce it knowing that Redis 
is storing/restoring the queue, then I think that's something that someone who 
is a Celery expert should take a look at, so it's worth opening an issue.





[GitHub] [airflow] o-nikolas commented on a diff in pull request #28934: Remove hard-coded executor-database coupling

2023-01-16 Thread GitBox


o-nikolas commented on code in PR #28934:
URL: https://github.com/apache/airflow/pull/28934#discussion_r1071544358


##
airflow/sensors/base.py:
##
@@ -257,11 +258,14 @@ def _get_next_poke_interval(
 
 def prepare_for_execution(self) -> BaseOperator:
 task = super().prepare_for_execution()
+
 # Sensors in `poke` mode can block execution of DAGs when running
 # with single process executor, thus we change the mode to`reschedule`
 # to allow parallel task being scheduled and executed
-if conf.get("core", "executor") == "DebugExecutor":
-self.log.warning("DebugExecutor changes sensor mode to 
'reschedule'.")
+executor_name = conf.get("core", "executor")
+executor = ExecutorLoader.load_executor(executor_name)

Review Comment:
   You can just use `import_default_executor_cls` here I reckon



##
airflow/cli/commands/standalone_command.py:
##
@@ -159,14 +159,28 @@ def calculate_env(self):
 # Make sure we're using a local executor flavour
 executor_class, _ = ExecutorLoader.import_default_executor_cls()
 if not executor_class.is_local:
-if "sqlite" in conf.get("database", "sql_alchemy_conn"):
-self.print_output("standalone", "Forcing executor to 
SequentialExecutor")
-env["AIRFLOW__CORE__EXECUTOR"] = 
executor_constants.SEQUENTIAL_EXECUTOR
-else:
-self.print_output("standalone", "Forcing executor to 
LocalExecutor")
-env["AIRFLOW__CORE__EXECUTOR"] = 
executor_constants.LOCAL_EXECUTOR
+env["AIRFLOW__CORE__EXECUTOR"] = self.get_local_executor(
+database=conf.get("database", "sql_alchemy_conn")
+)
+self.print_output("standalone", f"Forcing executor to 
{env['AIRFLOW__CORE__EXECUTOR']}")
 return env
 
+@classmethod
+def get_local_executor(cls, database: str) -> str:
+"""
+Get local executor for standalone command
+for sqlite we need SEQUENTIAL_EXECUTOR otherwise LOCAL_EXECUTOR.
+"""
+try:
+return [
+executor_name
+for executor_name in ExecutorLoader.executors.keys()
+if executor_name != "DaskExecutor"

Review Comment:
   This looks like executor coupling, why are we excluding the Dask Executor 
here specifically? Could there ever be other executors that would need 
excluding in the same way?






[airflow] branch main updated (dc3a3c7c52 -> cd637c223f)

2023-01-16 Thread potiuk
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


from dc3a3c7c52 Add DataprocCancelOperationOperator (#28456)
 add cd637c223f Fix label name for `reauth` field in Docker Connection 
(#28974)

No new revisions were added by this update.

Summary of changes:
 airflow/providers/docker/hooks/docker.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)



[GitHub] [airflow] potiuk merged pull request #28974: Fix label name for `reauth` field in Docker Connection

2023-01-16 Thread GitBox


potiuk merged PR #28974:
URL: https://github.com/apache/airflow/pull/28974





[airflow] branch main updated (d24527bf75 -> dc3a3c7c52)

2023-01-16 Thread potiuk
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


from d24527bf75 Add documentation about cli `add connection` and AWS 
connection URI (#28852)
 add dc3a3c7c52 Add DataprocCancelOperationOperator (#28456)

No new revisions were added by this update.

Summary of changes:
 airflow/providers/google/cloud/hooks/dataproc.py   |  4 ++
 .../providers/google/cloud/operators/dataproc.py   | 73 +++--
 airflow/providers/google/cloud/sensors/dataproc.py | 76 +-
 .../operators/cloud/dataproc.rst   | 22 +++
 .../google/cloud/sensors/test_dataproc.py  | 73 -
 .../cloud/dataproc/example_dataproc_batch.py   | 60 -
 6 files changed, 298 insertions(+), 10 deletions(-)



[GitHub] [airflow] potiuk merged pull request #28456: Add DataprocCancelOperationOperator

2023-01-16 Thread GitBox


potiuk merged PR #28456:
URL: https://github.com/apache/airflow/pull/28456





[GitHub] [airflow] potiuk commented on issue #28947: Status of testing of Apache Airflow 2.5.1rc1

2023-01-16 Thread GitBox


potiuk commented on issue #28947:
URL: https://github.com/apache/airflow/issues/28947#issuecomment-1384439268

   All Good (I left one change for @webster-chainalys to confirm :)
   
   * https://github.com/apache/airflow/pull/27720 is difficult to trigger but 
this is just exception handling; I checked it is present in the sources and 
it's really an edge case
   * https://github.com/apache/airflow/pull/28011 works fine - tested it with 
REALLY BAD zip file :)
   * https://github.com/apache/airflow/pull/28090 - doc change only
   * https://github.com/apache/airflow/pull/28094 - doc/docker-compose change 
only (I checked it is updated)
   * https://github.com/apache/airflow/pull/28138 - works. no warning generated
   * https://github.com/apache/airflow/pull/28170 - just a structure change for 
tests
   * https://github.com/apache/airflow/pull/28205 - just a structure change for 
tests
   * https://github.com/apache/airflow/pull/28207 - just a structure change for 
tests
   * https://github.com/apache/airflow/pull/28247 - test fix only
   * https://github.com/apache/airflow/pull/28283 - I will defer to 
@webster-chainalysis to test if it works for them
   * https://github.com/apache/airflow/pull/28443 - ARM image has plyvel
   * https://github.com/apache/airflow/pull/28498 - test only change
   * https://github.com/apache/airflow/pull/28514 
https://github.com/apache/airflow/pull/28518 
https://github.com/apache/airflow/pull/28519  - CI only changes
   * https://github.com/apache/airflow/pull/28537 - doc change, checked that 
empty_plugin is there.
   * https://github.com/apache/airflow/pull/28627 - internal refactor only. 
Seems to detect architecture properly
   * https://github.com/apache/airflow/pull/28672 - proper help printed
   * https://github.com/apache/airflow/pull/28725 - proper metadata in wheel: 
Requires-Dist: sqlalchemy (<2.0,>=1.4)





[GitHub] [airflow] Taragolis commented on issue #28975: Additional info about Segmentation Violation in LocalTaskJob (#27381): @Taragolis

2023-01-16 Thread GitBox


Taragolis commented on issue #28975:
URL: https://github.com/apache/airflow/issues/28975#issuecomment-1384396913

   Oh.. It is probably a misclick on "Convert to issue" on the list of tasks. 
Isn't it? @potiuk 





[GitHub] [airflow] vandonr-amz commented on issue #28938: Status of testing Providers that were prepared on January 14, 2023

2023-01-16 Thread GitBox


vandonr-amz commented on issue #28938:
URL: https://github.com/apache/airflow/issues/28938#issuecomment-1384396111

   my Sagemaker changes were covered by a system test that's still green, so 
good to go :)





[GitHub] [airflow] Taragolis commented on issue #28975: Additional info about Segmentation Violation in LocalTaskJob (#27381): @Taragolis

2023-01-16 Thread GitBox


Taragolis commented on issue #28975:
URL: https://github.com/apache/airflow/issues/28975#issuecomment-1384394131

   👀 





[GitHub] [airflow] johannaojeling commented on pull request #28764: Add support for running a Beam Go pipeline with an executable binary

2023-01-16 Thread GitBox


johannaojeling commented on PR #28764:
URL: https://github.com/apache/airflow/pull/28764#issuecomment-1384389915

   @uranusjr can this be merged?





[GitHub] [airflow] Taragolis commented on a diff in pull request #28816: introduce a method to convert dictionaries to boto-style key-value lists

2023-01-16 Thread GitBox


Taragolis commented on code in PR #28816:
URL: https://github.com/apache/airflow/pull/28816#discussion_r1071479962


##
airflow/providers/amazon/aws/hooks/s3.py:
##
@@ -42,6 +42,7 @@
 from airflow.exceptions import AirflowException
 from airflow.providers.amazon.aws.exceptions import S3HookUriParseFailure
 from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
+from airflow.providers.amazon.aws.utils.aws_api import format_tags

Review Comment:
   Maybe a stupid idea, but what if we place it in 
`airflow/providers/amazon/aws/utils/__init__.py` - or is that for generic stuff only?






[GitHub] [airflow] vandonr-amz commented on a diff in pull request #28816: introduce a method to convert dictionaries to boto-style key-value lists

2023-01-16 Thread GitBox


vandonr-amz commented on code in PR #28816:
URL: https://github.com/apache/airflow/pull/28816#discussion_r1071478212


##
airflow/providers/amazon/aws/hooks/s3.py:
##
@@ -42,6 +42,7 @@
 from airflow.exceptions import AirflowException
 from airflow.providers.amazon.aws.exceptions import S3HookUriParseFailure
 from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
+from airflow.providers.amazon.aws.utils.aws_api import format_tags

Review Comment:
   I named it boto_ something in my first commit, but then as Taragolis 
mentioned:
   
   > this is not boto3 behaviour this how what AWS API expected this parameters
   
   I changed it to aws
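   For context, the helper under discussion converts a plain dict into the 
list-of-mappings shape many AWS APIs expect for tags. A minimal sketch of such a 
conversion - the signature and the `key_label`/`value_label` keyword names here are 
illustrative, not necessarily the provider's final API:

   ```python
   def format_tags(source, *, key_label="Key", value_label="Value"):
       """Convert {"k": "v"} into AWS-style [{"Key": "k", "Value": "v"}].

       Illustrative sketch only; the real helper in the Amazon provider
       may differ. Non-dict input is passed through unchanged, since some
       callers may already hold the boto-style list.
       """
       if source is None:
           return []
       if isinstance(source, dict):
           return [{key_label: k, value_label: v} for k, v in source.items()]
       return source


   print(format_tags({"env": "prod", "team": "data"}))
   # → [{'Key': 'env', 'Value': 'prod'}, {'Key': 'team', 'Value': 'data'}]
   ```

   The keyword labels cover APIs that spell the pair differently (e.g. 
`TagKey`/`TagValue`), which is one reason to keep the helper generic rather than 
tying it to a single service.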






[GitHub] [airflow] ferruzzi commented on a diff in pull request #28816: introduce a method to convert dictionaries to boto-style key-value lists

2023-01-16 Thread GitBox


ferruzzi commented on code in PR #28816:
URL: https://github.com/apache/airflow/pull/28816#discussion_r1071472110


##
airflow/providers/amazon/aws/hooks/s3.py:
##
@@ -42,6 +42,7 @@
 from airflow.exceptions import AirflowException
 from airflow.providers.amazon.aws.exceptions import S3HookUriParseFailure
 from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
+from airflow.providers.amazon.aws.utils.aws_api import format_tags

Review Comment:
   I'm not sure what else you'd name the module here that would make more 
sense. It could be boto_api or boto_utils maybe, but I'm not sure either of 
those is any better, really?






[GitHub] [airflow] potiuk commented on issue #27328: SFTPOperator throws object of type 'PlainXComArg' has no len() when using with Taskflow API

2023-01-16 Thread GitBox


potiuk commented on issue #27328:
URL: https://github.com/apache/airflow/issues/27328#issuecomment-1384371084

   Would you like to fix it @jtommi ? 
   
   And @uranusjr - we both approved the change. I am starting to think that we 
should attempt some automated prevention of similar issues - it is all too easy 
(and seemingly obvious) to perform such validations and conversions in the 
constructor rather than in the execute methods. And apparently it is easy to 
get past the approval of both of us, so possibly that's a sign we should summon 
our CI to prevent such things. Though I am not sure yet how to do it.
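   The pattern being discussed can be sketched as follows. This is not the real 
SFTPOperator - the class and parameter names are simplified for illustration - but 
it shows why length validation must be deferred from the constructor (where an 
XComArg placeholder has no len()) to execute() (where values have been resolved):

   ```python
   class FileTransferLikeOperator:
       """Illustrative only; not the real SFTPOperator."""

       def __init__(self, local_filepath, remote_filepath):
           # At DAG-parse time these may still be lazy XComArg placeholders,
           # so we only store them here and defer any len() checks.
           self.local_filepath = local_filepath
           self.remote_filepath = remote_filepath

       def execute(self, context=None):
           # By execute time, templated/XCom values have been resolved to
           # concrete strings or lists, so validation is safe now.
           local = self.local_filepath if isinstance(self.local_filepath, list) else [self.local_filepath]
           remote = self.remote_filepath if isinstance(self.remote_filepath, list) else [self.remote_filepath]
           if len(local) != len(remote):
               raise ValueError("local_filepath and remote_filepath must have the same length")
           return list(zip(local, remote))


   class FakeXComArg:
       """Stand-in for a lazy reference that has no len() at parse time."""


   # Constructing with a placeholder no longer raises at parse time:
   op = FileTransferLikeOperator(FakeXComArg(), "remote.csv")

   # And validation still happens once values are concrete:
   print(FileTransferLikeOperator("local.csv", "remote.csv").execute())
   # → [('local.csv', 'remote.csv')]
   ```

   A CI check for this class of bug would presumably need to flag len()/iteration 
over templated fields inside operator constructors, which is why it is not obvious 
how to automate.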





[GitHub] [airflow] potiuk commented on issue #27328: SFTPOperator throws object of type 'PlainXComArg' has no len() when using with Taskflow API

2023-01-16 Thread GitBox


potiuk commented on issue #27328:
URL: https://github.com/apache/airflow/issues/27328#issuecomment-1384367659

   Yep. It looks like this change https://github.com/apache/airflow/pull/2 
broke it (the number indicates it's a bit hell-ish change). The checks are in 
the constructor but should be in the execute() method.





[GitHub] [airflow] Romsik788 commented on issue #28941: Tasks are in the queued status after restarting the redis container

2023-01-16 Thread GitBox


Romsik788 commented on issue #28941:
URL: https://github.com/apache/airflow/issues/28941#issuecomment-1384367636

   > But it's up to you to make it robust and production-ready and resilient to 
any kind of failures, I am afraid.
   
   Ok, I have deployed Apache Airflow on kubernetes 1.22 using the official 
[helm chart](https://artifacthub.io/packages/helm/apache-airflow/airflow/1.7.0) 
with the following settings:
   ```
   elasticsearch:
     enabled: true
   workers:
     persistence:
       storageClassName: oci-bv
     podAnnotations:
       log_format: json
   redis:
     persistence:
       storageClassName: oci-bv
   images:
     useDefaultImageForMigration: true
   createUserJob:
     useHelmHooks: false
   migrateDatabaseJob:
     useHelmHooks: false
   ```
   But still I get the same result.
   





[GitHub] [airflow] jtommi opened a new issue, #27328: SFTPOperator throws object of type 'PlainXComArg' has no len() when using with Taskflow API

2023-01-16 Thread GitBox


jtommi opened a new issue, #27328:
URL: https://github.com/apache/airflow/issues/27328

   ### Apache Airflow Provider(s)
   
   sftp
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-sftp==4.1.0
   
   ### Apache Airflow version
   
   2.4.2 Python 3.10
   
   ### Operating System
   
   Debian 11 (Official docker image)
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   Base image is apache/airflow:2.4.2-python3.10
   
   
   ### What happened
   
   When combining Taskflow API and SFTPOperator, it throws an exception that 
didn't happen with apache-airflow-providers-sftp 4.0.0
   
   ### What you think should happen instead
   
   The DAG should work as expected
   
   ### How to reproduce
   
   ```python
   import pendulum
   
   from airflow import DAG
   from airflow.decorators import task
   from airflow.providers.sftp.operators.sftp import SFTPOperator
   
   with DAG(
   "example_sftp",
   schedule="@once",
   start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
   catchup=False,
   tags=["example"],
   ) as dag:
   
   @task
   def get_file_path():
   return "test.csv"
   
   local_filepath = get_file_path()
   
   upload = SFTPOperator(
   task_id=f"upload_file_to_sftp",
   ssh_conn_id="sftp_connection",
   local_filepath=local_filepath,
   remote_filepath="test.csv",
   )
   ```
   
   ### Anything else
   
   ```logs
   [2022-10-27T15:21:38.106+]` {logging_mixin.py:120} INFO - 
[2022-10-27T15:21:38.102+] {dagbag.py:342} ERROR - Failed to import: 
/opt/airflow/dags/test.py
   Traceback (most recent call last):
 File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/models/dagbag.py", 
line 338, in parse
   loader.exec_module(new_module)
 File "", line 883, in exec_module
 File "", line 241, in 
_call_with_frames_removed
 File "/opt/airflow/dags/test.py", line 21, in 
   upload = SFTPOperator(
 File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/models/baseoperator.py",
 line 408, in apply_defaults
   result = func(self, **kwargs, default_args=default_args)
 File 
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/sftp/operators/sftp.py",
 line 116, in __init__
   if len(self.local_filepath) != len(self.remote_filepath):
   TypeError: object of type 'PlainXComArg' has no len()
   ```
   
   It looks like the offending code was introduced in commit 
5f073e38dd46217b64dbc16d7b1055d89e8c3459
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   





[airflow] branch main updated: Add documentation about cli `add connection` and AWS connection URI (#28852)

2023-01-16 Thread potiuk
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
 new d24527bf75 Add documentation about cli `add connection` and AWS 
connection URI (#28852)
d24527bf75 is described below

commit d24527bf759c80dd22684a0fb51c283bbafb9298
Author: Arkadiusz Rudny <93520526+aru-tracku...@users.noreply.github.com>
AuthorDate: Mon Jan 16 18:26:15 2023 +0100

Add documentation about cli `add connection` and AWS connection URI (#28852)
---
 .../connections/aws.rst  | 12 
 .../logging/s3-task-handler.rst  | 20 ++--
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/docs/apache-airflow-providers-amazon/connections/aws.rst 
b/docs/apache-airflow-providers-amazon/connections/aws.rst
index 14956d1de1..e885887980 100644
--- a/docs/apache-airflow-providers-amazon/connections/aws.rst
+++ b/docs/apache-airflow-providers-amazon/connections/aws.rst
@@ -146,6 +146,18 @@ Snippet to create Connection and convert to URI
 os.environ[env_key] = conn_uri
 print(conn.test_connection())
 
+
+  .. warning:: When using the Airflow CLI, a ``@`` may need to be added when:
+
+- login
+- password
+- host
+- port
+
+are not given, see example below. This is a known airflow limitation.
+
+``airflow connections add aws_conn --conn-uri 
aws://@/?region_name=eu-west-1``
+
 Using instance profile
 ^^
 
diff --git a/docs/apache-airflow-providers-amazon/logging/s3-task-handler.rst 
b/docs/apache-airflow-providers-amazon/logging/s3-task-handler.rst
index 8352432c22..016c2d5165 100644
--- a/docs/apache-airflow-providers-amazon/logging/s3-task-handler.rst
+++ b/docs/apache-airflow-providers-amazon/logging/s3-task-handler.rst
@@ -113,17 +113,25 @@ We are  using the existing ``serviceAccount`` hence 
``create: false`` with exist
 delete_worker_pods: 'False'
 encrypt_s3_logs: 'True'
 
-Step3: Create Amazon Web Services connection in Airflow Web UI
+Step3: Create Amazon Web Services connection
 ~~
 With the above configurations, Webserver and Worker Pods can access Amazon S3 
bucket and write logs without using any Access Key and Secret Key or Instance 
profile credentials.
 
-The final step to create connections under Airflow UI before executing the 
DAGs.
+- Using Airflow Web UI
 
-* Login to Airflow Web UI with ``admin`` credentials and Navigate to ``Admin 
-> Connections``
-* Create connection for ``Amazon Web Services`` and select the 
options(Connection ID and Connection Type) as shown in the image.
-* Select the correct region where S3 bucket is created in ``Extra`` text box.
+  The final step to create connections under Airflow UI before executing the 
DAGs.
 
-.. image:: /img/aws-base-conn-airflow.png
+  * Login to Airflow Web UI with ``admin`` credentials and Navigate to ``Admin 
-> Connections``
+  * Create connection for ``Amazon Web Services`` and select the options 
(Connection ID and Connection Type) as shown in the image.
+  * Select the correct region where S3 bucket is created in ``Extra`` text box.
+
+  .. image:: /img/aws-base-conn-airflow.png
+
+- Using Airflow CLI
+
+  ``airflow connections add aws_conn --conn-uri aws://@/?egion_name=eu-west-1``
+
+  Note that ``@`` used in ``-conn-uri`` parameter usually separates password 
and host but in this case it complies with uri validator used.
 
 Step4: Verify the logs
 ~~



[GitHub] [airflow] potiuk merged pull request #28852: Add documentation about cli `add connection` and AWS connection URI

2023-01-16 Thread GitBox


potiuk merged PR #28852:
URL: https://github.com/apache/airflow/pull/28852





[GitHub] [airflow] boring-cyborg[bot] commented on pull request #28852: Add documentation about cli `add connection` and AWS connection URI

2023-01-16 Thread GitBox


boring-cyborg[bot] commented on PR #28852:
URL: https://github.com/apache/airflow/pull/28852#issuecomment-1384362745

   Awesome work, congrats on your first merged pull request!
   





[GitHub] [airflow] rafalh commented on a diff in pull request #28970: Fix support for macros with dots in DataProcJobBuilder

2023-01-16 Thread GitBox


rafalh commented on code in PR #28970:
URL: https://github.com/apache/airflow/pull/28970#discussion_r1071454703


##
tests/providers/google/cloud/hooks/test_dataproc.py:
##
@@ -1006,17 +1006,9 @@ def test_set_python_main(self):
 self.builder.set_python_main(main)
 assert main == 
self.builder.job["job"][self.job_type]["main_python_file_uri"]
 
-@pytest.mark.parametrize(
-"job_name",
-[
-pytest.param("name", id="simple"),
-pytest.param("name_with_dash", id="name with underscores"),
-pytest.param("group.name", id="name with dot"),
-pytest.param("group.name_with_dash", id="name with dot and 
underscores"),
-],
-)

Review Comment:
   They were added with sanitization (dot substitution) in 
`DataProcJobBuilder`. Now that I removed sanitization from that class, all test 
cases with dots would fail. I just reverted this test to the state before the 
changes in `DataProcJobBuilder` and added a test that checks if the proper job_id 
is passed from the operator instead.






[airflow] branch main updated: Mark license block in doc as text (#28965)

2023-01-16 Thread potiuk
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
 new 4fff0555b5 Mark license block in doc as text (#28965)
4fff0555b5 is described below

commit 4fff0555b505f7a51efb2bbd108378780eb850d9
Author: Tzu-ping Chung 
AuthorDate: Tue Jan 17 01:07:30 2023 +0800

Mark license block in doc as text (#28965)

Sphinx was helpfully (but incorrectly) adding Python syntax highlighting
to the block. This marks the block explicitly as pure text to avoid
that.
---
 docs/apache-airflow/license.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/apache-airflow/license.rst b/docs/apache-airflow/license.rst
index ceec3e78b6..e5bb13e36b 100644
--- a/docs/apache-airflow/license.rst
+++ b/docs/apache-airflow/license.rst
@@ -23,7 +23,7 @@ License
 .. image:: img/apache.jpg
 :width: 150
 
-::
+.. code-block:: text
 
   Apache License
 Version 2.0, January 2004



[GitHub] [airflow] potiuk merged pull request #28965: Mark license block in doc as text

2023-01-16 Thread GitBox


potiuk merged PR #28965:
URL: https://github.com/apache/airflow/pull/28965





[GitHub] [airflow] potiuk commented on pull request #28965: Mark license block in doc as text

2023-01-16 Thread GitBox


potiuk commented on PR #28965:
URL: https://github.com/apache/airflow/pull/28965#issuecomment-1384337521

   I wish there was `::code-block:: licence`





[GitHub] [airflow] potiuk merged pull request #28780: Fix hot loop with a burst of log messages (#28778)

2023-01-16 Thread GitBox


potiuk merged PR #28780:
URL: https://github.com/apache/airflow/pull/28780





[GitHub] [airflow] potiuk closed issue #28778: Script "clean-logs.sh" has an unexpected burst behavior

2023-01-16 Thread GitBox


potiuk closed issue #28778: Script "clean-logs.sh" has an unexpected burst 
behavior
URL: https://github.com/apache/airflow/issues/28778





[GitHub] [airflow] boring-cyborg[bot] commented on pull request #28780: Fix hot loop with a burst of log messages (#28778)

2023-01-16 Thread GitBox


boring-cyborg[bot] commented on PR #28780:
URL: https://github.com/apache/airflow/pull/28780#issuecomment-1384335195

   Awesome work, congrats on your first merged pull request!
   





[airflow] branch main updated: Fix hot loop with a burst of log messages (#28778) (#28780)

2023-01-16 Thread potiuk
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
 new 4b1a36f833 Fix hot loop with a burst of log messages (#28778) (#28780)
4b1a36f833 is described below

commit 4b1a36f833b77d3f0bec78958d1fb9f360b7b11b
Author: Ivan Reche 
AuthorDate: Mon Jan 16 14:05:34 2023 -0300

Fix hot loop with a burst of log messages (#28778) (#28780)

* Fix hot loop with a burst of log messages (#28778)

* Add missing inline in Dockerfile
---
 Dockerfile   | 3 ++-
 scripts/docker/clean-logs.sh | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/Dockerfile b/Dockerfile
index 42b9db931f..74634887a9 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1025,7 +1025,8 @@ while true; do
 xargs -0 rm -f
 
   seconds=$(( $(date -u +%s) % EVERY))
-  (( seconds < 1 )) || sleep $((EVERY - seconds))
+  (( seconds < 1 )) || sleep $((EVERY - seconds - 1))
+  sleep 1
 done
 EOF
 
diff --git a/scripts/docker/clean-logs.sh b/scripts/docker/clean-logs.sh
index 57b6e8b605..0c775b14e6 100644
--- a/scripts/docker/clean-logs.sh
+++ b/scripts/docker/clean-logs.sh
@@ -36,5 +36,6 @@ while true; do
 xargs -0 rm -f
 
   seconds=$(( $(date -u +%s) % EVERY))
-  (( seconds < 1 )) || sleep $((EVERY - seconds))
+  (( seconds < 1 )) || sleep $((EVERY - seconds - 1))
+  sleep 1
 done
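The fix above guarantees the loop sleeps at least one second per iteration. Before 
it, an iteration that started exactly on the period boundary (seconds == 0) skipped 
the sleep entirely and spun hot until the clock ticked past the boundary. A sketch 
of the two sleep-duration computations (the EVERY value is illustrative):

```shell
#!/usr/bin/env bash
EVERY=900  # cleanup period in seconds (illustrative value)

old_sleep() {  # pre-fix: yields 0 on the boundary -> hot loop for a whole second
  local seconds=$1
  if (( seconds < 1 )); then echo 0; else echo $((EVERY - seconds)); fi
}

new_sleep() {  # post-fix: always at least 1 second of sleep per iteration
  local seconds=$1
  local s=0
  (( seconds < 1 )) || s=$((EVERY - seconds - 1))
  echo $((s + 1))
}

echo "on boundary:  old=$(old_sleep 0)   new=$(new_sleep 0)"
echo "mid-period:   old=$(old_sleep 300) new=$(new_sleep 300)"
# → on boundary:  old=0   new=1
# → mid-period:   old=600 new=600
```

Mid-period the total sleep is unchanged ((EVERY - seconds - 1) + 1 == EVERY - seconds); 
only the boundary case gains the mandatory one-second pause.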



[GitHub] [airflow] csm10495 commented on pull request #28808: Allow setting the name for the base container within K8s Pod Operator

2023-01-16 Thread GitBox


csm10495 commented on PR #28808:
URL: https://github.com/apache/airflow/pull/28808#issuecomment-1384319670

   @eladkal can you check again?




