[GitHub] [airflow] boring-cyborg[bot] commented on issue #29456: How to automatically rerun the zombie tasks

2023-02-09 Thread via GitHub


boring-cyborg[bot] commented on issue #29456:
URL: https://github.com/apache/airflow/issues/29456#issuecomment-1425334083

   Thanks for opening your first issue here! Be sure to follow the issue 
template!
   





[GitHub] [airflow] linhgao opened a new issue, #29456: How to automatically rerun the zombie tasks

2023-02-09 Thread via GitHub


linhgao opened a new issue, #29456:
URL: https://github.com/apache/airflow/issues/29456

   ### Description
   
   My Airflow cluster is deployed on a Kubernetes cluster; when a worker container is restarted, long-running jobs fail.
   For now I rerun them manually, and if I don't rerun them in time, it affects the normal operation of the workflow.
   
   So, I want the Airflow scheduler to be able to automatically rerun zombie tasks.
   
   ### Use case/motivation
   
   When the scheduler finds zombie tasks, an Airflow configuration option could let us choose between marking those tasks failed or automatically adding them back into the queue.
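
   For comparison, the closest behaviour available today is task-level retries, since the scheduler marks detected zombies as failed; a minimal sketch (DAG and task names are hypothetical):
   
   ```python
   # Workaround sketch: zombie tasks are marked failed by the scheduler,
   # so a task with retries remaining is rescheduled automatically.
   from datetime import timedelta

   import pendulum
   from airflow import DAG
   from airflow.operators.bash import BashOperator

   with DAG(
       dag_id="long_running_example",  # hypothetical DAG
       start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
       schedule=None,
       default_args={
           "retries": 3,  # a zombie-failed attempt is retried up to 3 times
           "retry_delay": timedelta(minutes=5),
       },
   ) as dag:
       BashOperator(task_id="long_job", bash_command="sleep 3600")
   ```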
   
   ### Related issues
   
   no
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   





[GitHub] [airflow] wolfier opened a new issue, #29455: Not all zombies are caused by missed heartbeat

2023-02-09 Thread via GitHub


wolfier opened a new issue, #29455:
URL: https://github.com/apache/airflow/issues/29455

   ### Apache Airflow version
   
   2.5.1
   
   ### What happened
   
   When the scheduler [finds 
zombies](https://github.com/apache/airflow/blob/2.5.1/airflow/jobs/scheduler_job.py#L1541-L1542), a log line is emitted indicating how many jobs were found without heartbeats.
   
   ```
   [2023-01-12T03:26:33.347+] {scheduler_job.py:1533} WARNING - Failing (1) 
jobs without heartbeat after 2023-01-12 03:21:33.317638+00:00
   ```
   
   I ran into an odd case where a task instance became a zombie right after being executed.
   
   ---
   
   The task is scheduled and queued by the scheduler and passed to the executor.
   
   ```
   [2023-01-12T03:26:14.983+] {scheduler_job.py:360} INFO - 1 tasks up for 
execution:
   
   [2023-01-12T03:26:14.984+] {scheduler_job.py:511} INFO - Setting the 
following tasks to queued state:
   
   [2023-01-12T03:26:14.991+] {base_executor.py:95} INFO - Adding to queue: 
['airflow', 'tasks', 'run', 'model_contracts', 'wait_for_last_product_astro', 
'scheduled__2023-01-12T02:00:00+00:00', '--local', '--subdir', 
'DAGS_FOLDER/declarative/gusty/hourly/hourly.py']
   [2023-01-12T03:26:14.991+] {scheduler_job.py:550} INFO - Sending 
TaskInstanceKey(dag_id='model_contracts', 
task_id='wait_for_last_product_astro', 
run_id='scheduled__2023-01-12T02:00:00+00:00', try_number=1, map_index=-1) to 
executor with priority 15500 and queue default
   ```
   
   The Celery worker picks up the task instance, assigns a Celery task id (UUID), and emits an executor event into the event_buffer.
   
   ```
   [2023-01-12 03:26:15,005: INFO/MainProcess] Task 
airflow.executors.celery_executor.execute_command[f4242c9e-9426-4b2d-b55c-323731d74e09]
 received
   [2023-01-12 03:26:15,022: INFO/ForkPoolWorker-2] 
[f4242c9e-9426-4b2d-b55c-323731d74e09] Executing command in Celery: ['airflow', 
'tasks', 'run', 'model_contracts', 'wait_for_last_product_astro', 
'scheduled__2023-01-12T02:00:00+00:00', '--local', '--subdir', 
'DAGS_FOLDER/declarative/gusty/hourly/hourly.py']
   ```
   
   The scheduler reads the event_buffer and acknowledges the task instance as assigned in Celery.
   
   ```
   [2023-01-12T03:26:15.145+] {scheduler_job.py:635} INFO - Setting 
external_id for  to 
f4242c9e-9426-4b2d-b55c-323731d74e09
   ```
   
   The task instance is marked as a zombie soon after.
   
   ```
   [2023-01-12T03:26:33.347+] {scheduler_job.py:1533} WARNING - Failing (1) 
jobs without heartbeat after 2023-01-12 03:21:33.317638+00:00
   [2023-01-12T03:26:33.355+] {scheduler_job.py:1543} ERROR - Detected 
zombie job: {'full_filepath': 
'/usr/local/airflow/dags/declarative/gusty/hourly/hourly.py', 
'processor_subdir': '/usr/local/airflow/dags', 'msg': "{'DAG Id': 
'model_contracts', 'Task Id': 'wait_for_last_product_astro', 'Run Id': 
'scheduled__2023-01-12T02:00:00+00:00', 'Hostname': '10.2.124.99', 'External 
Executor Id': 'f4242c9e-9426-4b2d-b55c-323731d74e09'}", 'simple_task_instance': 
, 
'is_failure_callback': True}
   [2023-01-12T03:26:36.044+] {taskinstance.py:1774} ERROR - {'DAG Id': 
'model_contracts', 'Task Id': 'wait_for_last_product_astro', 'Run Id': 
'scheduled__2023-01-12T02:00:00+00:00', 'Hostname': '10.2.124.99', 'External 
Executor Id': 'f4242c9e-9426-4b2d-b55c-323731d74e09'}
   ```
   
   Based on the task logs, the task run command never reached the task execution part.
   
   ```
   [2023-01-12, 03:26:17 UTC] {standard_task_runner.py:83} INFO - Job 4184045: 
Subtask wait_for_last_product_astro
   [2023-01-12, 03:26:22 UTC] {standard_task_runner.py:100} ERROR - Failed to 
execute job 4184045 for task wait_for_last_product_astro 
((psycopg2.OperationalError) could not translate host name 
"geocentric-spacecraft-1886-pgbouncer.geocentric-spacecraft-1886.svc.cluster.local"
 to address: Temporary failure in name resolution
   
   (Background on this error at: https://sqlalche.me/e/14/e3q8); 7131)
   [2023-01-12, 03:26:23 UTC] {local_task_job.py:159} INFO - Task exited with 
return code 1
   ```
   
   Since the [command execution encountered an 
exception](https://github.com/apache/airflow/blob/2.5.0/airflow/cli/commands/task_command.py#L383-L389) before running the execute method, the StandardTaskRunner exited, followed by the LocalTaskJob also exiting in the success state, without handling the state of the task instance. At this point the [task instance's state is 
running](https://github.com/apache/airflow/blob/2.5.0/airflow/jobs/local_task_job.py#L88-L97) because the LocalTaskJob successfully created the StandardTaskRunner.
   
   A task instance in the running state with its corresponding LocalTaskJob in the success state means the task instance is now a zombie, but not because of a lack of heartbeats.
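
   To make the state combination concrete, a small sketch (the helper name is mine, not the scheduler's actual detection code):
   
   ```python
   from airflow.utils.state import State

   def is_finished_job_zombie(ti_state: str, local_task_job_state: str) -> bool:
       """A task instance left RUNNING after its LocalTaskJob already finished
       is a zombie even though its heartbeats never went stale."""
       return ti_state == State.RUNNING and local_task_job_state in (
           State.SUCCESS,
           State.FAILED,
       )
   ```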
   
   ### What you think should happen instead
   
   As explained above, not all zombies are caused by missed heartbeats. When a 
`LocalTaskJob` succeeds or fails while the task instance is 

[GitHub] [airflow-client-python] VeereshPatil commented on issue #21: trigger dag api is broken

2023-02-09 Thread via GitHub


VeereshPatil commented on issue #21:
URL: 
https://github.com/apache/airflow-client-python/issues/21#issuecomment-1425257059

   You cannot pass **dag_id** and **external_trigger** as part of `DAGRun` (`dag_id` is passed to the `post_dag_run()` function).
   The following worked for me:
   
   ```python
   from airflow_client.client.model.dag_run import DAGRun

   dag_run = DAGRun(
       dag_run_id="some_test_run",
       conf={"key": "values"},
   )
   api_response = dag_run_api_instance.post_dag_run(dag_id, dag_run)
   ```
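
   In case it helps, a sketch of the surrounding client setup (host and credentials are placeholders):
   
   ```python
   import airflow_client.client
   from airflow_client.client.api import dag_run_api

   configuration = airflow_client.client.Configuration(
       host="http://localhost:8080/api/v1",  # placeholder host
       username="admin",  # placeholder credentials
       password="admin",
   )
   with airflow_client.client.ApiClient(configuration) as api_client:
       # calls like post_dag_run() go inside this block
       dag_run_api_instance = dag_run_api.DAGRunApi(api_client)
   ```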





[GitHub] [airflow] uranusjr commented on a diff in pull request #29440: Bring back fixed tenacity

2023-02-09 Thread via GitHub


uranusjr commented on code in PR #29440:
URL: https://github.com/apache/airflow/pull/29440#discussion_r1102278823


##
setup.cfg:
##
@@ -134,9 +134,7 @@ install_requires =
 sqlalchemy>=1.4,<2.0
 sqlalchemy_jsonfield>=1.0
 tabulate>=0.7.5
-# The 8.2.0 release of tenacity has a mypy error that breaks our CI
-# The upper-bound limit can be removed after 
https://github.com/jd/tenacity/issues/389 is resolved
-tenacity>=6.2.0,<8.2.0
+tenacity>=8.2.1

Review Comment:
   ```suggestion
   tenacity>=6.2.0,!=8.2.0
   ```
   
   since the older versions aren’t incompatible?






[GitHub] [airflow] uranusjr commented on a diff in pull request #29451: Fix nested fields rendering in mapped operators

2023-02-09 Thread via GitHub


uranusjr commented on code in PR #29451:
URL: https://github.com/apache/airflow/pull/29451#discussion_r1102278121


##
airflow/models/mappedoperator.py:
##
@@ -687,7 +687,7 @@ def render_template_fields(
 unmapped_task = self.unmap(mapped_kwargs)
 context_update_for_unmapped(context, unmapped_task)
 
-self._do_render_template_fields(
+unmapped_task._do_render_template_fields(

Review Comment:
   A comment here containing the explanation you gave in the PR description 
would be a good idea.






[GitHub] [airflow] BrunoGrandePhD commented on issue #29137: Add ability to access context in functions decorated by task.sensor

2023-02-09 Thread via GitHub


BrunoGrandePhD commented on issue #29137:
URL: https://github.com/apache/airflow/issues/29137#issuecomment-1425156771

   I see that a PR to address this is already open and approved by at least one reviewer, but if anyone else stumbles upon this issue, you can use `airflow.operators.python.get_current_context()` as a temporary solution to access context variables 
([docs](https://airflow.apache.org/docs/apache-airflow/stable/tutorial/taskflow.html#:~:text=Also%2C%20sometimes%20you%20might%20want%20to%20access%20the%20context%20somewhere%20deep%20in%20the%20stack%2C%20but%20you%20do%20not%20want%20to%20pass%20the%20context%20variables%20from%20the%20task%20callable.%20You%20can%20still%20access%20execution%20context%20via%20the%20get_current_context%20method.)).
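
   A minimal sketch of that workaround with the `task.sensor` decorator (names and intervals are just for illustration):
   
   ```python
   from airflow.decorators import task
   from airflow.operators.python import get_current_context
   from airflow.sensors.base import PokeReturnValue

   @task.sensor(poke_interval=30, timeout=600)
   def wait_for_something() -> PokeReturnValue:
       # reach the execution context without receiving it as an argument
       context = get_current_context()
       print(f"poking for logical date {context['ds']}")
       return PokeReturnValue(is_done=True)
   ```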





[GitHub] [airflow] blag commented on a diff in pull request #29441: datasets, next_run_datasets, remove unnecessary timestamp filter

2023-02-09 Thread via GitHub


blag commented on code in PR #29441:
URL: https://github.com/apache/airflow/pull/29441#discussion_r1102128972


##
airflow/www/views.py:
##
@@ -3715,7 +3715,6 @@ def next_run_datasets(self, dag_id):
 DatasetEvent,
 and_(
 DatasetEvent.dataset_id == DatasetModel.id,
-DatasetEvent.timestamp > DatasetDagRunQueue.created_at,

Review Comment:
     It's been a long minute since I wrote this, but...
   
   I believe when I wrote this the intent with the `lastUpdate` field was to 
only show last updates _since the last time the dag was queued/run_. But yeah, 
the `lastUpdate` label isn't descriptive enough for that.
   
   Option 1: Personally, I would consider changing the name of what the 
`lastUpdate` field is rendered to, something like "Last update since last run" 
or something more wordy.
   
   Option 2: But if you don't want to do that, and you want to display the last 
update for every dataset regardless of whether it has already been "consumed" by a 
DagRun (e.g. in either the DatasetDagRunQueue or actually scheduled into a 
DagRun), then yeah, it makes sense to remove this filter. However, I would also 
remove the `and_` around it, since there would then be only one filter condition 
in that join:
   
   ```python
   .join(
       DatasetEvent,
       DatasetEvent.dataset_id == DatasetModel.id,
       isouter=True,
   )
   ```
   
   If you go for option 2, I think you should be able to compare the existence 
and creation time of the DDRQ with the DatasetEvent timestamp to figure out 
whether the last update has already triggered a DDRQ/DagRun or has only 
partially satisfied the conditions of a future DagRun.
   
   Hope this makes sense.






[GitHub] [airflow] hussein-awala commented on issue #29432: Jinja templating doesn't work with container_resources when using dymanic task mapping with Kubernetes Pod Operator

2023-02-09 Thread via GitHub


hussein-awala commented on issue #29432:
URL: https://github.com/apache/airflow/issues/29432#issuecomment-1425036759

   @pshrivastava27 I created PR #29451 and it solved the problem for me; can you test it in your dev environment?
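
   For reference, the pattern in question looks roughly like this (a sketch; DAG id, image, and values are placeholders):
   
   ```python
   # Templated container_resources on a mapped KubernetesPodOperator.
   import pendulum
   from airflow import DAG
   from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
       KubernetesPodOperator,
   )
   from kubernetes.client import models as k8s

   with DAG(
       dag_id="kpo_mapped_resources",  # placeholder
       start_date=pendulum.datetime(2023, 2, 1, tz="UTC"),
       schedule=None,
       params={"memory": "512Mi"},
   ) as dag:
       KubernetesPodOperator.partial(
           task_id="mapped_pod",
           name="mapped-pod",
           image="busybox",
           container_resources=k8s.V1ResourceRequirements(
               requests={"memory": "{{ params.memory }}", "cpu": "1"},  # templated
           ),
       ).expand(arguments=[["echo", "a"], ["echo", "b"]])
   ```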





[GitHub] [airflow] hussein-awala commented on issue #29442: Logging from inside on_failure_callback not propagated with Cloud Composer

2023-02-09 Thread via GitHub


hussein-awala commented on issue #29442:
URL: https://github.com/apache/airflow/issues/29442#issuecomment-1425031828

   The callback is executed when the task/run finishes, and it is executed by 
the scheduler; you will find this log in the scheduler log files:
   ```
   [2023-02-10T00:01:08.696+] {logging_mixin.py:151} INFO - 
[2023-02-10T00:01:08.696+] {test_dag.py:19} INFO - Minimal example - 
failure callback with context: {'conf': 
, 'dag': 
, 'dag_run': , 'data_interval_end': 
DateTime(2023, 2, 10, 0, 0, 0, tzinfo=Timezone('UTC')), 'data_interval_start': 
DateTime(2023, 2, 9, 0, 0, 0, tzinfo=Timezone('UTC')), 'ds': '2023-02-09', 
'ds_nodash': '20230209', 'execution_date': DateTime(2023, 2, 9, 0, 0, 0, 
tzinfo=Timezone('UTC')), 'expanded_ti_count': None, 'inlets': [], 
'logical_date': DateTime(2023, 2, 9, 0, 0, 0, tzinfo=Timezone('UTC')), 
'macros': , 'next_ds': '2023-02-10', 'next_ds_nodash': '20230210', 
'next_execution_date': DateTime(2023, 2, 10, 0, 0, 0, tzinfo=Timezone('UTC')), 
'outlets': [], 'params': {}, 'prev_data_interval_start_success': None, 
'prev_data_interval_end_success': None, 'prev_ds': '2023-02-08', 
'prev_ds_nodash': '20230208', 'prev_execution_date': DateTime(2023, 2, 8, 0, 0, 
0, tzinfo=Timezone('UTC')), 'prev_execution_date_success': None, 
'prev_start_date_success': None, 'run_id': 
'scheduled__2023-02-09T00:00:00+00:00', 'task': , 'task_instance': , 'task_instance_key_str': 
'INT_tableau_others_recommendation_classifications__sleep_a_while__20230209', 
'test_mode': False, 'ti': , 'tomorrow_ds': '2023-02-10', 
'tomorrow_ds_no
 dash': '20230210', 'triggering_dataset_events': .get_triggering_events at 
0x7f2c14fa0290>>, 'ts': '2023-02-09T00:00:00+00:00', 'ts_nodash': 
'20230209T00', 'ts_nodash_with_tz': '20230209T00+', 'var': {'json': 
None, 'value': None}, 'conn': None, 'yesterday_ds': '2023-02-08', 
'yesterday_ds_nodash': '20230208', 'reason': 'timed_out'}
   ```
   You can find this log under the path `$AIRFLOW_HOME/logs/scheduler`, and for 
Cloud Composer you will find it on GCS at `gs:///logs/scheduler`
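
   For reference, a sketch of the kind of DAG-level callback that would produce the log above (names are guessed from the log, not the reporter's actual code):
   
   ```python
   import logging
   from datetime import timedelta

   import pendulum
   from airflow import DAG
   from airflow.operators.bash import BashOperator

   logger = logging.getLogger(__name__)

   def on_failure(context):
       # DAG-level callbacks run in the scheduler process, so this line lands
       # in the scheduler logs rather than in the task's own log file
       logger.info("Minimal example - failure callback with context: %s", context)

   with DAG(
       dag_id="test_dag",  # guessed from test_dag.py in the log above
       start_date=pendulum.datetime(2023, 2, 1, tz="UTC"),
       schedule="@daily",
       dagrun_timeout=timedelta(minutes=5),  # a timeout gives 'reason': 'timed_out'
       on_failure_callback=on_failure,
   ) as dag:
       BashOperator(task_id="sleep_a_while", bash_command="sleep 600")
   ```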





[GitHub] [airflow] github-actions[bot] commented on pull request #27797: Add tests to PythonOperator

2023-02-09 Thread via GitHub


github-actions[bot] commented on PR #27797:
URL: https://github.com/apache/airflow/pull/27797#issuecomment-1425005377

   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed in 5 days if no further activity occurs. 
Thank you for your contributions.





[GitHub] [airflow] akrava opened a new pull request, #29454: Make `prev_logical_date` variable offset-aware

2023-02-09 Thread via GitHub


akrava opened a new pull request, #29454:
URL: https://github.com/apache/airflow/pull/29454

   Can fix https://github.com/apache/airflow/issues/29453 
   
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request 
Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)**
 for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a 
newsfragment file, named `{pr_number}.significant.rst` or 
`{issue_number}.significant.rst`, in 
[newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   





[GitHub] [airflow] Taragolis commented on pull request #29447: Remove non-public interface usage in EcsRunTaskOperator

2023-02-09 Thread via GitHub


Taragolis commented on PR #29447:
URL: https://github.com/apache/airflow/pull/29447#issuecomment-1424981137

   Not sure if it is these changes, Breeze itself, or something else, but I see 
interesting behaviour in the logs while executing this operator:
   
   ```console
   
   [2023-02-09, 23:33:53 UTC] {ecs.py:489} INFO - ECS task ID is: 
6beb6333b8dc4cf7afbd36b524efd026*** Could not read served logs: Parent instance 
 is not bound to a Session; lazy load operation 
of attribute 'trigger' cannot proceed (Background on this error at: 
https://sqlalche.me/e/14/bhk3)
   *** Found local files:
   ***   * 
/root/airflow/logs/dag_id=test_ecs_operator_reatach/run_id=manual__2023-02-09T23:33:49.777801+00:00/task_id=ecs-task-regular-no-reattach/attempt=1.log
   *** Could not read served logs: Parent instance  is not bound to a Session; lazy load operation of attribute 
'trigger' cannot proceed (Background on this error at: 
https://sqlalche.me/e/14/bhk3)
   *** Found local files:
   ***   * 
/root/airflow/logs/dag_id=test_ecs_operator_reatach/run_id=manual__2023-02-09T23:33:49.777801+00:00/task_id=ecs-task-regular-no-reattach/attempt=1.log
   *** Could not read served logs: Parent instance  is not bound to a Session; lazy load operation of attribute 
'trigger' cannot proceed (Background on this error at: 
https://sqlalche.me/e/14/bhk3)
   *** Found local files:
   ***   * 
/root/airflow/logs/dag_id=test_ecs_operator_reatach/run_id=manual__2023-02-09T23:33:49.777801+00:00/task_id=ecs-task-regular-no-reattach/attempt=1.log
   *** Could not read served logs: Parent instance  is not bound to a Session; lazy load operation of attribute 
'trigger' cannot proceed (Background on this error at: 
https://sqlalche.me/e/14/bhk3)
   *** Found local files:
   ```
   





[GitHub] [airflow] akrava opened a new issue, #29453: Couldn't open Calendar tab of DAG after triggering DAG

2023-02-09 Thread via GitHub


akrava opened a new issue, #29453:
URL: https://github.com/apache/airflow/issues/29453

   ### Apache Airflow version
   
   2.5.1
   
   ### What happened
   
   After triggering any DAG and trying to open the Calendar tab, I see this:
   
![image](https://user-images.githubusercontent.com/13131083/217962405-d82f6058-0799-468b-84a7-b519685c626a.png)
   
   Here are the logs:
   ```
   [2023-02-09T23:19:27.482+] {app.py:1741} ERROR - Exception on 
/dags/dtap_dag_with_bash/calendar [GET]
   Traceback (most recent call last):
 File "/home/airflow/.local/lib/python3.9/site-packages/flask/app.py", line 
2525, in wsgi_app
   response = self.full_dispatch_request()
 File "/home/airflow/.local/lib/python3.9/site-packages/flask/app.py", line 
1822, in full_dispatch_request
   rv = self.handle_user_exception(e)
 File "/home/airflow/.local/lib/python3.9/site-packages/flask/app.py", line 
1820, in full_dispatch_request
   rv = self.dispatch_request()
 File "/home/airflow/.local/lib/python3.9/site-packages/flask/app.py", line 
1796, in dispatch_request
   return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
 File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/www/auth.py", line 
47, in decorated
   return func(*args, **kwargs)
 File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/www/decorators.py", 
line 166, in view_func
   return f(*args, **kwargs)
 File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/www/decorators.py", 
line 125, in wrapper
   return f(*args, **kwargs)
 File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/session.py", 
line 75, in wrapper
   return func(*args, session=session, **kwargs)
 File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/www/views.py", line 
2756, in calendar
   if curr_info.logical_date <= prev_logical_date:
   TypeError: can't compare offset-naive and offset-aware datetimes
   ```
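
   The traceback points at a naive/aware datetime mismatch in the comparison; a rough sketch of the kind of normalization that avoids it (not necessarily what #29454 does):
   
   ```python
   import datetime

   from airflow.utils import timezone

   def not_after(curr: datetime.datetime, prev: datetime.datetime) -> bool:
       # coerce_datetime turns a naive datetime into an aware one, so
       # offset-naive and offset-aware values can be compared safely
       return timezone.coerce_datetime(curr) <= timezone.coerce_datetime(prev)
   ```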
   
   ### What you think should happen instead
   
   _No response_
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
   Ubuntu 20.04.5
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   








[GitHub] [airflow] Taragolis commented on a diff in pull request #29452: Add support of a different AWS connection for DynamoDB

2023-02-09 Thread via GitHub


Taragolis commented on code in PR #29452:
URL: https://github.com/apache/airflow/pull/29452#discussion_r1102125057


##
airflow/providers/amazon/aws/transfers/dynamodb_to_s3.py:
##
@@ -103,6 +106,7 @@ def __init__(
 self,
 *,
 dynamodb_table_name: str,
+dynamodb_conn_id: str | None = None,

Review Comment:
   It is more about a lesser-known behaviour of boto3-based hooks: `None` is a 
legitimate value that forces the boto3 default credential strategy (without 
any call to Connections), e.g. environment variables, IAM profile, ECS task 
role, etc.
   
   I think it is better to use a specific sentinel as the default value here 
instead of `None`:
   
   
https://github.com/apache/airflow/blob/a76e0fe16ef12749c3fea1b68d82936b238fafbb/airflow/utils/types.py#L28-L44
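
   A minimal sketch of the sentinel-default pattern (helper name and fallback are illustrative):
   
   ```python
   from __future__ import annotations

   from airflow.utils.types import NOTSET, ArgNotSet

   def resolve_conn_id(conn_id: str | None | ArgNotSet = NOTSET) -> str | None:
       """NOTSET -> the caller passed nothing, fall back to a default
       connection; explicit None -> let boto3 resolve credentials itself
       (env variables, IAM profile, ECS task role, ...)."""
       if isinstance(conn_id, ArgNotSet):
           return "aws_default"  # illustrative fallback
       return conn_id
   ```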






[GitHub] [airflow] harishkrao commented on a diff in pull request #28950: Sensor for Databricks partition and table changes

2023-02-09 Thread via GitHub


harishkrao commented on code in PR #28950:
URL: https://github.com/apache/airflow/pull/28950#discussion_r1102114019


##
airflow/providers/databricks/sensors/databricks_table_changes.py:
##
@@ -0,0 +1,164 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+"""This module contains Databricks sensors."""
+
+from __future__ import annotations
+
+from datetime import datetime, timedelta
+from typing import Any, Callable, Sequence
+
+from airflow.exceptions import AirflowException
+from airflow.providers.common.sql.hooks.sql import fetch_all_handler
+from airflow.providers.databricks.hooks.databricks_sql import DatabricksSqlHook
+from airflow.providers.databricks.sensors.databricks_sql import 
DatabricksSqlSensor
+from airflow.utils.context import Context
+
+
+class DatabricksTableChangesSensor(DatabricksSqlSensor):
+"""Sensor to detect changes in a Delta table.
+
+
+:param databricks_conn_id: Reference to :ref:`Databricks
+connection id` (templated), defaults to
+DatabricksSqlHook.default_conn_name
+:param http_path: Optional string specifying HTTP path of Databricks SQL 
Endpoint or cluster.
+If not specified, it should be either specified in the Databricks 
connection's
+extra parameters, or ``sql_endpoint_name`` must be specified.
+:param sql_endpoint_name: Optional name of Databricks SQL Endpoint. If not 
specified, ``http_path`` must
+be provided as described above, defaults to None
+:param session_configuration: An optional dictionary of Spark session 
parameters. If not specified,
+it could be specified in the Databricks connection's extra 
parameters., defaults to None
+:param http_headers: An optional list of (k, v) pairs
+that will be set as HTTP headers on every request. (templated).
+:param catalog: An optional initial catalog to use.
+Requires DBR version 9.0+ (templated), defaults to ""
+:param schema: An optional initial schema to use.
+Requires DBR version 9.0+ (templated), defaults to "default"
+:param table_name: Table name to generate the SQL query, defaults to ""
+:param handler: Handler for DbApiHook.run() to return results, defaults to 
fetch_all_handler
+:param caller: String passed to name a hook to Databricks, defaults to 
"DatabricksPartitionSensor"
+:param client_parameters: Additional parameters internal to Databricks SQL 
Connector parameters.
+:param timestamp: Timestamp to check event history for a Delta table,
+defaults to datetime.now()-timedelta(days=7)
+"""
+
+template_fields: Sequence[str] = ("databricks_conn_id", "catalog", 
"schema", "table_name", "timestamp")
+
+def __init__(
+self,
+*,
+databricks_conn_id: str = DatabricksSqlHook.default_conn_name,
+http_path: str | None = None,
+sql_endpoint_name: str | None = None,
+session_configuration=None,
+http_headers: list[tuple[str, str]] | None = None,
+catalog: str = "",
+schema: str = "default",
+table_name: str = "",
+handler: Callable[[Any], Any] = fetch_all_handler,
+timestamp: datetime = datetime.now() - timedelta(days=7),
+caller: str = "DatabricksTableChangesSensor",
+client_parameters: dict[str, Any] | None = None,
+**kwargs,
+) -> None:
+super().__init__(**kwargs)
+self.databricks_conn_id = databricks_conn_id
+self._http_path = http_path
+self._sql_endpoint_name = sql_endpoint_name
+self.session_config = session_configuration
+self.http_headers = http_headers
+self.catalog = catalog
+self.schema = schema
+self.table_name = table_name
+self.timestamp = timestamp
+self.caller = caller
+self.client_parameters = client_parameters or {}
+self.hook_params = kwargs.pop("hook_params", {})
+self.handler = handler
+
+def _get_hook(self) -> DatabricksSqlHook:
+return DatabricksSqlHook(
+self.databricks_conn_id,
+self._http_path,
+self._sql_endpoint_name,
+self.session_config,
+self.http_headers,
+self.catalog,

[GitHub] [airflow] harishkrao commented on a diff in pull request #28950: Sensor for Databricks partition and table changes

2023-02-09 Thread via GitHub


harishkrao commented on code in PR #28950:
URL: https://github.com/apache/airflow/pull/28950#discussion_r1102113638


##
airflow/providers/databricks/sensors/databricks_table_changes.py:
##
@@ -0,0 +1,164 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+"""This module contains Databricks sensors."""
+
+from __future__ import annotations
+
+from datetime import datetime, timedelta
+from typing import Any, Callable, Sequence
+
+from airflow.exceptions import AirflowException
+from airflow.providers.common.sql.hooks.sql import fetch_all_handler
+from airflow.providers.databricks.hooks.databricks_sql import DatabricksSqlHook
+from airflow.providers.databricks.sensors.databricks_sql import 
DatabricksSqlSensor
+from airflow.utils.context import Context
+
+
+class DatabricksTableChangesSensor(DatabricksSqlSensor):
+"""Sensor to detect changes in a Delta table.
+
+
+:param databricks_conn_id: Reference to :ref:`Databricks
+connection id` (templated), defaults to
+DatabricksSqlHook.default_conn_name
+:param http_path: Optional string specifying HTTP path of Databricks SQL 
Endpoint or cluster.
+If not specified, it should be either specified in the Databricks 
connection's
+extra parameters, or ``sql_endpoint_name`` must be specified.
+:param sql_endpoint_name: Optional name of Databricks SQL Endpoint. If not 
specified, ``http_path`` must
+be provided as described above, defaults to None
+:param session_configuration: An optional dictionary of Spark session 
parameters. If not specified,
+it could be specified in the Databricks connection's extra 
parameters., defaults to None
+:param http_headers: An optional list of (k, v) pairs
+that will be set as HTTP headers on every request. (templated).
+:param catalog: An optional initial catalog to use.
+Requires DBR version 9.0+ (templated), defaults to ""
+:param schema: An optional initial schema to use.
+Requires DBR version 9.0+ (templated), defaults to "default"
+:param table_name: Table name to generate the SQL query, defaults to ""
+:param handler: Handler for DbApiHook.run() to return results, defaults to 
fetch_all_handler
+:param caller: String passed to name a hook to Databricks, defaults to 
"DatabricksPartitionSensor"
+:param client_parameters: Additional parameters internal to Databricks SQL 
Connector parameters.
+:param timestamp: Timestamp to check event history for a Delta table,
+defaults to datetime.now()-timedelta(days=7)
+"""
+
+template_fields: Sequence[str] = ("databricks_conn_id", "catalog", 
"schema", "table_name", "timestamp")
+
+def __init__(
+self,
+*,
+databricks_conn_id: str = DatabricksSqlHook.default_conn_name,
+http_path: str | None = None,
+sql_endpoint_name: str | None = None,
+session_configuration=None,
+http_headers: list[tuple[str, str]] | None = None,
+catalog: str = "",
+schema: str = "default",
+table_name: str = "",
+handler: Callable[[Any], Any] = fetch_all_handler,
+timestamp: datetime = datetime.now() - timedelta(days=7),
+caller: str = "DatabricksTableChangesSensor",
+client_parameters: dict[str, Any] | None = None,
+**kwargs,
+) -> None:
+super().__init__(**kwargs)
+self.databricks_conn_id = databricks_conn_id
+self._http_path = http_path
+self._sql_endpoint_name = sql_endpoint_name
+self.session_config = session_configuration
+self.http_headers = http_headers
+self.catalog = catalog
+self.schema = schema
+self.table_name = table_name
+self.timestamp = timestamp
+self.caller = caller
+self.client_parameters = client_parameters or {}
+self.hook_params = kwargs.pop("hook_params", {})
+self.handler = handler
+
+def _get_hook(self) -> DatabricksSqlHook:
+return DatabricksSqlHook(
+self.databricks_conn_id,
+self._http_path,
+self._sql_endpoint_name,
+self.session_config,
+self.http_headers,
+self.catalog,

[GitHub] [airflow] harishkrao commented on a diff in pull request #28950: Sensor for Databricks partition and table changes

2023-02-09 Thread via GitHub


harishkrao commented on code in PR #28950:
URL: https://github.com/apache/airflow/pull/28950#discussion_r1102111386


##
airflow/providers/databricks/sensors/databricks_table_changes.py:
##
@@ -0,0 +1,164 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+"""This module contains Databricks sensors."""
+
+from __future__ import annotations
+
+from datetime import datetime, timedelta
+from typing import Any, Callable, Sequence
+
+from airflow.exceptions import AirflowException
+from airflow.providers.common.sql.hooks.sql import fetch_all_handler
+from airflow.providers.databricks.hooks.databricks_sql import DatabricksSqlHook
+from airflow.providers.databricks.sensors.databricks_sql import 
DatabricksSqlSensor
+from airflow.utils.context import Context
+
+
+class DatabricksTableChangesSensor(DatabricksSqlSensor):
+"""Sensor to detect changes in a Delta table.
+
+
+:param databricks_conn_id: Reference to :ref:`Databricks
+connection id` (templated), defaults to
+DatabricksSqlHook.default_conn_name
+:param http_path: Optional string specifying HTTP path of Databricks SQL 
Endpoint or cluster.
+If not specified, it should be either specified in the Databricks 
connection's
+extra parameters, or ``sql_endpoint_name`` must be specified.
+:param sql_endpoint_name: Optional name of Databricks SQL Endpoint. If not 
specified, ``http_path`` must
+be provided as described above, defaults to None
+:param session_configuration: An optional dictionary of Spark session 
parameters. If not specified,
+it could be specified in the Databricks connection's extra 
parameters., defaults to None
+:param http_headers: An optional list of (k, v) pairs
+that will be set as HTTP headers on every request. (templated).
+:param catalog: An optional initial catalog to use.
+Requires DBR version 9.0+ (templated), defaults to ""
+:param schema: An optional initial schema to use.
+Requires DBR version 9.0+ (templated), defaults to "default"
+:param table_name: Table name to generate the SQL query, defaults to ""
+:param handler: Handler for DbApiHook.run() to return results, defaults to 
fetch_all_handler
+:param caller: String passed to name a hook to Databricks, defaults to 
"DatabricksPartitionSensor"
+:param client_parameters: Additional parameters internal to Databricks SQL 
Connector parameters.
+:param timestamp: Timestamp to check event history for a Delta table,
+defaults to datetime.now()-timedelta(days=7)
+"""
+
+template_fields: Sequence[str] = ("databricks_conn_id", "catalog", 
"schema", "table_name", "timestamp")
+
+def __init__(
+self,
+*,
+databricks_conn_id: str = DatabricksSqlHook.default_conn_name,
+http_path: str | None = None,
+sql_endpoint_name: str | None = None,
+session_configuration=None,
+http_headers: list[tuple[str, str]] | None = None,
+catalog: str = "",
+schema: str = "default",
+table_name: str = "",
+handler: Callable[[Any], Any] = fetch_all_handler,
+timestamp: datetime = datetime.now() - timedelta(days=7),
+caller: str = "DatabricksTableChangesSensor",
+client_parameters: dict[str, Any] | None = None,

Review Comment:
   Changed it.






[GitHub] [airflow] harishkrao commented on a diff in pull request #28950: Sensor for Databricks partition and table changes

2023-02-09 Thread via GitHub


harishkrao commented on code in PR #28950:
URL: https://github.com/apache/airflow/pull/28950#discussion_r110234


##
airflow/providers/databricks/sensors/databricks_partition.py:
##
@@ -0,0 +1,157 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+"""This module contains Databricks sensors."""
+
+from __future__ import annotations
+
+from typing import Any, Callable, Sequence
+
+from airflow.exceptions import AirflowException
+from airflow.providers.common.sql.hooks.sql import fetch_one_handler
+from airflow.providers.databricks.hooks.databricks_sql import DatabricksSqlHook
+from airflow.providers.databricks.sensors.databricks_sql import 
DatabricksSqlSensor
+from airflow.utils.context import Context
+
+template_fields: Sequence[str] = ("databricks_conn_id", "catalog", "schema", 
"table_name", "partition_name")
+
+
+class DatabricksPartitionSensor(DatabricksSqlSensor):
+"""Sensor to detect the existence of partitions in a Delta table.
+
+:param databricks_conn_id: Reference to :ref:`Databricks
+connection id` (templated).
+:param http_path: Optional string specifying HTTP path of Databricks SQL 
Endpoint or cluster.
+If not specified, it should be either specified in the Databricks 
connection's extra parameters,
+or ``sql_endpoint_name`` must be specified.
+:param sql_endpoint_name: Optional name of Databricks SQL Endpoint.
+If not specified, ``http_path`` must be provided as described above.
+:param session_configuration: An optional dictionary of Spark session 
parameters. If not specified,
+it could be specified in the Databricks connection's extra parameters.
+:param http_headers: An optional list of (k, v) pairs that will be set
+as HTTP headers on every request. (templated)
+:param catalog: An optional initial catalog to use. Requires DBR version 
9.0+ (templated)
+:param schema: An optional initial schema to use. Requires DBR version 
9.0+ (templated)
+:param table_name: Table name to generate the SQL query.
+:param partition_name: Partition to check.
+:param handler: Handler for DbApiHook.run() to return results, defaults to 
fetch_one_handler
+:param caller: String passed to name a hook to Databricks, defaults to 
"DatabricksPartitionSensor"
+:param client_parameters: Additional parameters internal to Databricks SQL 
Connector parameters.
+:param partition_operator: Comparison operator for partitions.
+"""
+
+template_fields: Sequence[str] = (
+"databricks_conn_id",
+"schema",
+"http_headers",
+"catalog",
+"table_name",
+"partition_name",
+)
+
+def __init__(
+self,
+*,
+databricks_conn_id: str = DatabricksSqlHook.default_conn_name,
+http_path: str | None = None,
+sql_endpoint_name: str | None = None,
+session_configuration=None,
+http_headers: list[tuple[str, str]] | None = None,
+catalog: str = "",
+schema: str = "default",
+table_name: str = "",
+partition_name: dict,
+handler: Callable[[Any], Any] = fetch_one_handler,
+caller: str = "DatabricksPartitionSensor",
+client_parameters: dict[str, Any] | None = None,
+partition_operator: str = "=",
+**kwargs,
+) -> None:
+super().__init__(**kwargs)
+self.databricks_conn_id = databricks_conn_id
+self._http_path = http_path
+self._sql_endpoint_name = sql_endpoint_name
+self.session_config = session_configuration
+self.http_headers = http_headers
+self.catalog = catalog
+self.schema = schema
+self.table_name = table_name
+self.partition_name = partition_name
+self.caller = caller
+self.client_parameters = client_parameters or {}
+self.hook_params = kwargs.pop("hook_params", {})
+self.handler = handler
+self.partition_operator = partition_operator
+
+def _get_hook(self) -> DatabricksSqlHook:
+return DatabricksSqlHook(
+self.databricks_conn_id,
+self._http_path,
+self._sql_endpoint_name,
+self.session_config,
+   

[GitHub] [airflow] harishkrao commented on a diff in pull request #28950: Sensor for Databricks partition and table changes

2023-02-09 Thread via GitHub


harishkrao commented on code in PR #28950:
URL: https://github.com/apache/airflow/pull/28950#discussion_r1102110885


##
airflow/providers/databricks/sensors/databricks_partition.py:
##
@@ -0,0 +1,157 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+"""This module contains Databricks sensors."""
+
+from __future__ import annotations
+
+from typing import Any, Callable, Sequence
+
+from airflow.exceptions import AirflowException
+from airflow.providers.common.sql.hooks.sql import fetch_one_handler
+from airflow.providers.databricks.hooks.databricks_sql import DatabricksSqlHook
+from airflow.providers.databricks.sensors.databricks_sql import 
DatabricksSqlSensor
+from airflow.utils.context import Context
+
+template_fields: Sequence[str] = ("databricks_conn_id", "catalog", "schema", 
"table_name", "partition_name")
+
+
+class DatabricksPartitionSensor(DatabricksSqlSensor):
+"""Sensor to detect the existence of partitions in a Delta table.
+
+:param databricks_conn_id: Reference to :ref:`Databricks
+connection id` (templated).
+:param http_path: Optional string specifying HTTP path of Databricks SQL 
Endpoint or cluster.
+If not specified, it should be either specified in the Databricks 
connection's extra parameters,
+or ``sql_endpoint_name`` must be specified.
+:param sql_endpoint_name: Optional name of Databricks SQL Endpoint.
+If not specified, ``http_path`` must be provided as described above.
+:param session_configuration: An optional dictionary of Spark session 
parameters. If not specified,
+it could be specified in the Databricks connection's extra parameters.
+:param http_headers: An optional list of (k, v) pairs that will be set
+as HTTP headers on every request. (templated)
+:param catalog: An optional initial catalog to use. Requires DBR version 
9.0+ (templated)
+:param schema: An optional initial schema to use. Requires DBR version 
9.0+ (templated)
+:param table_name: Table name to generate the SQL query.
+:param partition_name: Partition to check.
+:param handler: Handler for DbApiHook.run() to return results, defaults to 
fetch_one_handler
+:param caller: String passed to name a hook to Databricks, defaults to 
"DatabricksPartitionSensor"
+:param client_parameters: Additional parameters internal to Databricks SQL 
Connector parameters.
+:param partition_operator: Comparison operator for partitions.
+"""
+
+template_fields: Sequence[str] = (
+"databricks_conn_id",
+"schema",
+"http_headers",
+"catalog",
+"table_name",
+"partition_name",
+)
+
+def __init__(
+self,
+*,
+databricks_conn_id: str = DatabricksSqlHook.default_conn_name,
+http_path: str | None = None,
+sql_endpoint_name: str | None = None,
+session_configuration=None,
+http_headers: list[tuple[str, str]] | None = None,
+catalog: str = "",
+schema: str = "default",
+table_name: str = "",

Review Comment:
   Yes, changed it.






[GitHub] [airflow] Taragolis commented on a diff in pull request #29447: Remove non-public interface usage in EcsRunTaskOperator

2023-02-09 Thread via GitHub


Taragolis commented on code in PR #29447:
URL: https://github.com/apache/airflow/pull/29447#discussion_r1102108095


##
tests/providers/amazon/aws/utils/test_identifiers.py:
##
@@ -0,0 +1,74 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+import random
+import string
+import uuid
+
+import pytest
+
+from airflow.providers.amazon.aws.utils.identifiers import generate_uuid
+from airflow.utils.types import NOTSET
+
+
+class TestGenerateUuid:
+@pytest.fixture(
+autouse=True,
+params=[
+pytest.param(NOTSET, id="default-namespace"),
+pytest.param(uuid.UUID(int=42), id="custom-namespace"),
+],
+)
+def setup_namespace(self, request):
+self.default_namespace = request.param is NOTSET
+self.namespace = uuid.NAMESPACE_OID if self.default_namespace else 
request.param
+self.kwargs = {"namespace": self.namespace} if not 
self.default_namespace else {}
+
+def test_deterministic(self):
+"""Test that result deterministic and valid UUID object"""
+args = [
+"".join(random.choice(string.ascii_letters) for _ in 
range(random.randint(3, 13)))
+for _ in range(100)
+]

Review Comment:
   That is basically a problem with any hashing algorithm, and with testing one.
   This case is mostly to catch changes in logic where two runs with the same 
value could return different results (I hope that never happens).
   
   So the static values here would be "foo", "bar", "spam", "egg", "airflow"
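
   E.g., assuming `generate_uuid` wraps `uuid.uuid5`, the determinism property the test pins down is just:
   
   ```python
   import uuid

   # uuid5 is a pure function of (namespace, name), so repeated calls with
   # the same inputs must agree
   for value in ["foo", "bar", "spam", "egg", "airflow"]:
       first = uuid.uuid5(uuid.NAMESPACE_OID, value)
       second = uuid.uuid5(uuid.NAMESPACE_OID, value)
       assert first == second
   ```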






[GitHub] [airflow] potiuk commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


potiuk commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1102103056


##
scripts/ci/docker-compose/integration-statsd.yml:
##
@@ -19,15 +19,37 @@ version: "3.7"
 services:
   statsd-exporter:
 image: apache/airflow:airflow-statsd-exporter-2020.09.05-v0.17.0
+container_name: "breeze-statsd-exporter"
 ports:
   - "9125:9125"
   - "9125:9125/udp"
   - "29102:9102"
 
+  prometheus:
+image: prom/prometheus
+container_name: "breeze-prometheus"
+user: "0"
+ports:
+  - "29090:9090"
+volumes:
+  - ./prometheus.yml:/etc/prometheus/prometheus.yml
+  - ./prometheus/volume:/prometheus
+
   grafana:
 image: grafana/grafana:8.2.4
+container_name: "breeze-grafana"
+environment:
+  GF_AUTH_ANONYMOUS_ENABLED: true
+  GF_AUTH_ANONYMOUS_ORG_NAME: "Main Org."
+  GF_AUTH_ANONYMOUS_ORG_ROLE: "Admin"

Review Comment:
   We have plenty of admin/admin, airflow/airflow in those files already :)






[GitHub] [airflow] potiuk commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


potiuk commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1102102697


##
scripts/ci/docker-compose/integration-statsd.yml:
##
@@ -19,15 +19,36 @@ version: "3.7"
 services:
   statsd-exporter:
 image: apache/airflow:airflow-statsd-exporter-2020.09.05-v0.17.0
+container_name: "breeze-statsd-exporter"
 ports:
   - "9125:9125"
   - "9125:9125/udp"
   - "29102:9102"
 
+  prometheus:
+image: prom/prometheus
+container_name: "breeze-prometheus"
+user: "0"
+ports:
+  - "29090:9090"

Review Comment:
   This looks cool.






[GitHub] [airflow] ferruzzi commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1102103235


##
scripts/ci/docker-compose/integration-statsd.yml:
##
@@ -19,15 +19,37 @@ version: "3.7"
 services:
   statsd-exporter:
 image: apache/airflow:airflow-statsd-exporter-2020.09.05-v0.17.0
+container_name: "breeze-statsd-exporter"
 ports:
   - "9125:9125"
   - "9125:9125/udp"
   - "29102:9102"
 
+  prometheus:
+image: prom/prometheus
+container_name: "breeze-prometheus"
+user: "0"
+ports:
+  - "29090:9090"
+volumes:
+  - ./prometheus.yml:/etc/prometheus/prometheus.yml
+  - ./prometheus/volume:/prometheus
+
   grafana:
 image: grafana/grafana:8.2.4
+container_name: "breeze-grafana"
+environment:
+  GF_AUTH_ANONYMOUS_ENABLED: true
+  GF_AUTH_ANONYMOUS_ORG_NAME: "Main Org."
+  GF_AUTH_ANONYMOUS_ORG_ROLE: "Admin"

Review Comment:
   "Main Org." is the default org and only available option without also 
configuring the organization/team/user structure which felt excessive in this 
case.






[GitHub] [airflow] ferruzzi commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1102099891


##
scripts/ci/docker-compose/integration-statsd.yml:
##
@@ -19,15 +19,36 @@ version: "3.7"
 services:
   statsd-exporter:
 image: apache/airflow:airflow-statsd-exporter-2020.09.05-v0.17.0
+container_name: "breeze-statsd-exporter"
 ports:
   - "9125:9125"
   - "9125:9125/udp"
   - "29102:9102"
 
+  prometheus:
+image: prom/prometheus
+container_name: "breeze-prometheus"
+user: "0"
+ports:
+  - "29090:9090"

Review Comment:
   In the end it felt a little better in the BREEZE doc. I dropped it there, 
but I'll move it if you prefer it in TESTING.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1102099122


##
scripts/ci/docker-compose/integration-statsd.yml:
##
@@ -19,15 +19,37 @@ version: "3.7"
 services:
   statsd-exporter:
 image: apache/airflow:airflow-statsd-exporter-2020.09.05-v0.17.0
+container_name: "breeze-statsd-exporter"
 ports:
   - "9125:9125"
   - "9125:9125/udp"
   - "29102:9102"
 
+  prometheus:
+image: prom/prometheus
+container_name: "breeze-prometheus"
+user: "0"
+ports:
+  - "29090:9090"
+volumes:
+  - ./prometheus.yml:/etc/prometheus/prometheus.yml
+  - ./prometheus/volume:/prometheus
+
   grafana:
 image: grafana/grafana:8.2.4
+container_name: "breeze-grafana"
+environment:
+  GF_AUTH_ANONYMOUS_ENABLED: true
+  GF_AUTH_ANONYMOUS_ORG_NAME: "Main Org."
+  GF_AUTH_ANONYMOUS_ORG_ROLE: "Admin"

Review Comment:
   Not a good idea in production, but this configures Grafana to skip the login 
screen and auto-login as an admin.  Seems fine to me since this is a test/dev 
environment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


ferruzzi commented on PR #29449:
URL: https://github.com/apache/airflow/pull/29449#issuecomment-1424931439

   The failing test is:
   
   ```
  tests/models/test_xcom.py .[2023-02-09T21:36:22.304+] 
{xcom.py:632} ERROR - Object of type PickleRce is not JSON serializable. If you 
are using pickle instead of JSON for XCom, then you need to enable pickle 
support for XCom in your airflow config or make sure to decorate your object 
with attr.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] jedcunningham commented on pull request #29136: Dataproc batches

2023-02-09 Thread via GitHub


jedcunningham commented on PR #29136:
URL: https://github.com/apache/airflow/pull/29136#issuecomment-1424928053

   It gets squashed automatically, so you don't need to worry about it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] o-nikolas commented on issue #29424: Status of testing Providers that were prepared on February 08, 2023

2023-02-09 Thread via GitHub


o-nikolas commented on issue #29424:
URL: https://github.com/apache/airflow/issues/29424#issuecomment-1424909095

   I added code to kill the process while running to test the emr `on_kill` 
changes, and the cleanup code ran successfully (and I confirmed the resource 
was deleted in my AWS account):
   
   ```
   INFO  [airflow.task.operators] Job flow with id j-1ADYKTIV4THO5 created
   INFO  [airflow.task.operators] SLEEPING NOW, KILL THE PROCESS TO TEST ON_KILL
   INFO  [airflow.task.operators] TRYING TO KILL MYSELF
   ERROR [airflow.task] Received SIGTERM. Terminating subprocesses.
   INFO  [airflow.task.operators] Terminating job flow j-1ADYKTIV4THO5
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] eladkal closed issue #28431: Celery Stalled Running Tasks (SRT)

2023-02-09 Thread via GitHub


eladkal closed issue #28431: Celery Stalled Running Tasks (SRT)
URL: https://github.com/apache/airflow/issues/28431


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #29452: Add support of a different AWS connection for DynamoDB

2023-02-09 Thread via GitHub


boring-cyborg[bot] commented on PR #29452:
URL: https://github.com/apache/airflow/pull/29452#issuecomment-1424905565

   Congratulations on your first Pull Request and welcome to the Apache Airflow 
community! If you have any issues or are unsure about anything, please check 
our Contribution Guide 
(https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (ruff, mypy and type 
annotations). Our [pre-commits]( 
https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks)
 will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in 
`docs/` directory). Adding a new operator? Check this short 
[guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst)
 Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze 
environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for 
testing locally, it's a heavy docker but it ships with a working Airflow and a 
lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get 
the final approval from Committers.
   - Please follow [ASF Code of 
Conduct](https://www.apache.org/foundation/policies/conduct) for all 
communication including (but not limited to) comments on Pull Requests, Mailing 
list and Slack.
   - Be sure to read the [Airflow Coding style]( 
https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it 
better.
   In case of doubts contact the developers at:
   Mailing List: d...@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] 1inuxoid opened a new pull request, #29452: Add support of a different AWS connection for DynamoDB

2023-02-09 Thread via GitHub


1inuxoid opened a new pull request, #29452:
URL: https://github.com/apache/airflow/pull/29452

   This change adds a new optional argument `dynamodb_conn_id` to 
`DynamoDBToS3Operator` so that a separate AWS connection can be used to scan a 
DynamoDB table. If not specified, the connection from `aws_conn_id` is used, as 
before.
   
   This makes it useful for cross-account transfers.
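   
   A minimal usage sketch under the proposed API (the connection IDs and names 
below are hypothetical):
   
   ```python
   from airflow.providers.amazon.aws.transfers.dynamodb_to_s3 import (
       DynamoDBToS3Operator,
   )
   
   backup_table = DynamoDBToS3Operator(
       task_id="backup_dynamodb_to_s3",
       dynamodb_table_name="my-table",         # lives in the source account
       s3_bucket_name="my-backup-bucket",      # lives in the destination account
       file_size=1000,
       dynamodb_conn_id="aws_source_account",  # the new optional argument
       aws_conn_id="aws_destination_account",  # used for S3, as before
   )
   ```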
   
   closes: #29422 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] kevingoss2 commented on pull request #26786: Fix parsing of optional `mode` field in BigQuery Result Schema

2023-02-09 Thread via GitHub


kevingoss2 commented on PR #26786:
URL: https://github.com/apache/airflow/pull/26786#issuecomment-1424890246

   @patricker You are correct.  That version of the code does not have the 
error.  The issue is that when I do a pip freeze on the docker container it 
shows 8.8.0.  Will try to see what is going on.  Thanks for the response!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] hussein-awala opened a new pull request, #29451: Fix/render nested fields

2023-02-09 Thread via GitHub


hussein-awala opened a new pull request, #29451:
URL: https://github.com/apache/airflow/pull/29451

   closes: #29432
   related: #25588
   
   ---
   **^ Add meaningful description above**
   
   By default, to render nested fields, the nested argument must provide a 
`template_fields` attribute containing the list of nested field names to 
render. In some operators (e.g. `KubernetesPodOperator`) we override the 
method `_render_nested_template_fields` to define how we want to render the 
nested fields.
   
   This works fine for normal operators because they extend `BaseOperator`. 
The problem is with mapped tasks: `MappedOperator` is not a subclass of the 
concrete operator, so when it calls this method it calls 
`Templater._render_nested_template_fields` directly instead of the method we 
overrode.
   
   To solve this problem, I replaced `self._do_render_template_fields` with 
`unmapped_task._do_render_template_fields`, where `unmapped_task` is an 
instance of the operator that overrides the method.
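   
   As a minimal sketch of the mechanism involved (the class names here are 
hypothetical, for illustration only), a nested object opts into rendering by 
exposing its own `template_fields`:
   
   ```python
   from airflow.models.baseoperator import BaseOperator
   
   
   class Inner:
       # Declaring template_fields on the nested object tells Airflow which
       # of its attributes should be rendered as Jinja templates.
       template_fields = ("message",)
   
       def __init__(self, message: str) -> None:
           self.message = message
   
   
   class MyOperator(BaseOperator):
       template_fields = ("inner",)
   
       def __init__(self, inner: Inner, **kwargs) -> None:
           super().__init__(**kwargs)
           self.inner = inner
   
       def execute(self, context):
           # With the fix, inner.message is rendered even when the task is mapped.
           self.log.info(self.inner.message)
   ```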


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ephraimbuddy opened a new pull request, #29450: Rename `db export-cleaned` to `db export-archived`

2023-02-09 Thread via GitHub


ephraimbuddy opened a new pull request, #29450:
URL: https://github.com/apache/airflow/pull/29450

   This is more appropriate because what is exported is the contents of the 
archived tables. Even though the contents are the cleaned data, they are still 
archived, and we are 'exporting' from the archived tables.
   
   This also aligns with the `--drop-archives` option and the `db drop-archived` 
command in terms of naming.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] o-nikolas commented on issue #29422: Multiple AWS connections support in DynamoDBToS3Operator

2023-02-09 Thread via GitHub


o-nikolas commented on issue #29422:
URL: https://github.com/apache/airflow/issues/29422#issuecomment-1424875593

   @eladkal Are you thinking of something like a generic "source" and 
"destination" connection that would be implemented in a base AWS transfer 
operator that all concrete transfer operators would use? Would both those 
connections be based on the aws connection? I suppose we could use extras to 
plumb anything service specific through. It's an interesting thought.
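   
   To make that concrete, a rough sketch of the shape being floated (the class 
and argument names below are entirely hypothetical):
   
   ```python
   from __future__ import annotations
   
   from airflow.models.baseoperator import BaseOperator
   
   
   class AwsToAwsBaseTransferOperator(BaseOperator):
       """Hypothetical base holding a source and a destination AWS connection."""
   
       def __init__(
           self,
           *,
           source_aws_conn_id: str = "aws_default",
           dest_aws_conn_id: str | None = None,
           **kwargs,
       ) -> None:
           super().__init__(**kwargs)
           self.source_aws_conn_id = source_aws_conn_id
           # Fall back to the source connection when no destination is given,
           # preserving today's single-connection behaviour.
           self.dest_aws_conn_id = dest_aws_conn_id or source_aws_conn_id
   ```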
   
   What do you think @Taragolis?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] patricker commented on pull request #26786: Fix parsing of optional `mode` field in BigQuery Result Schema

2023-02-09 Thread via GitHub


patricker commented on PR #26786:
URL: https://github.com/apache/airflow/pull/26786#issuecomment-1424873696

   @kevingoss2 All I can think is that you somehow aren't actually using Google 
Providers 8.8.0? Looking at the code for the 8.8.0 tag, the line numbers/code 
in the error don't line up with that version at all: 
https://github.com/apache/airflow/blob/providers-google/8.8.0/airflow/providers/google/cloud/hooks/bigquery.py#L2992


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] bmoon4 commented on issue #22067: DAG Import Errors not show in webUI if DAG is first time upload to airflow is not currect

2023-02-09 Thread via GitHub


bmoon4 commented on issue #22067:
URL: https://github.com/apache/airflow/issues/22067#issuecomment-1424863269

   Hi, I'm seeing the same issue in Airflow 2.5.1. If a new DAG has import 
errors, the error message does not appear in the GUI.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[airflow] branch main updated: Add `airflow db drop-archived` command (#29309)

2023-02-09 Thread ephraimanierobi
This is an automated email from the ASF dual-hosted git repository.

ephraimanierobi pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
 new a76e0fe16e Add `airflow db drop-archived` command (#29309)
a76e0fe16e is described below

commit a76e0fe16ef12749c3fea1b68d82936b238fafbb
Author: Ephraim Anierobi 
AuthorDate: Thu Feb 9 22:26:29 2023 +0100

Add `airflow db drop-archived` command (#29309)

* Add `airflow db drop-archived` command

This command drops the archive tables directly

As part of this, the _confirm_drop_archives function was made more 
interactive

* fixup! Add `airflow db drop-archived` command

* Fix test and add doc

* Apply suggestions from code review

Co-authored-by: Jed Cunningham 
<66968678+jedcunning...@users.noreply.github.com>
---
 airflow/cli/cli_parser.py   |   6 ++
 airflow/cli/commands/db_command.py  |  11 +++-
 airflow/utils/db_cleanup.py |  58 ++
 docs/apache-airflow/howto/usage-cli.rst |  10 
 tests/cli/commands/test_db_command.py   |  28 +
 tests/utils/test_db_cleanup.py  | 100 +---
 6 files changed, 191 insertions(+), 22 deletions(-)

diff --git a/airflow/cli/cli_parser.py b/airflow/cli/cli_parser.py
index 9a82d2f06e..bf2e98d0f3 100644
--- a/airflow/cli/cli_parser.py
+++ b/airflow/cli/cli_parser.py
@@ -1626,6 +1626,12 @@ DB_COMMANDS = (
 ARG_DB_TABLES,
 ),
 ),
+ActionCommand(
+name="drop-archived",
+help="Drop archived tables created through the db clean command",
+
func=lazy_load_command("airflow.cli.commands.db_command.drop_archived"),
+args=(ARG_DB_TABLES, ARG_YES),
+),
 )
 CONNECTIONS_COMMANDS = (
 ActionCommand(
diff --git a/airflow/cli/commands/db_command.py 
b/airflow/cli/commands/db_command.py
index 468aa3d87f..72ee55c86d 100644
--- a/airflow/cli/commands/db_command.py
+++ b/airflow/cli/commands/db_command.py
@@ -27,7 +27,7 @@ from airflow import settings
 from airflow.exceptions import AirflowException
 from airflow.utils import cli as cli_utils, db
 from airflow.utils.db import REVISION_HEADS_MAP
-from airflow.utils.db_cleanup import config_dict, export_cleaned_records, 
run_cleanup
+from airflow.utils.db_cleanup import config_dict, drop_archived_tables, 
export_cleaned_records, run_cleanup
 from airflow.utils.process_utils import execute_interactive
 
 
@@ -218,3 +218,12 @@ def export_cleaned(args):
 table_names=args.tables,
 drop_archives=args.drop_archives,
 )
+
+
+@cli_utils.action_cli(check_db=False)
+def drop_archived(args):
+"""Drops archived tables from metadata database."""
+drop_archived_tables(
+table_names=args.tables,
+needs_confirm=not args.yes,
+)
diff --git a/airflow/utils/db_cleanup.py b/airflow/utils/db_cleanup.py
index a9d12c7aa3..41b89931f5 100644
--- a/airflow/utils/db_cleanup.py
+++ b/airflow/utils/db_cleanup.py
@@ -39,6 +39,7 @@ from airflow.cli.simple_table import AirflowConsole
 from airflow.models import Base
 from airflow.utils import timezone
 from airflow.utils.db import reflect_tables
+from airflow.utils.helpers import ask_yesno
 from airflow.utils.session import NEW_SESSION, provide_session
 
 logger = logging.getLogger(__file__)
@@ -301,13 +302,21 @@ def _confirm_delete(*, date: DateTime, tables: list[str]):
 
 
 def _confirm_drop_archives(*, tables: list[str]):
+# if length of tables is greater than 3, show the total count
+if len(tables) > 3:
+text_ = f"{len(tables)} archived tables prefixed with 
{ARCHIVE_TABLE_PREFIX}"
+else:
+text_ = f"the following archived tables {tables}"
 question = (
-f"You have requested that we drop archived records for tables 
{tables!r}.\n"
-f"This is irreversible.  Consider backing up the tables first \n"
-f"Enter 'drop archived tables' (without quotes) to proceed."
+f"You have requested that we drop {text_}.\n"
+f"This is irreversible. Consider backing up the tables first \n"
 )
 print(question)
-answer = input().strip()
+if len(tables) > 3:
+show_tables = ask_yesno("Show tables? (y/n): ")
+if show_tables:
+print(tables, "\n")
+answer = input("Enter 'drop archived tables' (without quotes) to 
proceed.\n").strip()
 if not answer == "drop archived tables":
 raise SystemExit("User did not confirm; exiting.")
 
@@ -347,6 +356,19 @@ def _effective_table_names(*, table_names: list[str] | 
None):
 return effective_table_names, effective_config_dict
 
 
+def _get_archived_table_names(table_names, session):
+inspector = inspect(session.bind)
+db_table_names = [x for x in inspector.get_table_names() if 
x.startswith(ARCHIVE_TABLE_PREFIX)]
+effective_table_names, _ = 

[GitHub] [airflow] ephraimbuddy merged pull request #29309: Add `airflow db drop-archived` command

2023-02-09 Thread via GitHub


ephraimbuddy merged PR #29309:
URL: https://github.com/apache/airflow/pull/29309


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[airflow] branch main updated (fdac67b3a5 -> 9e1ff6203a)

2023-02-09 Thread onikolas
This is an automated email from the ASF dual-hosted git repository.

onikolas pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


from fdac67b3a5 Add colors in help outputs of Airfow CLI commands #28789 
(#29116)
 add 9e1ff6203a Use an Amazon Linux 2 image to run EMR job (#29443)

No new revisions were added by this update.

Summary of changes:
 tests/system/providers/amazon/aws/example_emr.py | 12 
 1 file changed, 12 insertions(+)



[GitHub] [airflow] o-nikolas merged pull request #29443: Use an Amazon Linux 2 image to run EMR job

2023-02-09 Thread via GitHub


o-nikolas merged PR #29443:
URL: https://github.com/apache/airflow/pull/29443


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29447: Remove non-public interface usage in EcsRunTaskOperator

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29447:
URL: https://github.com/apache/airflow/pull/29447#discussion_r1102036232


##
tests/providers/amazon/aws/utils/test_identifiers.py:
##
@@ -0,0 +1,74 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+import random
+import string
+import uuid
+
+import pytest
+
+from airflow.providers.amazon.aws.utils.identifiers import generate_uuid
+from airflow.utils.types import NOTSET
+
+
+class TestGenerateUuid:
+@pytest.fixture(
+autouse=True,
+params=[
+pytest.param(NOTSET, id="default-namespace"),
+pytest.param(uuid.UUID(int=42), id="custom-namespace"),
+],
+)
+def setup_namespace(self, request):
+self.default_namespace = request.param is NOTSET
+self.namespace = uuid.NAMESPACE_OID if self.default_namespace else 
request.param
+self.kwargs = {"namespace": self.namespace} if not 
self.default_namespace else {}
+
+def test_deterministic(self):
+"""Test that result deterministic and valid UUID object"""
+args = [
+"".join(random.choice(string.ascii_letters) for _ in 
range(random.randint(3, 13)))
+for _ in range(100)
+]
+result = generate_uuid(*args, **self.kwargs)
+assert result == generate_uuid(*args, **self.kwargs)
+assert uuid.UUID(result).version == 5, "Should generate UUID v5"
+
+def test_nil_uuid(self):
+"""Test that result of single None are NIL UUID, regardless 
namespace."""
+assert generate_uuid(None, **self.kwargs) == 
"----"
+
+def test_single_uuid_value(self):
+"""Test that result of single not None value are the same as uuid5."""
+assert generate_uuid("", **self.kwargs) == 
str(uuid.uuid5(self.namespace, ""))
+assert generate_uuid("Airflow", **self.kwargs) == 
str(uuid.uuid5(self.namespace, "Airflow"))
+
+def test_multiple_none_value(self):
+"""Test that result of single None are NIL UUID, regardless 
namespace."""

Review Comment:
   ```suggestion
   """Test that result of single None are NIL UUID, regardless of 
namespace."""
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29447: Remove non-public interface usage in EcsRunTaskOperator

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29447:
URL: https://github.com/apache/airflow/pull/29447#discussion_r1102034396


##
tests/providers/amazon/aws/utils/test_identifiers.py:
##
@@ -0,0 +1,74 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+import random
+import string
+import uuid
+
+import pytest
+
+from airflow.providers.amazon.aws.utils.identifiers import generate_uuid
+from airflow.utils.types import NOTSET
+
+
+class TestGenerateUuid:
+@pytest.fixture(
+autouse=True,
+params=[
+pytest.param(NOTSET, id="default-namespace"),
+pytest.param(uuid.UUID(int=42), id="custom-namespace"),
+],
+)
+def setup_namespace(self, request):
+self.default_namespace = request.param is NOTSET
+self.namespace = uuid.NAMESPACE_OID if self.default_namespace else 
request.param
+self.kwargs = {"namespace": self.namespace} if not 
self.default_namespace else {}
+
+def test_deterministic(self):
+"""Test that result deterministic and valid UUID object"""
+args = [
+"".join(random.choice(string.ascii_letters) for _ in 
range(random.randint(3, 13)))
+for _ in range(100)
+]

Review Comment:
   I get what you are doing here and I won't block for this, but I've been 
taught to avoid random values in tests like this and prefer static values.  
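   
   For example, a static alternative might look like this (the inputs are 
hypothetical):
   
   ```python
   # Fixed inputs keep the test reproducible while still checking determinism.
   args = ["Airflow", "PR29447", "EcsRunTaskOperator"]
   assert generate_uuid(*args) == generate_uuid(*args)
   ```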



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29447: Remove non-public interface usage in EcsRunTaskOperator

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29447:
URL: https://github.com/apache/airflow/pull/29447#discussion_r1102029523


##
tests/providers/amazon/aws/utils/test_identifiers.py:
##
@@ -0,0 +1,74 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+import random
+import string
+import uuid
+
+import pytest
+
+from airflow.providers.amazon.aws.utils.identifiers import generate_uuid
+from airflow.utils.types import NOTSET
+
+
+class TestGenerateUuid:
+@pytest.fixture(
+autouse=True,
+params=[
+pytest.param(NOTSET, id="default-namespace"),
+pytest.param(uuid.UUID(int=42), id="custom-namespace"),
+],
+)
+def setup_namespace(self, request):
+self.default_namespace = request.param is NOTSET
+self.namespace = uuid.NAMESPACE_OID if self.default_namespace else 
request.param
+self.kwargs = {"namespace": self.namespace} if not 
self.default_namespace else {}
+
+def test_deterministic(self):
+"""Test that result deterministic and valid UUID object"""

Review Comment:
   ```suggestion
   """Test that result is deterministic and a valid UUID object"""
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29447: Remove non-public interface usage in EcsRunTaskOperator

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29447:
URL: https://github.com/apache/airflow/pull/29447#discussion_r1102029134


##
tests/providers/amazon/aws/utils/test_identifiers.py:
##
@@ -0,0 +1,74 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+import random
+import string
+import uuid
+
+import pytest
+
+from airflow.providers.amazon.aws.utils.identifiers import generate_uuid
+from airflow.utils.types import NOTSET
+
+
+class TestGenerateUuid:
+@pytest.fixture(
+autouse=True,
+params=[
+pytest.param(NOTSET, id="default-namespace"),
+pytest.param(uuid.UUID(int=42), id="custom-namespace"),
+],
+)
+def setup_namespace(self, request):
+self.default_namespace = request.param is NOTSET
+self.namespace = uuid.NAMESPACE_OID if self.default_namespace else 
request.param

Review Comment:
   Nice!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29447: Remove non-public interface usage in EcsRunTaskOperator

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29447:
URL: https://github.com/apache/airflow/pull/29447#discussion_r1102026640


##
tests/providers/amazon/aws/operators/test_ecs.py:
##
@@ -494,39 +494,47 @@ def test_check_success_task_not_raises(self, client_mock):
 ["", {"testTagKey": "testTagValue"}],
 ],
 )
-@mock.patch.object(EcsRunTaskOperator, "_xcom_del")
-@mock.patch.object(
-EcsRunTaskOperator,
-"xcom_pull",
-return_value=f"arn:aws:ecs:us-east-1:012345678910:task/{TASK_ID}",
+@pytest.mark.parametrize(
+"arns, expected_arn",
+[
+pytest.param(
+[
+f"arn:aws:ecs:us-east-1:012345678910:task/{TASK_ID}",
+
"arn:aws:ecs:us-east-1:012345678910:task/d8c67b3c-ac87-4ffe-a847-4785bc3a8b54",
+],
+f"arn:aws:ecs:us-east-1:012345678910:task/{TASK_ID}",
+id="multiple-arns",
+),
+pytest.param(
+[
+f"arn:aws:ecs:us-east-1:012345678910:task/{TASK_ID}",
+],
+f"arn:aws:ecs:us-east-1:012345678910:task/{TASK_ID}",
+id="simgle-arn",
+),
+],
 )
+@mock.patch("airflow.providers.amazon.aws.operators.ecs.generate_uuid")
 @mock.patch.object(EcsRunTaskOperator, "_wait_for_task_ended")
 @mock.patch.object(EcsRunTaskOperator, "_check_success_task")
 @mock.patch.object(EcsRunTaskOperator, "_start_task")
 @mock.patch.object(EcsBaseOperator, "client")
 def test_reattach_successful(
-self,
-client_mock,
-start_mock,
-check_mock,
-wait_mock,
-xcom_pull_mock,
-xcom_del_mock,
-launch_type,
-tags,
+self, client_mock, start_mock, check_mock, wait_mock, uuid_mock, 
launch_type, tags, arns, expected_arn
 ):
+"""Test reattach on first running Task ARN."""
+mock_ti = mock.MagicMock(name="MockedTaskInstance")
+mock_ti.key.primary = ("mock_dag", "mock_ti", "mock_runid", 42)
+dummy_uuid = "01-02-03-04"

Review Comment:
   I don't have a good suggestion here, but I'm pretty sure we are supposed to 
avoid using "dummy".  Maybe just call it "fake_uuid" or "expected_uuid"??



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29447: Remove non-public interface usage in EcsRunTaskOperator

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29447:
URL: https://github.com/apache/airflow/pull/29447#discussion_r1102024417


##
airflow/providers/amazon/aws/utils/identifiers.py:
##
@@ -0,0 +1,51 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from uuid import NAMESPACE_OID, UUID, uuid5
+
+NIL_UUID = UUID(int=0)
+
+
+def generate_uuid(*values: str | None, namespace: UUID = NAMESPACE_OID) -> str:
+"""
+Convert input values to deterministic UUID string representation.
+This function intend to use only for generate hash which use as identifier,
+and not for any security usage.
+
+For each value generate UUID by UUID v5 algorithm (SHA-1 + Namespace),
+this UUID will use as Namespace for the next element.
+
+If only one non-None value provided to the function, then the result of 
function
+would be the same as result of ``uuid.uuid5``.
+
+All ``None`` values replaced by NIL UUID, if it only one passed parameter 
than NIL UUID return.
+
+:param namespace: Initial namespace value which pass into ``uuid.uuid5`` 
function.

Review Comment:
   ```suggestion
   :param namespace: Initial namespace value to pass into the 
``uuid.uuid5`` function.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29447: Remove non-public interface usage in EcsRunTaskOperator

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29447:
URL: https://github.com/apache/airflow/pull/29447#discussion_r1102024091


##
airflow/providers/amazon/aws/utils/identifiers.py:
##
@@ -0,0 +1,51 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from uuid import NAMESPACE_OID, UUID, uuid5
+
+NIL_UUID = UUID(int=0)
+
+
+def generate_uuid(*values: str | None, namespace: UUID = NAMESPACE_OID) -> str:
+"""
+Convert input values to deterministic UUID string representation.
+This function intend to use only for generate hash which use as identifier,
+and not for any security usage.
+
+For each value generate UUID by UUID v5 algorithm (SHA-1 + Namespace),
+this UUID will use as Namespace for the next element.
+
+If only one non-None value provided to the function, then the result of 
function
+would be the same as result of ``uuid.uuid5``.
+
+All ``None`` values replaced by NIL UUID, if it only one passed parameter 
than NIL UUID return.

Review Comment:
   ```suggestion
   All ``None`` values are replaced by NIL UUID.  If it only one value is 
provided then return NIL UUID.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29447: Remove non-public interface usage in EcsRunTaskOperator

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29447:
URL: https://github.com/apache/airflow/pull/29447#discussion_r1102023489


##
airflow/providers/amazon/aws/utils/identifiers.py:
##
@@ -0,0 +1,51 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from uuid import NAMESPACE_OID, UUID, uuid5
+
+NIL_UUID = UUID(int=0)
+
+
+def generate_uuid(*values: str | None, namespace: UUID = NAMESPACE_OID) -> str:
+"""
+Convert input values to deterministic UUID string representation.
+This function intend to use only for generate hash which use as identifier,
+and not for any security usage.
+
+For each value generate UUID by UUID v5 algorithm (SHA-1 + Namespace),
+this UUID will use as Namespace for the next element.
+
+If only one non-None value provided to the function, then the result of 
function
+would be the same as result of ``uuid.uuid5``.

Review Comment:
   ```suggestion
   If only one non-None value is provided to the function, then the result 
of the function
   would be the same as result of ``uuid.uuid5``.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29447: Remove non-public interface usage in EcsRunTaskOperator

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29447:
URL: https://github.com/apache/airflow/pull/29447#discussion_r1102023038


##
airflow/providers/amazon/aws/utils/identifiers.py:
##
@@ -0,0 +1,51 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from uuid import NAMESPACE_OID, UUID, uuid5
+
+NIL_UUID = UUID(int=0)
+
+
+def generate_uuid(*values: str | None, namespace: UUID = NAMESPACE_OID) -> str:
+"""
+Convert input values to deterministic UUID string representation.
+This function intend to use only for generate hash which use as identifier,
+and not for any security usage.
+
+For each value generate UUID by UUID v5 algorithm (SHA-1 + Namespace),
+this UUID will use as Namespace for the next element.

Review Comment:
   ```suggestion
   Generates a UUID v5 (SHA-1 + Namespace) for each value provided, and 
this UUID is used as the Namespace for the next element.
   ```
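   
   For readers following along, a minimal sketch of the chaining this 
docstring describes (an approximation, not necessarily the provider's exact 
code):
   
   ```python
   from uuid import NAMESPACE_OID, UUID, uuid5
   
   NIL_UUID = UUID(int=0)
   
   
   def chained_uuid5(*values, namespace: UUID = NAMESPACE_OID) -> str:
       # A single None argument collapses to the NIL UUID, regardless of namespace.
       if len(values) == 1 and values[0] is None:
           return str(NIL_UUID)
       result = namespace
       for value in values:
           # Each value's UUID v5 becomes the namespace for the next value;
           # None values are replaced by the NIL UUID's string form.
           result = uuid5(result, value if value is not None else str(NIL_UUID))
       return str(result)
   ```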



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29447: Remove non-public interface usage in EcsRunTaskOperator

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29447:
URL: https://github.com/apache/airflow/pull/29447#discussion_r1102021581


##
airflow/providers/amazon/aws/utils/identifiers.py:
##
@@ -0,0 +1,51 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from uuid import NAMESPACE_OID, UUID, uuid5
+
+NIL_UUID = UUID(int=0)
+
+
+def generate_uuid(*values: str | None, namespace: UUID = NAMESPACE_OID) -> str:
+"""
+Convert input values to deterministic UUID string representation.
+This function intend to use only for generate hash which use as identifier,
+and not for any security usage.

Review Comment:
   ```suggestion
   This function is only intended to generate a hash which used as an 
identifier, not for any security use.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] 1inuxoid commented on issue #29422: Multiple AWS connections support in DynamoDBToS3Operator

2023-02-09 Thread via GitHub


1inuxoid commented on issue #29422:
URL: https://github.com/apache/airflow/issues/29422#issuecomment-1424815215

   @eladkal, I can see that `RedshiftToS3Operator` and `S3ToRedshiftOperator` 
already implement a similar pattern using `redshift_conn_id` connection 
argument.
   I don't see any other operators in this "transfer family".
   
   Also, I think the semantics of two AWS connections in such an 
`AwsToAwsBaseTransferOperator` would be tricky and not straightforward at all. 
How would you even name them if they are not service-specific?
   
   I'll probably continue with a change to `DynamoDBToS3Operator` specifically 
for now; however, I'm open to suggestions on how to address this semantic 
challenge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] LMnet commented on a diff in pull request #29434: Impovements for RedshiftDataOperator

2023-02-09 Thread via GitHub


LMnet commented on code in PR #29434:
URL: https://github.com/apache/airflow/pull/29434#discussion_r1101995475


##
airflow/providers/amazon/aws/operators/redshift_data.py:
##
@@ -47,6 +48,8 @@ class RedshiftDataOperator(BaseOperator):
 :param with_event: indicates whether to send an event to EventBridge
 :param await_result: indicates whether to wait for a result, if True wait, 
if False don't wait
 :param poll_interval: how often in seconds to check the query status
+:param return_sql_result: if True will return the result of an SQL 
statement,
+if False will return statement ID

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1101989631


##
.pre-commit-config.yaml:
##
@@ -711,6 +711,7 @@ repos:
 language: python
 pass_filenames: true
 files: ^scripts/ci/docker-compose/.+\.ya?ml$|docker-compose\.ya?ml$
+exclude: 
^scripts/ci/docker-compose/grafana/.|^scripts/ci/docker-compose/prometheus.yml

Review Comment:
   I spent a little while thinking about trying to use something equivalent to 
a .gitignore for this, but in the end I decided it's only two entries.  If we 
start adding more then we might want to consider a more scalable solution.  Or 
not.  Who knows.  But we're good with making that a problem for Future Us for 
now?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1101986913


##
scripts/ci/docker-compose/integration-statsd.yml:
##
@@ -19,15 +19,36 @@ version: "3.7"
 services:
   statsd-exporter:
 image: apache/airflow:airflow-statsd-exporter-2020.09.05-v0.17.0
+container_name: "breeze-statsd-exporter"
 ports:
   - "9125:9125"
   - "9125:9125/udp"
   - "29102:9102"
 
+  prometheus:
+image: prom/prometheus
+container_name: "breeze-prometheus"
+user: "0"
+ports:
+  - "29090:9090"

Review Comment:
   Sure. That sounds like as good a place as any, and better than most. I'll 
make the addition.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] kevingoss2 commented on pull request #26786: Fix parsing of optional `mode` field in BigQuery Result Schema

2023-02-09 Thread via GitHub


kevingoss2 commented on PR #26786:
URL: https://github.com/apache/airflow/pull/26786#issuecomment-1424766355

   I am still having this issue with the latest Airflow (2.5.1) and Google 
providers (8.6.0).
   
   ```
   File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/providers/common/sql/hooks/sql.py", line 349, in run
     self._run_command(cur, sql_statement, parameters)
   File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/providers/common/sql/hooks/sql.py", line 380, in _run_command
     cur.execute(sql_statement)
   File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 2697, in execute
     description = _format_schema_for_description(query_results["schema"])
   File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 3006, in _format_schema_for_description
     field["mode"] == "NULLABLE",
   KeyError: 'mode'
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] potiuk commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


potiuk commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1101983456


##
scripts/ci/docker-compose/integration-statsd.yml:
##
@@ -19,15 +19,36 @@ version: "3.7"
 services:
   statsd-exporter:
 image: apache/airflow:airflow-statsd-exporter-2020.09.05-v0.17.0
+container_name: "breeze-statsd-exporter"
 ports:
   - "9125:9125"
   - "9125:9125/udp"
   - "29102:9102"
 
+  prometheus:
+image: prom/prometheus
+container_name: "breeze-prometheus"
+user: "0"
+ports:
+  - "29090:9090"

Review Comment:
   I thought for a while that what you **might** add is a short "StatsD 
testing" chapter in TESTING.rst. We explain the different kinds of tests we do 
there, and checking StatsD output is one kind of test. So maybe add a chapter 
on how to test metrics with Breeze?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] potiuk commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


potiuk commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1101981316


##
.pre-commit-config.yaml:
##
@@ -711,6 +711,7 @@ repos:
 language: python
 pass_filenames: true
 files: ^scripts/ci/docker-compose/.+\.ya?ml$|docker-compose\.ya?ml$
+exclude: 
^scripts/ci/docker-compose/grafana/.|^scripts/ci/docker-compose/prometheus.yml

Review Comment:
   This is fine. We do not expect all files (especially kinda "external" ones) 
to follow our linting rules. We exclude a number of things.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1101976661


##
scripts/ci/docker-compose/integration-statsd.yml:
##
@@ -19,15 +19,36 @@ version: "3.7"
 services:
   statsd-exporter:
 image: apache/airflow:airflow-statsd-exporter-2020.09.05-v0.17.0
+container_name: "breeze-statsd-exporter"
 ports:
   - "9125:9125"
   - "9125:9125/udp"
   - "29102:9102"
 
+  prometheus:
+image: prom/prometheus
+container_name: "breeze-prometheus"
+user: "0"
+ports:
+  - "29090:9090"

Review Comment:
   I didn't see a way to make these new ports or URLs obvious like the base 
ones are in the Breeze opening scroll. Maybe I should add a log or print 
somewhere so a user can find these links without digging in the code? Thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1101975131


##
scripts/ci/docker-compose/integration-statsd.yml:
##
@@ -19,15 +19,36 @@ version: "3.7"
 services:
   statsd-exporter:
 image: apache/airflow:airflow-statsd-exporter-2020.09.05-v0.17.0
+container_name: "breeze-statsd-exporter"

Review Comment:
   Naming the containers here breaks the existing Breeze container naming 
convention, which appears to be "let Docker deal with it". My Docker experience 
is pretty limited still, but AFAICT I needed a static container name here so I 
could reference it in the other files like 
[here](https://github.com/apache/airflow/pull/29449/files#diff-f03546a300946bd296ba407833ac290ee605c8ce586e484809513fe667c058f3R23).
   
   As always, if there is a better way then please let me know.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1101973229


##
.pre-commit-config.yaml:
##
@@ -711,6 +711,7 @@ repos:
 language: python
 pass_filenames: true
 files: ^scripts/ci/docker-compose/.+\.ya?ml$|docker-compose\.ya?ml$
+exclude: ^scripts/ci/docker-compose/grafana/.|^scripts/ci/docker-compose/prometheus.yml

Review Comment:
   I'm not thrilled with this solution but did not see a way to add an `ignore` 
file or tag.  If anyone knows a better way, I'm open to suggestions here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] potiuk commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


potiuk commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1101972247


##
images/breeze/output-commands-hash.txt:
##
@@ -1,7 +1,7 @@
 # This file is automatically generated by pre-commit. If you have a conflict 
with this file

Review Comment:
   Yep.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] potiuk commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


potiuk commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1101971591


##
dev/breeze/src/airflow_breeze/global_constants.py:
##
@@ -44,16 +44,19 @@
 DEFAULT_BACKEND = ALLOWED_BACKENDS[0]
 ALL_INTEGRATIONS = [
 "cassandra",
+"celery",
 "kerberos",
 "mongo",
 "pinot",
-"celery",
 "trino",
 ]
-ALLOWED_INTEGRATIONS = [
-*ALL_INTEGRATIONS,
-"all",
-]
+ALLOWED_INTEGRATIONS = sorted(

Review Comment:
   Good.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1101971428


##
images/breeze/output-commands-hash.txt:
##
@@ -1,7 +1,7 @@
 # This file is automatically generated by pre-commit. If you have a conflict 
with this file

Review Comment:
   All changes in this file are generated through `breeze setup 
regenerate-command-images`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1101970644


##
dev/breeze/src/airflow_breeze/global_constants.py:
##
@@ -44,16 +44,19 @@
 DEFAULT_BACKEND = ALLOWED_BACKENDS[0]
 ALL_INTEGRATIONS = [
 "cassandra",
+"celery",
 "kerberos",
 "mongo",
 "pinot",
-"celery",
 "trino",
 ]
-ALLOWED_INTEGRATIONS = [
-*ALL_INTEGRATIONS,
-"all",
-]
+ALLOWED_INTEGRATIONS = sorted(

Review Comment:
   Using `sorted` here so the output on the CLI help screen will be sorted.  
Not necessarily required, but I liked it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] potiuk commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


potiuk commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1101970479


##
dev/breeze/src/airflow_breeze/global_constants.py:
##
@@ -44,16 +44,19 @@
 DEFAULT_BACKEND = ALLOWED_BACKENDS[0]
 ALL_INTEGRATIONS = [
 "cassandra",
+"celery",

Review Comment:
   Good. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a diff in pull request #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


ferruzzi commented on code in PR #29449:
URL: https://github.com/apache/airflow/pull/29449#discussion_r1101969743


##
dev/breeze/src/airflow_breeze/global_constants.py:
##
@@ -44,16 +44,19 @@
 DEFAULT_BACKEND = ALLOWED_BACKENDS[0]
 ALL_INTEGRATIONS = [
 "cassandra",
+"celery",

Review Comment:
   Just moved to alphabetize the list



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ferruzzi opened a new pull request, #29449: Breeze StatsD Integration

2023-02-09 Thread via GitHub


ferruzzi opened a new pull request, #29449:
URL: https://github.com/apache/airflow/pull/29449

   Running `breeze start-airflow --integration statsd` will launch Breeze along 
with Docker containers for StatsD, Prometheus, and Grafana.  This allows us to 
test the emission of [Airflow's StatsD 
metrics](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/metrics.html).
   
   - Includes config files which are mounted into the Prometheus and Grafana 
containers in order to configure the ports and connections between the 
services/containers, and populates a default dashboard in Grafana.
   - Updates the CLI help menus.
   - Grafana and Prometheus config files do not match our chosen JSON schema, 
so they are added as exemptions in the JSON linter.
   
   
   I've left a few notes and questions in the review
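   
   For reviewers who want to sanity-check the pipeline by hand, here is a 
minimal sketch (my illustration, not code from this PR; the metric name is 
made up) that emits a test counter to the statsd-exporter over plain UDP 
using the StatsD wire format, assuming the 9125/udp port mapping from the 
compose file above:
   
   ```python
   import socket

   # StatsD wire format is "<name>:<value>|<type>"; "c" marks a counter.
   payload = b"airflow.test_counter:1|c"

   # Fire-and-forget UDP datagram to the statsd-exporter published on 9125/udp.
   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
   sock.sendto(payload, ("localhost", 9125))
   sock.close()
   ```
   
   If the wiring works, the exporter should then expose the translated metric 
on its Prometheus scrape endpoint (host port 29102 in the mapping above).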


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] michaelmicheal commented on pull request #29407: Ensure Serialized DAG is deleted

2023-02-09 Thread via GitHub


michaelmicheal commented on PR #29407:
URL: https://github.com/apache/airflow/pull/29407#issuecomment-1424742240

   > Can you rebase? would like it to be green before merge
   
   I've already rebased; is there a way to rerun CI?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #25886: Limit Google Protobuf for compatibility with biggtable client

2023-02-09 Thread via GitHub


potiuk commented on PR #25886:
URL: https://github.com/apache/airflow/pull/25886#issuecomment-1424729082

   > I'm happy to submit a PR (especially an easy one like this). before I do 
that, i was wondering if there was any sense of what the issue was that lead to 
this.
   
   It did not install back then and broke our tests. The only way to check if 
it is fixed now is to make a PR and run the tests. We have 650 
dependencies in Airflow. No-one follows every single release of every one of 
them - so no-one knows if the problem is fixed.
   
The only way to see if the problem is fixed is to try it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] kristopherkane commented on pull request #29136: Dataproc batches

2023-02-09 Thread via GitHub


kristopherkane commented on PR #29136:
URL: https://github.com/apache/airflow/pull/29136#issuecomment-1424722569

   Am I right to assume that if this gets merged then the default behavior is 
to squash onto the first PR commit in GH?  Or shall I squash and push now? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] kristopherkane commented on pull request #29136: Dataproc batches

2023-02-09 Thread via GitHub


kristopherkane commented on PR #29136:
URL: https://github.com/apache/airflow/pull/29136#issuecomment-1424721127

   I'm not sure what fixed it, but some combination of reinstating virtualenvs, 
recreating Breeze containers, and numerous rebases (which I was already doing) 
fixed the providers yaml checks locally.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] mdering commented on pull request #25886: Limit Google Protobuf for compatibility with biggtable client

2023-02-09 Thread via GitHub


mdering commented on PR #25886:
URL: https://github.com/apache/airflow/pull/25886#issuecomment-1424711300

   I'm happy to submit a PR (especially an easy one like this). Before I do 
that, I was wondering if there was any sense of what the issue was that led to 
this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] Taragolis opened a new pull request, #29447: Remove non-public interface usage in EcsRunTaskOperator

2023-02-09 Thread via GitHub


Taragolis opened a new pull request, #29447:
URL: https://github.com/apache/airflow/pull/29447

   Right now `EcsRunTaskOperator`, when "reattach" is required, uses hacks:
   
   - It saves info to XCom, which is not valid because it does not use a public 
interface.
   - In addition, it can't work with Dynamic Task Mapping (the XCom name 
contains only the task_id and misses the other parts of the unique TI key).
   
   This PR replaces the current mechanism with the builtin 
[ECS.Client.run_task](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs.html#ECS.Client.run_task)
 capability to set `startedBy` and filter on it later.
   
   `startedBy` is limited to 36 characters, so it is not possible to use a 
string representation of `dag_id` + `task_id` + `run_id` + `map_id`; instead, a 
UUID is generated from these values, which can be used as a unique (per-TI) 
value.
   
   Unfortunately, right now `EcsRunTaskOperator` sets `startedBy` to the owner 
of the task, so the two modes become mutually exclusive:
   1. If reattach is set to True, then `startedBy` is set to the unique TI 
value (a UUID).
   2. If reattach is set to False (the default), then `startedBy` is set to 
task.owner.
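   
   As a minimal sketch of the technique (my illustration, not the PR's actual 
code; the namespace UUID and key separator are arbitrary assumptions), a 
deterministic UUID can be derived from the unique TI key, so a reattaching 
task computes the same 36-character `startedBy` value on every attempt:
   
   ```python
   import uuid

   # Any fixed namespace UUID works here, as long as it never changes.
   _NAMESPACE = uuid.UUID("f000b5b0-0000-4000-8000-000000000000")

   def started_by_for_ti(dag_id: str, task_id: str, run_id: str, map_index: int) -> str:
       # uuid5 is deterministic: the same TI key always yields the same value,
       # and str(uuid) is exactly 36 characters, fitting the startedBy limit.
       key = f"{dag_id}:{task_id}:{run_id}:{map_index}"
       return str(uuid.uuid5(_NAMESPACE, key))

   print(started_by_for_ti("my_dag", "my_task", "scheduled__2023-02-09T00:00:00+00:00", -1))
   ```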


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on pull request #29407: Ensure Serialized DAG is deleted

2023-02-09 Thread via GitHub


ephraimbuddy commented on PR #29407:
URL: https://github.com/apache/airflow/pull/29407#issuecomment-1424706143

   Can you rebase? would like it to be green before merge


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] michaelmicheal commented on a diff in pull request #29446: Scheduler, make stale DAG deactivation threshold configurable instead of using dag processing timeout

2023-02-09 Thread via GitHub


michaelmicheal commented on code in PR #29446:
URL: https://github.com/apache/airflow/pull/29446#discussion_r1101929995


##
airflow/config_templates/default_airflow.cfg:
##
@@ -1045,8 +1045,16 @@ min_file_process_interval = 30
 # referenced and should be marked as orphaned.
 parsing_cleanup_interval = 60
 
+# How long (in seconds) to wait after we've reparsed a DAG file before 
deactivating stale
+# DAGs (DAGs which are no longer present in the expected files). The reason 
why we need
+# this threshold is to account for the time between when the file is parsed 
and when the
+# DAG is loaded. The absolute maximum that this could take is 
`dag_file_processor_timeout`,
+# but when you have a long timeout configured, it results in a significant 
delay in the
+# deactivation of stale dags.
+stale_dag_threshold = 30
+
 # How often (in seconds) to scan the DAGs directory for new files. Default to 
5 minutes.
-dag_dir_list_interval = 300
+dag_dir_list_interval = 10

Review Comment:
   Sorry, accident



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] michaelmicheal commented on pull request #29407: Ensure Serialized DAG is deleted

2023-02-09 Thread via GitHub


michaelmicheal commented on PR #29407:
URL: https://github.com/apache/airflow/pull/29407#issuecomment-1424699959

   @eladkal @bbovenzi is this good to merge?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #29446: Scheduler, make stale DAG deactivation threshold configurable instead of using dag processing timeout

2023-02-09 Thread via GitHub


ephraimbuddy commented on code in PR #29446:
URL: https://github.com/apache/airflow/pull/29446#discussion_r1101909468


##
airflow/config_templates/default_airflow.cfg:
##
@@ -1045,8 +1045,16 @@ min_file_process_interval = 30
 # referenced and should be marked as orphaned.
 parsing_cleanup_interval = 60
 
+# How long (in seconds) to wait after we've reparsed a DAG file before 
deactivating stale
+# DAGs (DAGs which are no longer present in the expected files). The reason 
why we need
+# this threshold is to account for the time between when the file is parsed 
and when the
+# DAG is loaded. The absolute maximum that this could take is 
`dag_file_processor_timeout`,
+# but when you have a long timeout configured, it results in a significant 
delay in the
+# deactivation of stale dags.
+stale_dag_threshold = 30
+
 # How often (in seconds) to scan the DAGs directory for new files. Default to 
5 minutes.
-dag_dir_list_interval = 300
+dag_dir_list_interval = 10

Review Comment:
   Why this change?



##
airflow/dag_processing/manager.py:
##
@@ -433,6 +433,8 @@ def __init__(
 self.last_deactivate_stale_dags_time = 
timezone.make_aware(datetime.fromtimestamp(0))
 # How often to check for DAGs which are no longer in files
 self.parsing_cleanup_interval = conf.getint("scheduler", 
"parsing_cleanup_interval")
+# How long to wait for a DAG to be reparsed after it's file has been 
parsed before disabling

Review Comment:
   ```suggestion
   # How long to wait for a DAG to be reparsed after its file has been 
parsed before disabling
   ```



##
airflow/config_templates/config.yml:
##
@@ -2049,6 +2049,18 @@ scheduler:
   type: integer
   example: ~
   default: "60"
+stale_dag_threshold:
+  description: |
+How long (in seconds) to wait after we've reparsed a DAG file before 
deactivating stale
+DAGs (DAGs which are no longer present in the expected files). The 
reason why we need
+this threshold is to account for the time between when the file is 
parsed and when the
+DAG is loaded. The absolute maximum that this could take is 
`dag_file_processor_timeout`,
+but when you have a long timeout configured, it results in a 
significant delay in the
+deactivation of stale dags.
+  version_added: 2.6.0
+  type: integer
+  example: ~
+  default: "30"

Review Comment:
   Should we keep this the same default as `dag_file_processor_timeout`? That 
way, there will be no change in behaviour



##
airflow/dag_processing/manager.py:
##
@@ -523,13 +525,16 @@ def deactivate_stale_dags(
 query = query.filter(DagModel.processor_subdir == dag_directory)
 dags_parsed = query.all()
 
+
+
 for dag in dags_parsed:
 # The largest valid difference between a DagFileStat's 
last_finished_time and a DAG's
-# last_parsed_time is _processor_timeout. Longer than that 
indicates that the DAG is
-# no longer present in the file.
+# last_parsed_time is thg processor_timeout. Longer than that 
indicates that the DAG is

Review Comment:
   ```suggestion
   # last_parsed_time is the processor_timeout. Longer than that 
indicates that the DAG is
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] vincbeck commented on a diff in pull request #29434: Impovements for RedshiftDataOperator

2023-02-09 Thread via GitHub


vincbeck commented on code in PR #29434:
URL: https://github.com/apache/airflow/pull/29434#discussion_r1101915933


##
airflow/providers/amazon/aws/operators/redshift_data.py:
##
@@ -47,6 +48,8 @@ class RedshiftDataOperator(BaseOperator):
 :param with_event: indicates whether to send an event to EventBridge
 :param await_result: indicates whether to wait for a result, if True wait, 
if False don't wait
 :param poll_interval: how often in seconds to check the query status
+:param return_sql_result: if True will return the result of an SQL 
statement,
+if False will return statement ID

Review Comment:
   +1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #25886: Limit Google Protobuf for compatibility with biggtable client

2023-02-09 Thread via GitHub


potiuk commented on PR #25886:
URL: https://github.com/apache/airflow/pull/25886#issuecomment-1424678260

   PRs are most welcome :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #25886: Limit Google Protobuf for compatibility with biggtable client

2023-02-09 Thread via GitHub


potiuk commented on PR #25886:
URL: https://github.com/apache/airflow/pull/25886#issuecomment-1424677928

   > since there isn't much here, i was wondering if this requirement is still 
necessary or if it could be relaxed
   
   I think you will never know until you try


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] vchiapaikeo commented on issue #29424: Status of testing Providers that were prepared on February 08, 2023

2023-02-09 Thread via GitHub


vchiapaikeo commented on issue #29424:
URL: https://github.com/apache/airflow/issues/29424#issuecomment-1424673748

   https://github.com/apache/airflow/pull/28959 looks good
   
   `write_on_empty=False (default)`
   https://user-images.githubusercontent.com/9200263/217911454-ce12ad5d-c11c-43a5-9f9b-b987f61d5706.png
   
   `write_on_empty=True`
   https://user-images.githubusercontent.com/9200263/217911094-43aac6ac-01d9-4b79-ab97-0b0fef583f0e.png


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] michaelmicheal opened a new pull request, #29446: Scheduler, make stale DAG deactivation threshold configurable instead of using dag processing timeout

2023-02-09 Thread via GitHub


michaelmicheal opened a new pull request, #29446:
URL: https://github.com/apache/airflow/pull/29446

   When we deactivate stale DAGs, we add a buffer to the last parsed time of a 
DAG when comparing it to the last parsed time of its DAG file to account for 
DAG files that take a long time to process. 
   ```python
   for dag in dags_parsed:
       # The largest valid difference between a DagFileStat's last_finished_time and a DAG's
       # last_parsed_time is _processor_timeout. Longer than that indicates that the DAG is
       # no longer present in the file.
       if (
           dag.fileloc in last_parsed
           and (dag.last_parsed_time + processor_timeout) < last_parsed[dag.fileloc]
       ):
           cls.logger().info("DAG %s is missing and will be deactivated.", dag.dag_id)
           to_deactivate.add(dag.dag_id)
   ```
   The upper bound on this delay is indeed the processor timeout, but 
practically speaking, such a large buffer is unnecessary when a long timeout is 
configured, and it results in a significant delay in stale DAG deactivation. I 
think this buffer should be configurable.
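   
   A rough sketch of the proposed check (my paraphrase of the snippet above, 
not the exact diff; the function name and parameter shapes are illustrative), 
where a dedicated, smaller `stale_dag_threshold` replaces the processor 
timeout as the buffer:
   
   ```python
   from datetime import timedelta

   def find_stale_dag_ids(dags_parsed, last_parsed, stale_dag_threshold: timedelta) -> set:
       """Return dag_ids whose file was reparsed more than stale_dag_threshold
       ago without the DAG itself reappearing in it.

       dags_parsed: DAG rows exposing .dag_id, .fileloc, and .last_parsed_time
       last_parsed: dict mapping fileloc -> datetime of the file's last parse
       """
       to_deactivate = set()
       for dag in dags_parsed:
           if (
               dag.fileloc in last_parsed
               and (dag.last_parsed_time + stale_dag_threshold) < last_parsed[dag.fileloc]
           ):
               to_deactivate.add(dag.dag_id)
       return to_deactivate
   ```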
   
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request 
Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)**
 for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a 
newsfragment file, named `{pr_number}.significant.rst` or 
`{issue_number}.significant.rst`, in 
[newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] vincbeck commented on issue #28784: AIP-44 Migrate DagFileProcessorManager._fetch_callbacks to Internal API

2023-02-09 Thread via GitHub


vincbeck commented on issue #28784:
URL: https://github.com/apache/airflow/issues/28784#issuecomment-1424649397

   Yeah I think I will include it in #28900


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #29445: Better handle forward type definitions in `@task.python` for multiple output

2023-02-09 Thread via GitHub


ashb commented on PR #29445:
URL: https://github.com/apache/airflow/pull/29445#issuecomment-1424624253

   What do you think @josh-fell ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ashb opened a new pull request, #29445: Better handle forward type definitions in `@task.python` for multiple output

2023-02-09 Thread via GitHub


ashb opened a new pull request, #29445:
URL: https://github.com/apache/airflow/pull/29445

   This does two things:
   
   1) it only looks at the return type annotation, not everything; and
   2) it catches and tests for an invalid/TYPE_CHECKING-only type import
   
   Closes #29435
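   
   For context, a minimal sketch of the kind of DAG code involved (my 
assumption of the failure mode, not code from this PR): with postponed 
evaluation, the return annotation is a string referencing a name that only 
exists under TYPE_CHECKING, so naively resolving every annotation at runtime 
to infer `multiple_outputs` would fail:
   
   ```python
   from __future__ import annotations  # all annotations become plain strings

   from typing import TYPE_CHECKING

   from airflow.decorators import task

   if TYPE_CHECKING:
       # Available to the type checker only, never at runtime.
       from collections.abc import Mapping

   @task
   def make_config() -> Mapping[str, str]:
       # Evaluating the "Mapping[str, str]" string at runtime raises NameError;
       # the fix looks only at the return annotation and tolerates
       # unresolvable (TYPE_CHECKING-only) types.
       return {"key": "value"}
   ```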
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] victor-perchwell opened a new issue, #29444: Attempting to Upgrade Chart to Enable Examples Causes Postgres Error

2023-02-09 Thread via GitHub


victor-perchwell opened a new issue, #29444:
URL: https://github.com/apache/airflow/issues/29444

   ### Official Helm Chart version
   
   1.8.0 (latest released)
   
   ### Apache Airflow version
   
   2.5.1
   
   ### Kubernetes Version
   
   v1.26.1
   
   ### Helm Chart configuration
   
   I'm using the official helm chart and modifying the values.yaml file to 
enable examples by uncommenting:
   
![image](https://user-images.githubusercontent.com/105374838/217901076-bf0ff09b-aa14-48b9-ad8c-85ee70ea953e.png)
   
   ### Docker Image customizations
   
   _No response_
   
   ### What happened
   
   When I attempt to upgrade the release to use the modified values.yaml file, 
I get the following error:
   
![image](https://user-images.githubusercontent.com/105374838/217901295-6317e3fa-8726-48b9-80a4-8ef0f3ec87db.png)
   
   I'm extracting the values.yaml to modify via:
   ```
   helm show values apache-airflow/airflow > values.yaml
   ```
   
   ### What you think should happen instead
   
   _No response_
   
   ### How to reproduce
   
   Deploy the official Airflow Helm chart to minikube, attempt to modify 
values.yaml to turn examples on, and upgrade the release with the new 
values.yaml file.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] csm10495 commented on issue #29424: Status of testing Providers that were prepared on February 08, 2023

2023-02-09 Thread via GitHub


csm10495 commented on issue #29424:
URL: https://github.com/apache/airflow/issues/29424#issuecomment-1424609217

   #28808 still seems legit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] vincbeck commented on a diff in pull request #29409: Fix Rest API update user output

2023-02-09 Thread via GitHub


vincbeck commented on code in PR #29409:
URL: https://github.com/apache/airflow/pull/29409#discussion_r1101825476


##
airflow/api_connexion/openapi/v1.yaml:
##
@@ -2288,7 +2288,7 @@ paths:
   content:
 application/json:
   schema:
-$ref: '#/components/schemas/Role'
+$ref: '#/components/schemas/User'

Review Comment:
   Done, thanks for the suggestion. Regarding the tests, it seems there are 
already some tests 
[here](https://github.com/apache/airflow/blob/main/tests/api_connexion/endpoints/test_user_endpoint.py#L529)
 but regardless of the schema referenced in `v1.yaml`, it always return the 
same schema of data (the correct one). Any idea why?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] mdering commented on pull request #25886: Limit Google Protobuf for compatibility with biggtable client

2023-02-09 Thread via GitHub


mdering commented on PR #25886:
URL: https://github.com/apache/airflow/pull/25886#issuecomment-1424561076

   Since there isn't much here, I was wondering if this requirement is still 
necessary or if it could be relaxed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] vincbeck commented on a diff in pull request #29443: Use an Amazon Linux 2 image to run EMR job

2023-02-09 Thread via GitHub


vincbeck commented on code in PR #29443:
URL: https://github.com/apache/airflow/pull/29443#discussion_r1101801403


##
tests/system/providers/amazon/aws/example_emr.py:
##
@@ -90,6 +90,16 @@
 # [END howto_operator_emr_steps_config]
 
 
+@task
+def get_ami_id():
+"""
+Returns an AL2 AMI compatible with EMR
+"""
+return boto3.client("ssm").get_parameter(
+Name="/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-ebs",
+)["Parameter"]["Value"]

Review Comment:
   I like that. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] o-nikolas commented on issue #28680: Improve AWS Batch hook and operator

2023-02-09 Thread via GitHub


o-nikolas commented on issue #28680:
URL: https://github.com/apache/airflow/issues/28680#issuecomment-1424546899

   No worries @gmnmedeiros, thanks for letting us know!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] johannaojeling commented on issue #29424: Status of testing Providers that were prepared on February 08, 2023

2023-02-09 Thread via GitHub


johannaojeling commented on issue #29424:
URL: https://github.com/apache/airflow/issues/29424#issuecomment-1424529809

   #28764 is tested with the Beam RC and works as expected


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] bbovenzi commented on a diff in pull request #29441: datasets, next_run_datasets, remove unnecessary timestamp filter

2023-02-09 Thread via GitHub


bbovenzi commented on code in PR #29441:
URL: https://github.com/apache/airflow/pull/29441#discussion_r1101776764


##
airflow/www/views.py:
##
@@ -3715,7 +3715,6 @@ def next_run_datasets(self, dag_id):
 DatasetEvent,
 and_(
 DatasetEvent.dataset_id == DatasetModel.id,
-DatasetEvent.timestamp > DatasetDagRunQueue.created_at,

Review Comment:
   @blag Does this look right to you? re: 
https://github.com/apache/airflow/pull/26356



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] Taragolis commented on a diff in pull request #29443: Use an Amazon Linux 2 image to run EMR job

2023-02-09 Thread via GitHub


Taragolis commented on code in PR #29443:
URL: https://github.com/apache/airflow/pull/29443#discussion_r1101768568


##
tests/system/providers/amazon/aws/example_emr.py:
##
@@ -90,6 +90,16 @@
 # [END howto_operator_emr_steps_config]
 
 
+@task
+def get_ami_id():
+"""
+Returns an AL2 AMI compatible with EMR
+"""
+return boto3.client("ssm").get_parameter(
+Name="/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-ebs",
+)["Parameter"]["Value"]

Review Comment:
   ```suggestion
   return SsmHook(aws_conn_id=None).get_parameter_value(
   "/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-ebs"
   )
   ```
   
   Just an idea for using SsmHook here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] vincbeck opened a new pull request, #29443: Use an Amazon Linux 2 image to run EMR job

2023-02-09 Thread via GitHub


vincbeck opened a new pull request, #29443:
URL: https://github.com/apache/airflow/pull/29443

   For internal operational reasons, it is better to use Amazon Linux 2 images 
on EC2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


