jason810496 commented on issue #49863: URL: https://github.com/apache/airflow/issues/49863#issuecomment-2954807262
Related to #48491. TL;DR; IMO, we still need to add `RemoteIO` for Elasticsearch and Opensearch, which is the most simple way to fix this error, or we need to have quite change to scheduler, workload, supervisor to support `ElasticsearchTaskHandler`. Also we should handle the log template for ES when constructing workload. https://github.com/apache/airflow/blob/9ca2af40d4824c3e7df8ca193ec657331bf142e6/airflow-core/src/airflow/executors/workloads.py#L103-L133 --- I tested with other RemoteIO ( `GCSRemoteLogIO` ) and the write path and read path both work fine. More accurately, in Airflow 3, even we set remote logging with GCS as example, we are actually using `FileTaskHandler + GCSRemoteLogIO` instead of `GCSTaskHandler` However, when setting remote logging with Elasticsearch, we are really using `ElasticsearchTaskHandler` --- Here is the overall write path for remote logging in Airflow 3: 1. Try upload the remote logs in supervisor https://github.com/apache/airflow/blob/9ca2af40d4824c3e7df8ca193ec657331bf142e6/task-sdk/src/airflow/sdk/execution_time/supervisor.py#L892-L894 2. Use `RemoteIO` to upload logs https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/log.py#L545-L560 --- For `ElasticsearchTaskHandler`: 1. On logger `close`, we need to ensure the `self.handler` is not None https://github.com/apache/airflow/blob/9ca2af40d4824c3e7df8ca193ec657331bf142e6/providers/elasticsearch/src/airflow/providers/elasticsearch/log/es_task_handler.py#L512-L515 2. Which means `set_context` needed to be called ( but it is never called in Airflow 3 ) https://github.com/apache/airflow/blob/9ca2af40d4824c3e7df8ca193ec657331bf142e6/providers/elasticsearch/src/airflow/providers/elasticsearch/log/es_task_handler.py#L486-L497 3. But `set_content` need the _real_ TaskInstance ( required by `_render_log_id` helper, only ES need to fetch LogTemplate model ) https://github.com/apache/airflow/blob/9ca2af40d4824c3e7df8ca193ec657331bf142e6/providers/elasticsearch/src/airflow/providers/elasticsearch/log/es_task_handler.py#L457-L497 Based on 3. , it is impossible to be called in Airflow 3. Since we have `prohibit_commit` in the start of scheduler job runner for preventing any directly access to database, will raise `UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!` in `ElasticsearchTaskHandler` if we call `set_content` explictly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
