jvstein opened a new pull request, #67144:
URL: https://github.com/apache/airflow/pull/67144
In our Kubernetes-based Celery executor deployment, we ran into a runaway memory issue with a sensor that used `mode="reschedule"` and kept being scheduled onto the same worker. In this environment we have a dedicated set of workers for sensors, and the task was rescheduled to the same worker every time the poke ran. As a result, the local log file already existed and was appended to on each poke, and the S3 log file was then appended to each time as well.

Over time, this caused a large memory spike: the supervisor process loaded the existing logs from S3, appended another copy of the local logs, uploaded the result, and then repeated the cycle on the next poke. The memory usage eventually crashed the worker due to OOM.
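
To illustrate why this grows so quickly, here is a toy model of the cycle described above. It is only a sketch under stated assumptions: it does not use Airflow's actual log handler code, and `simulate_pokes`, `local_log`, and `remote_log` are hypothetical names standing in for the worker's local log file and the S3 object.

```python
def simulate_pokes(num_pokes: int, line: str = "poke\n") -> tuple[int, int]:
    """Toy model: return (local_size, remote_size) in characters after num_pokes."""
    local_log = ""   # stands in for the log file kept on the reused worker
    remote_log = ""  # stands in for the S3 log object

    for _ in range(num_pokes):
        # Same worker every time, so the existing local file is appended to.
        local_log += line
        # The whole remote log is loaded into memory first ...
        loaded = remote_log
        # ... then the full local file is appended and the result is uploaded.
        remote_log = loaded + local_log

    return len(local_log), len(remote_log)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        local_size, remote_size = simulate_pokes(n)
        print(f"pokes={n} local={local_size} remote={remote_size}")
```

In this model the local file grows linearly with the number of pokes, while the remote object (and the memory needed to hold it during each upload) grows roughly quadratically, which matches the kind of runaway growth we observed.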
---
##### Was generative AI tooling used to co-author this PR?
- [X] Yes (please specify the tool below)
Generated-by: Claude Code following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)