hkc-8010 commented on code in PR #66854:
URL: https://github.com/apache/airflow/pull/66854#discussion_r3353559493
##########
airflow-core/src/airflow/api_fastapi/execution_api/routes/task_instances.py:
##########
@@ -457,6 +457,12 @@ def ti_update_state(
extra=json.dumps({"host_name": hostname}) if hostname else
None,
)
)
+ # Commit the TI state update now to release the task_instance row lock
before
+ # running asset-event queries. Asset registration can hold the lock
for seconds
+ # under high concurrency (many aliases with large event histories),
causing
+ # idle-in-transaction pile-up that exhausts API server memory and
triggers OOMKill.
+ # The task outcome is durable from this point on.
+ session.commit()
Review Comment:
Addressed in `4673ff3`. I kept the early commit and 204 behavior for
task-state durability, and added `asset.registration_failures` on the
post-commit registration exception path so this no longer relies only on
`log.exception`. The metric is registered and the regression tests now assert
it is emitted when registration fails.
I am intentionally leaving durable retry/reconciliation out of this PR
because that is a larger delivery-semantics change; this PR keeps the
lock-contention fix scoped while making dropped registration observable. I also
updated the PR description with the sanitized contention evidence: 80+
concurrent completions, idle-in-transaction asset queries lasting minutes,
blocked `SELECT ... FOR UPDATE`, and apiserver OOMKills.
##########
airflow-core/src/airflow/assets/manager.py:
##########
@@ -327,8 +328,16 @@ def register_asset_change(
).unique()
for asset_alias_model in asset_alias_models:
- asset_alias_model.asset_events.append(asset_event)
- session.add(asset_alias_model)
+ # Use a direct INSERT rather than ORM .append() to avoid
lazy-loading the
+ # entire asset_events collection. On long-running deployments
that collection
+ # can contain thousands of rows; loading it on the
task-success hot path can
+ # leave DB connections idle-in-transaction for minutes,
blocking other workers.
+ session.execute(
Review Comment:
Addressed in `4673ff3`. I added the note directly beside the direct
association-table insert: this intentionally leaves
`asset_alias_model.asset_events` unsynced in the current session, and the path
is safe because it does not read that relationship again before commit.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]