1fanwang opened a new pull request, #66890:
URL: https://github.com/apache/airflow/pull/66890

   Two routes in the task-execution API — `PATCH /task-instances/{id}/run` and 
`PATCH /task-instances/{id}/state` — catch `sqlalchemy.exc.SQLAlchemyError` 
(the parent class of `DataError`) and re-raise as `HTTPException(500, "Database 
error occurred")`. When a worker submits a payload with an oversized field (a 
`note`, a `rendered_map_index`, an `external_executor_id`, an `extra` JSON 
blob), the DB rejects with `DataError`, the local `except` fires first, and the 
worker sees an opaque 500 with no signal that payload size was the cause.
   
   That also defeats the umbrella fix in #66888: registering a global 
`DataError → 413/422` handler doesn't help these two routes because their local 
catch eats the exception before the global handler sees it.
   
   Tighten both `try` blocks: catch `DataError` explicitly and re-raise 
unchanged, so any handler registered on the execution API (the one shipped in 
#66888, or a future replacement) can translate it. The `SQLAlchemyError → 500` 
fallback for unrelated DB errors stays intact.
   
   ```python
       except DataError:
           # Re-raise unchanged so the app-level _DataErrorHandler can 
translate to 4xx.
           raise
       except SQLAlchemyError:
           log.exception("Error marking Task Instance state as running")
           raise HTTPException(
               status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, 
detail="Database error occurred"
           )
   ```
   
   ## Evidence
   
   Verified with the same TestClient sweep used in #66888, driving real 
`create_app()` routes after monkey-patching `Session.flush`/`Session.commit` to 
raise the same `DataError` MySQL would raise on an oversized `note`:
   
   ```
   --- PATCH /task-instances/{id}/state (oversized note) — BEFORE this PR ---
     HTTP 500
     body: {"message":"Internal server error"}
   
   --- PATCH /task-instances/{id}/state (oversized note) — AFTER this PR ---
     HTTP 413
     body: {"detail":{"reason":"Payload exceeded database column 
limit","orig_error":"(1406, \"Data too long for column 'note' at row 
1\")","message":"Database rejected the payload. Reduce the field size, or your 
operator may widen the column type (e.g. MEDIUMTEXT / LONGTEXT on MySQL)."}}
   ```
   
   Same `DataError`, same `orig_error`; the difference is whether the local 
`except SQLAlchemyError` got there first.
   
   ## Scope
   
   Two `try/except` blocks; no behavioural change for any non-`DataError` SQL 
error. Stacks on top of #66888 — independent of the global handler getting 
registered, but only delivers user-visible value once the handler is in place. 
Safe to merge in either order.
   
   Closes (partially) #66889 — the umbrella issue tracking the same class of 
DB-rejected-payload-as-500 failures across REST + execution API endpoints.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to