ueshin opened a new pull request, #53055:
URL: https://github.com/apache/spark/pull/53055

   ### What changes were proposed in this pull request?
   
   Kills the worker if flush fails in `daemon.py`.
   
   Before the worker dies, it reuses the `faulthandler` feature to record a thread dump, which will appear in the error message if `faulthandler` is enabled.
   
   ```
   WARN TaskSetManager: Lost task 3.0 in stage 1.0 (TID 8) (127.0.0.1 executor 
1): org.apache.spark.SparkException: Python worker exited unexpectedly 
(crashed): Current thread 0x00000001f0796140 (most recent call first):
     File "/.../python/pyspark/daemon.py", line 95 in worker
     File "/.../python/pyspark/daemon.py", line 228 in manager
     File "/.../python/pyspark/daemon.py", line 253 in <module>
     File "<frozen runpy>", line 88 in _run_code
     File "<frozen runpy>", line 198 in _run_module_as_main
   
        at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:679)
   ...
   ```
   
   Even when `faulthandler` is disabled, the error will still appear in the executor's `stderr` file.
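   The pattern described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual `daemon.py` change; the function name `flush_or_die` and its structure are assumptions for the sake of the example:
   
   ```python
   import faulthandler
   import io
   import os
   import sys
   
   
   def flush_or_die(outfile):
       """Flush the worker's output stream; on failure, dump the thread
       stack and kill the process so Spark can retry the task.
       (Hypothetical sketch of the behavior described in this PR.)"""
       try:
           outfile.flush()
       except OSError:
           # Write the current thread's traceback to stderr. When
           # faulthandler is enabled on the Spark side, this output is
           # surfaced in the task's error message; otherwise it still
           # lands in the executor's stderr file.
           faulthandler.dump_traceback(file=sys.stderr)
           # Exit immediately so the JVM sees the worker as crashed
           # instead of waiting indefinitely for a response.
           os._exit(1)
   ```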
   
   ### Why are the changes needed?
   
   Currently, an exception caused by an `outfile.flush()` failure in `daemon.py` is ignored. If the output of the last command in `worker_main` is still not flushed, this can leave a UDF stuck on the Java side, waiting for a response from the Python worker.
   
   The worker should just die and let Spark retry the task.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Manually.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

