anblanco commented on PR #55201:
URL: https://github.com/apache/spark/pull/55201#issuecomment-4194678272

   > There's probably some minor misunderstanding for the master fix. I don't 
believe the explicit `flush()` _in addition to_ `close()` is the key for the 
issue. We need an explicit `close()` to avoid the socket being closed during 
garbage collection. `flush()` might work as well but `close()` is semantically 
what we really need. I think it would be nice to confirm that a `close()` in 
`finally` will still fix the issue on Windows.
   > 
   > Also, do we need a try ... except block for `close()` / `flush()`?
   > 
   > Unfortunately, fixing it in `worker.py` is probably not enough. The other 
workers are for python data source which is supported since spark 4.0. They 
probably suffer the same issues (@anblanco if you could confirm that on 
Windows, it would be great! `test_python_datasource` is the test file for it).
   > 
   > If I'm correct, we need to backport the fix to all those worker files too. 
I don't have a strong opinion about cherry-pick vs clean fix - but we need to 
consider the work to patch all workers, not just a single file.
   > 
   > I'm not sure if we are writing new regression tests for the maintenance 
branch, but at least we should add regression tests on master. BTW the 
regression tests do not need to contain comments about the "fix". Most of the 
times, the test itself explains what it tests against. If it's not clear, we 
should comment about what regression case it tests - aka the problem, not the 
solution.
   
   Thank you for the detailed feedback @gaogaotiantian 
   
   * `close()` vs `flush()`: Switched to `close()` to match `master`'s 
approach, and dropped the `try...except` wrapper to match the convention of the 
`master` branch. I was also able to confirm that "because `.close()` internally 
calls `.flush()` this also fixes the crash on Windows."
   * Other workers affected: Confirmed on Windows with PySpark 4.0 + Python 
3.12.10 + `daemon=false` — a Python data source test hits the same 
`EOFException`. Updated the PR to apply `close()` to all 12 additional worker 
files on branch-4.1.
   * Test comments: Updated to describe the problem, not the solution
   
   If you'd like I can squash commit my changes for a cleaner git commit 
history. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to