gaogaotiantian commented on PR #55201:
URL: https://github.com/apache/spark/pull/55201#issuecomment-4194262210

   There's probably a minor misunderstanding about the master fix. I don't 
believe the explicit `flush()` *in addition to* `close()` is what resolves the 
issue. We need an explicit `close()` to avoid the socket being closed during 
garbage collection. `flush()` might work as well, but `close()` is semantically 
what we really need. It would be nice to confirm that a `close()` in `finally` 
still fixes the issue on Windows.
   
   Also, do we need a try ... except block for `close()` / `flush()`?
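   To illustrate, here is a minimal sketch of the pattern being discussed. The function name and the `OSError` handling are illustrative assumptions, not the actual PySpark worker code; the point is that the output file is closed explicitly in `finally` rather than left to garbage collection, and that a failing `close()` does not mask the real error:

```python
import socket

def run_worker(sock: socket.socket) -> None:
    # Hypothetical sketch: wrap the work in try/finally so the buffered
    # outfile is explicitly closed (close() flushes pending data first)
    # instead of relying on GC to close the underlying socket.
    outfile = sock.makefile("wb", 65536)
    try:
        outfile.write(b"result")  # stand-in for the worker's real output
    finally:
        try:
            outfile.close()  # flushes the buffer, then closes the wrapper
        except OSError:
            # The peer may already be gone; don't let close() mask the
            # original exception from the try block.
            pass
```

Closing the `makefile()` wrapper does not close the socket object itself, so the driver side can still read whatever was flushed.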
   
   Unfortunately, fixing it in `worker.py` is probably not enough. The other 
workers are for the Python data source API, which has been supported since 
Spark 4.0. They probably suffer from the same issue (@anblanco, if you could 
confirm that on Windows, it would be great! `test_python_datasource` is the 
test file for it).
   
   If I'm correct, we need to backport the fix to all those worker files too. I 
don't have a strong opinion about cherry-pick vs. a clean fix, but we need to 
consider the work to patch all the workers, not just a single file.
   
   I'm not sure if we are writing new regression tests for the maintenance 
branch, but at the very least we should add regression tests on master. BTW, 
the regression tests do not need to contain comments about the "fix". Most of 
the time, the test itself explains what it tests against. If it's not clear, we 
should add a comment about what regression case it covers, i.e. the problem, 
not the solution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

