Wojciech Szlachta created SPARK-51966:
-----------------------------------------
             Summary: Replace select.select() with select.poll() when running on POSIX os
                 Key: SPARK-51966
                 URL: https://issues.apache.org/jira/browse/SPARK-51966
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.5.5, 4.0.0
            Reporter: Wojciech Szlachta

On glibc-based Linux systems, {{select()}} can only monitor file descriptor numbers that are less than {{FD_SETSIZE}} (1024). This is an unreasonably low limit for many modern applications.

When running via {{pyspark}} we frequently observe:

{code}
Exception occurred during processing of request from ('127.0.0.1', 46334)
Traceback (most recent call last):
  File "/usr/lib/python3.11/socketserver.py", line 317, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.11/socketserver.py", line 348, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python3.11/socketserver.py", line 361, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.11/socketserver.py", line 755, in __init__
    self.handle()
  File "/usr/lib/python3.11/site-packages/pyspark/accumulators.py", line 293, in handle
    poll(authenticate_and_accum_updates)
  File "/usr/lib/python3.11/site-packages/pyspark/accumulators.py", line 266, in poll
    r, _, _ = select.select([self.rfile], [], [], 1)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: filedescriptor out of range in select()
{code}

On POSIX systems, {{poll()}} should be used instead of {{select()}}: it takes an explicit list of descriptors, so it is not bound by {{FD_SETSIZE}}.
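
For illustration only, here is a minimal sketch of how a readiness check like the one in the traceback could prefer {{select.poll()}} where it exists and fall back to {{select.select()}} elsewhere. The helper name {{wait_readable}} and its {{timeout_s}} parameter are hypothetical and are not taken from PySpark or from any actual patch for this issue; only the standard-library {{select}} and {{socket}} calls are real.

```python
import select
import socket


def wait_readable(fileobj, timeout_s=1.0):
    """Return True if fileobj is readable (or hung up) within timeout_s seconds.

    Hypothetical helper, not PySpark code.
    """
    if hasattr(select, "poll"):
        # poll() is given the descriptors explicitly, so descriptor numbers
        # of 1024 and above are accepted.
        poller = select.poll()
        poller.register(fileobj, select.POLLIN)
        # poll() takes its timeout in milliseconds; select() takes seconds.
        events = poller.poll(int(timeout_s * 1000))
        return bool(events)
    # Fallback for platforms whose select module has no poll(), e.g. Windows.
    readable, _, _ = select.select([fileobj], [], [], timeout_s)
    return bool(readable)


if __name__ == "__main__":
    a, b = socket.socketpair()
    with a, b:
        print(wait_readable(a, 0.05))  # nothing has been sent yet
        b.sendall(b"x")
        print(wait_readable(a, 0.05))  # one byte is now pending on a
```

Checking {{hasattr(select, "poll")}} rather than the platform name keeps the sketch usable on any system where Python exposes {{poll()}}, while leaving the existing {{select()}} behaviour in place everywhere else.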