[ https://issues.apache.org/jira/browse/SPARK-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695427#comment-16695427 ]
Ruslan Dautkhanov edited comment on SPARK-26019 at 11/22/18 12:42 AM:
----------------------------------------------------------------------

Thank you [~irashid]. I confirm that swapping those two lines doesn't fix things.

Fixing a race condition that happens in accumulators.py: _start_update_server()
# SocketServer.TCPServer defaults bind_and_activate to True [https://github.com/python/cpython/blob/2.7/Lib/SocketServer.py#L413]
# Also, {{handle()}} is defined in the derived class _UpdateRequestHandler here [https://github.com/apache/spark/blob/master/python/pyspark/accumulators.py#L232]

Please help review [https://github.com/apache/spark/pull/23113]

Basically, the fix is to bind and activate SocketServer.TCPServer only in the dedicated thread that serves AccumulatorServer, to avoid the race condition that could happen if we start listening for and accepting connections in the main thread. I manually verified that it fixes things for us. Thank you.
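For illustration, here is a minimal sketch of that approach (not the actual change in the PR; the {{bound}} event and the no-op handler below are hypothetical stand-ins):

{code:python}
import threading
import SocketServer  # Python 2 stdlib; "socketserver" on Python 3


class _NoopHandler(SocketServer.BaseRequestHandler):
    # Hypothetical stand-in for pyspark's _UpdateRequestHandler.
    def handle(self):
        pass


class AccumulatorServer(SocketServer.TCPServer):
    def __init__(self, server_address, RequestHandlerClass):
        # bind_and_activate=False: do NOT bind/listen in the caller's
        # (main) thread, so no connection can be accepted before the
        # serving thread is fully set up.
        SocketServer.TCPServer.__init__(
            self, server_address, RequestHandlerClass,
            bind_and_activate=False)
        self.bound = threading.Event()

    def run(self):
        # Bind, activate, and accept connections all inside the
        # dedicated thread.
        self.server_bind()
        self.server_activate()
        self.bound.set()  # main thread may now read server_address
        self.serve_forever()


server = AccumulatorServer(("localhost", 0), _NoopHandler)
t = threading.Thread(target=server.run)
t.daemon = True
t.start()
server.bound.wait()           # wait until the port is actually assigned
print(server.server_address)  # (host, assigned_port)
{code}

One consequence of deferring the bind: the ephemeral port is assigned only once the serving thread runs, so the main thread has to wait (the event above) before it can read {{server_address}}.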
> pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26019
>                 URL: https://issues.apache.org/jira/browse/SPARK-26019
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.2, 2.4.0
>            Reporter: Ruslan Dautkhanov
>            Priority: Major
>
> Started happening after the 2.3.1 -> 2.3.2 upgrade.
>
> {code:python}
> Exception happened during processing of request from ('127.0.0.1', 43418)
> ----------------------------------------
> Traceback (most recent call last):
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
>     self.process_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 318, in process_request
>     self.finish_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 331, in finish_request
>     self.RequestHandlerClass(request, client_address, self)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 652, in __init__
>     self.handle()
>   File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 263, in handle
>     poll(authenticate_and_accum_updates)
>   File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 238, in poll
>     if func():
>   File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 251, in authenticate_and_accum_updates
>     received_token = self.rfile.read(len(auth_token))
> TypeError: object of type 'NoneType' has no len()
> {code}
>
> The error happens here:
> https://github.com/apache/spark/blob/cb90617f894fd51a092710271823ec7d1cd3a668/python/pyspark/accumulators.py#L254
>
> The PySpark code was just running a simple pipeline of
> binary_rdd = sc.binaryRecords(full_file_path, record_length).map(lambda .. )
> and then converting it to a dataframe and running a count on it.
>
> The error seems to be flaky - it didn't happen on the next rerun.
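> For context, a hypothetical stand-alone version of that pipeline (the path, record length, and map function below are illustrative placeholders for the elided originals):
>
> {code:python}
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.appName("SPARK-26019-repro").getOrCreate()
> sc = spark.sparkContext
>
> full_file_path = "/tmp/records.bin"  # placeholder path
> record_length = 512                  # placeholder record size, in bytes
>
> # Read fixed-length binary records, then apply a trivial map.
> binary_rdd = sc.binaryRecords(full_file_path, record_length) \
>                .map(lambda rec: (len(rec),))
>
> # Convert to a dataframe and run a count on it; the count action is
> # where the flaky accumulator-handshake error surfaced.
> df = binary_rdd.toDF(["record_size"])
> print(df.count())
> {code}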