[ https://issues.apache.org/jira/browse/SPARK-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719359#comment-16719359 ]
Dongjoon Hyun commented on SPARK-26019:
---------------------------------------

Hi, [~irashid]. Is there a corresponding Apache Zeppelin JIRA issue to provide a secure connection?

bq. the summary here is that zeppelin provides its own py4j gateway connection, and it does not use a secure connection, but the accumulator server is expecting a secure connection.

Allowing insecure connections on a secure server sounds like another security issue. Security is all or nothing.

bq. To avoid a breaking change in 2.x, this ticket should be used to change the accumulator server to allow an insecure connection.

> pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()"
> in authenticate_and_accum_updates()
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-26019
>                 URL: https://issues.apache.org/jira/browse/SPARK-26019
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.2, 2.4.0
>            Reporter: Ruslan Dautkhanov
>            Priority: Major
>
> PySpark's accumulator server expects a secure py4j connection between Python
> and the JVM. Spark will normally create a secure connection, but there is a
> public API which allows you to pass in your own py4j connection (this is
> used by Zeppelin, at least). When this happens, you get an error like:
> {noformat}
> pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()"
> in authenticate_and_accum_updates()
> {noformat}
> We should change PySpark to:
> 1) warn loudly if a user passes in an insecure connection;
> 1a) I'd like to suggest that we even error out, unless the user actively
> opts in with a config like "spark.python.allowInsecurePy4j=true";
> 2) change the accumulator server to allow insecure connections (see the
> sketch below).
> Note that SPARK-26349 will disallow insecure connections completely in 3.0.
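>
> For illustration, here is a minimal, self-contained sketch of option (2).
> This is not the actual Spark patch: the class shape and the
> self.server.auth_token attribute are assumptions made for the example.
> The idea is to attempt the token handshake only when a token exists:
> {code:python}
> # Sketch only: skip authentication when the server was created without an
> # auth token (i.e. the user supplied an insecure py4j connection).
> # "socketserver" is the Python 3 name; Python 2 calls it "SocketServer".
> import socketserver
>
>
> class _UpdateRequestHandler(socketserver.StreamRequestHandler):
>     def handle(self):
>         # assumption: the server object carries the (possibly None) token
>         auth_token = getattr(self.server, "auth_token", None)
>
>         def accum_updates():
>             # ... read accumulator updates from self.rfile and apply them ...
>             return True
>
>         def authenticate_and_accum_updates():
>             # this is the line that currently blows up when auth_token is
>             # None: len(None) raises the reported TypeError
>             received_token = self.rfile.read(len(auth_token))
>             if isinstance(received_token, bytes):
>                 received_token = received_token.decode("utf-8")
>             if received_token != auth_token:
>                 raise ValueError("wrong auth token for AccumulatorServer")
>             return accum_updates()
>
>         if auth_token is not None:
>             authenticate_and_accum_updates()
>         else:
>             # insecure connection: accept updates without a handshake
>             accum_updates()
> {code}
> Option (1a) would additionally gate the else branch on the proposed
> "spark.python.allowInsecurePy4j" config instead of accepting silently.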
>
> More info on how this occurs:
> {code:python}
> Exception happened during processing of request from ('127.0.0.1', 43418)
> ----------------------------------------
> Traceback (most recent call last):
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
>     self.process_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 318, in process_request
>     self.finish_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 331, in finish_request
>     self.RequestHandlerClass(request, client_address, self)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 652, in __init__
>     self.handle()
>   File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 263, in handle
>     poll(authenticate_and_accum_updates)
>   File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 238, in poll
>     if func():
>   File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 251, in authenticate_and_accum_updates
>     received_token = self.rfile.read(len(auth_token))
> TypeError: object of type 'NoneType' has no len()
> {code}
>
> The error happens here:
> https://github.com/apache/spark/blob/cb90617f894fd51a092710271823ec7d1cd3a668/python/pyspark/accumulators.py#L254
> The PySpark code was just running a simple pipeline:
> binary_rdd = sc.binaryRecords(full_file_path, record_length).map(lambda .. )
> and then converting it to a dataframe and running a count on it.
> The error seems to be flaky - it didn't happen on the next rerun. (But
> accumulators don't actually work.)
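>
> For reference, a hypothetical end-to-end sketch of the Zeppelin-style setup
> that triggers this. The port, path, and record length are placeholders, and
> it assumes a JVM gateway server is already listening without an auth token:
> {code:python}
> # Hypothetical repro sketch: hand a user-created, token-less py4j gateway to
> # SparkContext through its public `gateway` argument, then run a job whose
> # task results carry accumulator updates back to the driver.
> from py4j.java_gateway import JavaGateway, GatewayParameters
> from pyspark import SparkContext
>
> # No auth_token is set here, so the accumulator server later sees
> # auth_token == None and fails in authenticate_and_accum_updates().
> gateway = JavaGateway(
>     gateway_parameters=GatewayParameters(port=25333, auto_convert=True))
> sc = SparkContext(gateway=gateway)
>
> # the simple pipeline from the report (placeholders for path/record length)
> binary_rdd = sc.binaryRecords("/path/to/records.bin", 1024).map(lambda r: r)
> binary_rdd.count()
> {code}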