[ 
https://issues.apache.org/jira/browse/SPARK-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719433#comment-16719433
 ] 

Imran Rashid commented on SPARK-26019:
--------------------------------------

Hi [~dongjoon], I don't think there is a Zeppelin issue for this, but it is 
fixed in the master branch, though to the best of my knowledge it has not been 
released yet ([~hyukjin.kwon], maybe you know more on the Zeppelin side?)

bq. Allowing insecure connections on a secure server sounds like another 
security issue. Security is an issue of all or nothing.

I totally understand your concerns, but the PMC did discuss this and came to 
this decision.  The reasoning is that the user has chosen to bring in an 
insecure connection.  Yes, absolutely, with this you don't really have a 
"secure server", but we do let users run in insecure modes anyway.
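
To make the opt-in idea concrete, here is a minimal sketch of the check proposed in the issue description quoted below (warn loudly, and error out unless the user opts in). It is only an illustration under stated assumptions, not the code that was merged: the function name check_gateway, the plain dict standing in for the Spark configuration, and the choice of ValueError are all hypothetical, and the config key is only the name suggested in the description.

```python
import warnings

# Config key taken from the suggestion in the issue description; it is a
# proposal there, not a confirmed Spark setting.
ALLOW_INSECURE_KEY = "spark.python.allowInsecurePy4j"


def check_gateway(auth_token, conf):
    """Classify a user-supplied py4j connection as "secure" or "insecure".

    auth_token: the connection's auth token, or None if it has none
                (assumed here to be what "insecure" means).
    conf:       a plain dict standing in for the Spark configuration.
    """
    if auth_token is not None:
        return "secure"
    opted_in = str(conf.get(ALLOW_INSECURE_KEY, "false")).lower() == "true"
    if not opted_in:
        # Proposal 1a: error out unless the user actively opts in.
        raise ValueError(
            "Insecure py4j connection passed in; set %s=true to allow it"
            % ALLOW_INSECURE_KEY
        )
    # Proposal 1: warn loudly when the insecure connection is accepted.
    warnings.warn("Running with an insecure py4j connection", UserWarning)
    return "insecure"
```

With this shape, the default path fails fast with a clear message instead of surfacing later as a TypeError, and a user who has made the choice deliberately still gets a visible warning.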

> pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" 
> in authenticate_and_accum_updates()
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26019
>                 URL: https://issues.apache.org/jira/browse/SPARK-26019
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.2, 2.4.0
>            Reporter: Ruslan Dautkhanov
>            Priority: Major
>
> PySpark's accumulator server expects a secure py4j connection between Python 
> and the JVM.  Spark will normally create a secure connection, but there is a 
> public API which allows you to pass in your own py4j connection.  (This is 
> used by Zeppelin, at least.)  When this happens, you get an error like:
> {noformat}
> pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" 
> in authenticate_and_accum_updates()
> {noformat}
> We should change PySpark to:
> 1) warn loudly if a user passes in an insecure connection
> 1a) I'd like to suggest that we even error out, unless the user actively 
> opts in with a config like "spark.python.allowInsecurePy4j=true"
> 2) allow insecure connections in the accumulator server.
> Note that SPARK-26349 will disallow insecure connections completely in 3.0.
>  
> More info on how this occurs:
> {code:python}
> Exception happened during processing of request from ('127.0.0.1', 43418)
> ----------------------------------------
> Traceback (most recent call last):
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
>     self.process_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 318, in process_request
>     self.finish_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 331, in finish_request
>     self.RequestHandlerClass(request, client_address, self)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 652, in __init__
>     self.handle()
>   File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 263, in handle
>     poll(authenticate_and_accum_updates)
>   File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 238, in poll
>     if func():
>   File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 251, in authenticate_and_accum_updates
>     received_token = self.rfile.read(len(auth_token))
> TypeError: object of type 'NoneType' has no len()
>  
> {code}
>  
> Error happens here:
> https://github.com/apache/spark/blob/cb90617f894fd51a092710271823ec7d1cd3a668/python/pyspark/accumulators.py#L254
> The PySpark code was just running a simple pipeline of 
> binary_rdd = sc.binaryRecords(full_file_path, record_length).map(lambda .. )
> and then converting it to a dataframe and running a count on it.
> It seems the error is flaky: on the next rerun it didn't happen. (But 
> accumulators don't actually work.)
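
The failure in the traceback above comes down to calling len() on a token that is None. The sketch below reproduces that failure outside of Spark and shows one way a handler could skip the token check when no token exists. It is a simplified illustration, not the actual accumulators.py code: both function names are hypothetical, and an in-memory io.BytesIO stands in for the socket's rfile.

```python
import io


def read_token_unguarded(rfile, auth_token):
    # Same shape as the failing line in the traceback: if auth_token is
    # None, len(auth_token) raises TypeError before anything is read.
    return rfile.read(len(auth_token))


def client_is_allowed(rfile, auth_token):
    """Hypothetical guarded handshake.

    Returns True when the client may go on to send accumulator updates.
    auth_token is None models a connection that was created without auth.
    """
    if auth_token is None:
        # No token to compare against, so skip the handshake entirely
        # and leave the stream untouched for the update payload.
        return True
    expected = auth_token.encode("utf-8")
    received = rfile.read(len(expected))
    return received == expected
```

For example, read_token_unguarded(io.BytesIO(b"abc"), None) raises the TypeError from the report, while client_is_allowed(io.BytesIO(b"abc"), None) returns True without consuming any bytes.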



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
