[ https://issues.apache.org/jira/browse/SPARK-27335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318171#comment-17318171 ]
Samantha Zeitlin commented on SPARK-27335:
------------------------------------------

I'm seeing this on Spark 3.0.1 and I'm not using Correlation.corr at all. Here's the full traceback:

```
Traceback (most recent call last):
  File "/Users/szeitlin/Radically_Different_Data_Science/Tribe/Bison/code/dbx-notebooks/gopher_changes/test_gopher_changes.py", line 37, in test_get_batch_meters
    row = output.head()
  File "/Users/szeitlin/anaconda3/envs/dbconnect/lib/python3.7/site-packages/pyspark/sql/dataframe.py", line 1369, in head
    rs = self.head(1)
  File "/Users/szeitlin/anaconda3/envs/dbconnect/lib/python3.7/site-packages/pyspark/sql/dataframe.py", line 1371, in head
    return self.take(n)
  File "/Users/szeitlin/anaconda3/envs/dbconnect/lib/python3.7/site-packages/pyspark/sql/dataframe.py", line 657, in take
    return self.limit(num).collect()
  File "/Users/szeitlin/anaconda3/envs/dbconnect/lib/python3.7/site-packages/pyspark/sql/dataframe.py", line 610, in collect
    with SCCallSiteSync(self._sc) as css:
  File "/Users/szeitlin/anaconda3/envs/dbconnect/lib/python3.7/site-packages/pyspark/traceback_utils.py", line 72, in __enter__
    self._context._jsc.setCallSite(self._call_site)
AttributeError: 'NoneType' object has no attribute 'setCallSite'
```

I think this may be related to whether there are other Spark contexts available at the time, since I've seen it only when I had a notebook running while also trying to run tests.

It sure would be nice if Spark were a little smarter about knowing (or asking?) which Spark context to use, or shutting down extras, if there is more than one available.
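One way to probe the hypothesis above might be to check, just before calling head() or collect(), whether the DataFrame's context still has its JVM handle. The traceback shows that `self._context._jsc` was `None` at the point of failure, so the sketch below tests for exactly that. It is a diagnostic under assumptions, not a fix: `context_is_live` is a hypothetical helper name, and `_sc` / `_jsc` are the private attributes that appear in the traceback, not public API, so they may differ between PySpark releases. The stand-in objects are only there so the snippet runs without a cluster.

```python
from types import SimpleNamespace


def context_is_live(df):
    """Return True if df's SparkContext still holds a JVM handle.

    Relies on the private attributes `_sc` and `_jsc` seen in the
    traceback; they are not public API and may change between releases.
    """
    sc = getattr(df, "_sc", None)
    return sc is not None and getattr(sc, "_jsc", None) is not None


# Stand-ins so the sketch runs without a cluster (hypothetical objects,
# not PySpark classes). With PySpark installed, pass a real DataFrame.
live_df = SimpleNamespace(_sc=SimpleNamespace(_jsc=object()))
stale_df = SimpleNamespace(_sc=SimpleNamespace(_jsc=None))

print(context_is_live(live_df))   # True
print(context_is_live(stale_df))  # False
```

If this returns False for a DataFrame inside a test, the context it was created with no longer has a JVM handle, which would be consistent with the notebook-plus-tests scenario described above, though it does not by itself show which other context interfered.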

> cannot collect() from Correlation.corr
> --------------------------------------
>
>                 Key: SPARK-27335
>                 URL: https://issues.apache.org/jira/browse/SPARK-27335
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: Natalino Busa
>            Priority: Major
>
> reproducing the bug from the example in the documentation:
>
> {code:java}
> import pyspark
> from pyspark.ml.linalg import Vectors
> from pyspark.ml.stat import Correlation
> spark = pyspark.sql.SparkSession.builder.getOrCreate()
> dataset = [[Vectors.dense([1, 0, 0, -2])],
>            [Vectors.dense([4, 5, 0, 3])],
>            [Vectors.dense([6, 7, 0, 8])],
>            [Vectors.dense([9, 0, 0, 1])]]
> dataset = spark.createDataFrame(dataset, ['features'])
> df = Correlation.corr(dataset, 'features', 'pearson')
> df.collect()
> {code}
> This produces the following stack trace:
>
> {code:java}
> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> <ipython-input-92-e7889fa5d198> in <module>()
>      11 dataset = spark.createDataFrame(dataset, ['features'])
>      12 df = Correlation.corr(dataset, 'features', 'pearson')
> ---> 13 df.collect()
> /opt/spark/python/pyspark/sql/dataframe.py in collect(self)
>     530         [Row(age=2, name=u'Alice'), Row(age=5, name=u'Bob')]
>     531         """
> --> 532         with SCCallSiteSync(self._sc) as css:
>     533             sock_info = self._jdf.collectToPython()
>     534         return list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer())))
> /opt/spark/python/pyspark/traceback_utils.py in __enter__(self)
>      70     def __enter__(self):
>      71         if SCCallSiteSync._spark_stack_depth == 0:
> ---> 72             self._context._jsc.setCallSite(self._call_site)
>      73         SCCallSiteSync._spark_stack_depth += 1
>      74
> AttributeError: 'NoneType' object has no attribute 'setCallSite'{code}
>
> Analysis:
> Somehow the dataframe properties `df.sql_ctx.sparkSession._jsparkSession`,
> and `spark._jsparkSession` do not match with the ones available in the spark
> session.
> The following code fixes the problem (I hope this helps you narrow down
> the root cause)
>
> {code:java}
> df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
> df._sc = spark._sc
> df.collect()
> >>> [Row(pearson(features)=DenseMatrix(4, 4, [1.0, 0.0556, nan, 0.4005,
> >>> 0.0556, 1.0, nan, 0.9136, nan, nan, 1.0, nan, 0.4005, 0.9136, nan, 1.0],
> >>> False))]{code}
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)