Shubham Patil created SPARK-46636:
-------------------------------------

             Summary: PySpark throwing TypeError while collecting an RDD
                 Key: SPARK-46636
                 URL: https://issues.apache.org/jira/browse/SPARK-46636
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.3.4
         Environment: Running this in an Anaconda Jupyter notebook
Python == 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)]
Spark == 3.3.4
pyspark == 3.4.1
            Reporter: Shubham Patil


I'm trying to collect an RDD after applying a filter to it, but it throws an error. The error can be reproduced with the code below:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("Practice").getOrCreate()
sc = spark.sparkContext

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
dataRdd = sc.parallelize(data)
dataRdd = dataRdd.filter(lambda a: a % 2 == 0)
dataRdd.collect()
{code}

Below is the error it throws:

{code:java}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[18], line 1
----> 1 dataRdd.collect()

File ~\anaconda3\envs\spark_latest\Lib\site-packages\pyspark\rdd.py:1814, in RDD.collect(self)
   1812 with SCCallSiteSync(self.context):
   1813     assert self.ctx._jvm is not None
-> 1814     sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
   1815 return list(_load_from_socket(sock_info, self._jrdd_deserializer))

File ~\anaconda3\envs\spark_latest\Lib\site-packages\pyspark\rdd.py:5441, in PipelinedRDD._jrdd(self)
   5438 else:
   5439     profiler = None
-> 5441 wrapped_func = _wrap_function(
   5442     self.ctx, self.func, self._prev_jrdd_deserializer, self._jrdd_deserializer, profiler
   5443 )
   5445 assert self.ctx._jvm is not None
   5446 python_rdd = self.ctx._jvm.PythonRDD(
   5447     self._prev_jrdd.rdd(), wrapped_func, self.preservesPartitioning, self.is_barrier
   5448 )

File ~\anaconda3\envs\spark_latest\Lib\site-packages\pyspark\rdd.py:5243, in _wrap_function(sc, func, deserializer, serializer, profiler)
   5241 pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
   5242 assert sc._jvm is not None
-> 5243 return sc._jvm.SimplePythonFunction(
   5244     bytearray(pickled_command),
   5245     env,
   5246     includes,
   5247     sc.pythonExec,
   5248     sc.pythonVer,
   5249     broadcast_vars,
   5250     sc._javaAccumulator,
   5251 )

TypeError: 'JavaPackage' object is not callable
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
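Worth noting: the environment in this report pairs Spark == 3.3.4 with pyspark == 3.4.1. A `TypeError: 'JavaPackage' object is not callable` typically means the Python side looked up a JVM class (here `SimplePythonFunction`) that the running Spark JVM does not provide, which is exactly what happens when the pip-installed pyspark and the Spark installation are on different release lines. A minimal sketch of the compatibility check (the helper name `versions_compatible` is hypothetical, not a PySpark API):

```python
# Hypothetical helper: compare the major.minor line of the pip-installed
# pyspark package against the JVM-side Spark install. Mismatched lines
# (3.4 vs 3.3 in this report) are a classic cause of
# "'JavaPackage' object is not callable".
def versions_compatible(pyspark_version: str, spark_version: str) -> bool:
    """Return True when both versions share the same major.minor line."""
    py_mm = tuple(int(part) for part in pyspark_version.split(".")[:2])
    jvm_mm = tuple(int(part) for part in spark_version.split(".")[:2])
    return py_mm == jvm_mm

# The versions from this report's environment:
print(versions_compatible("3.4.1", "3.3.4"))  # False: 3.4 vs 3.3, mismatched
print(versions_compatible("3.3.4", "3.3.4"))  # True: aligned
```

In a live session the two sides can be read from `pyspark.__version__` (the pip package) and `spark.version` (the JVM actually running); aligning them, e.g. `pip install pyspark==3.3.4` to match the Spark 3.3.4 install, is the usual fix for this class of error.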