[
https://issues.apache.org/jira/browse/SPARK-22711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351601#comment-16351601
]
Prateek commented on SPARK-22711:
---------------------------------
Thanks, the workaround works fine.
I wonder why the *setup_environment* method did not take care of this. Is it
that on each map or flatMap pass the master reassigns tasks to worker nodes and
the workers are effectively refreshed (the setup RDD was temporary), or is it
that the tasks are assigned to different worker nodes than the ones
*setup_environment* ran on?
> _pickle.PicklingError: args[0] from __newobj__ args has the wrong class from
> cloudpickle.py
> -------------------------------------------------------------------------------------------
>
> Key: SPARK-22711
> URL: https://issues.apache.org/jira/browse/SPARK-22711
> Project: Spark
> Issue Type: Bug
> Components: PySpark, Spark Submit
> Affects Versions: 2.2.0, 2.2.1
> Environment: Ubuntu pseudo distributed installation of Spark 2.2.0
> Reporter: Prateek
> Priority: Major
> Attachments: Jira_Spark_minimized_code.py
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> When I submit a PySpark program with the spark-submit command, this error is
> thrown. It happens for code like the following:
> RDD2 = RDD1.map(lambda m: function_x(m)).reduceByKey(lambda c, v: c + v)
> or
> RDD2 = RDD1.flatMap(lambda m: function_x(m)).reduceByKey(lambda c, v: c + v)
> or
> RDD2 = RDD1.flatMap(lambda m: function_x(m)).reduce(lambda c, v: c + v)
> Traceback (most recent call last):
> File "/home/prateek/Project/textrank.py", line 299, in <module>
> summaryRDD = sentenceTokensReduceRDD.map(lambda m:
> get_summary(m)).reduceByKey(lambda c,v :c+v)
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1608,
> in reduceByKey
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1846,
> in combineByKey
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1783,
> in partitionBy
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2455,
> in _jrdd
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2388,
> in _wrap_function
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2374,
> in _prepare_for_python_RDD
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line
> 460, in dumps
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 704, in dumps
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 148, in dump
> File "/usr/lib/python3.5/pickle.py", line 408, in dump
> self.save(obj)
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/lib/python3.5/pickle.py", line 740, in save_tuple
> save(element)
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 255, in save_function
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 292, in save_function_tuple
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
> save(element)
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/lib/python3.5/pickle.py", line 770, in save_list
> self._batch_appends(obj)
> File "/usr/lib/python3.5/pickle.py", line 794, in _batch_appends
> save(x)
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 255, in save_function
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 292, in save_function_tuple
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
> save(element)
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/lib/python3.5/pickle.py", line 770, in save_list
> self._batch_appends(obj)
> File "/usr/lib/python3.5/pickle.py", line 794, in _batch_appends
> save(x)
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 255, in save_function
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 292, in save_function_tuple
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
> save(element)
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/lib/python3.5/pickle.py", line 770, in save_list
> self._batch_appends(obj)
> File "/usr/lib/python3.5/pickle.py", line 794, in _batch_appends
> save(x)
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 255, in save_function
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 292, in save_function_tuple
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
> save(element)
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/lib/python3.5/pickle.py", line 770, in save_list
> self._batch_appends(obj)
> File "/usr/lib/python3.5/pickle.py", line 797, in _batch_appends
> save(tmp[0])
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 249, in save_function
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 297, in save_function_tuple
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
> self._batch_setitems(obj.items())
> File "/usr/lib/python3.5/pickle.py", line 841, in _batch_setitems
> save(v)
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 249, in save_function
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 297, in save_function_tuple
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
> self._batch_setitems(obj.items())
> File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
> save(v)
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 249, in save_function
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 297, in save_function_tuple
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
> self._batch_setitems(obj.items())
> File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
> save(v)
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 249, in save_function
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 297, in save_function_tuple
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
> self._batch_setitems(obj.items())
> File "/usr/lib/python3.5/pickle.py", line 841, in _batch_setitems
> save(v)
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 249, in save_function
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 297, in save_function_tuple
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
> self._batch_setitems(obj.items())
> File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
> save(v)
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 249, in save_function
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 297, in save_function_tuple
> File "/usr/lib/python3.5/pickle.py", line 475, in save
> f(self, obj) # Call unbound method with explicit self
> File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
> self._batch_setitems(obj.items())
> File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
> save(v)
> File "/usr/lib/python3.5/pickle.py", line 520, in save
> self.save_reduce(obj=obj, *rv)
> File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
> 565, in save_reduce
> _pickle.PicklingError: args[0] from __newobj__ args has the wrong class
> I tried replacing the cloudpickle code with the version from GitHub, but that
> started giving "copy_reg not defined" and "copyreg not defined" errors (for
> both Python 2.7 and 3.5).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]