[ https://issues.apache.org/jira/browse/SPARK-22711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338070#comment-16338070 ]

Bryan Cutler commented on SPARK-22711:
--------------------------------------

Hi [~PrateekRM], here is your code trimmed down to where the problem is. It seems like cloudpickle in PySpark is having trouble with wordnet:

{code}
from pyspark import SparkContext
from nltk.corpus import wordnet as wn

def to_synset(word):
    return str(wn.synsets(word))

sc = SparkContext(appName="Text Rank")
rdd = sc.parallelize(["cat", "dog"])
print(rdd.map(to_synset).collect())
{code}

I can look into it, but as a workaround, if you import wordnet inside your function it seems to work fine:

{code}
def to_synset(word):
    from nltk.corpus import wordnet as wn
    return str(wn.synsets(word))
{code}
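
For completeness, here's a minimal sketch of the trimmed-down reproduction with the workaround applied (this assumes nltk and the wordnet corpus data are installed on every executor, which the original snippet already requires):

{code}
from pyspark import SparkContext

def to_synset(word):
    # Importing inside the function means wordnet is loaded on the executors,
    # so nothing from nltk ends up in the pickled closure sent from the driver.
    from nltk.corpus import wordnet as wn
    return str(wn.synsets(word))

sc = SparkContext(appName="Text Rank")
rdd = sc.parallelize(["cat", "dog"])
print(rdd.map(to_synset).collect())
sc.stop()
{code}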

> _pickle.PicklingError: args[0] from __newobj__ args has the wrong class from cloudpickle.py
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-22711
>                 URL: https://issues.apache.org/jira/browse/SPARK-22711
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Submit
>    Affects Versions: 2.2.0, 2.2.1
>         Environment: Ubuntu pseudo distributed installation of Spark 2.2.0
>            Reporter: Prateek
>            Priority: Major
>         Attachments: Jira_Spark_minimized_code.py
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> When I submit a PySpark program with the spark-submit command, this error is thrown.
> It happens for code like the following:
> RDD2 = RDD1.map(lambda m: function_x(m)).reduceByKey(lambda c,v :c+v)
> or 
> RDD2 = RDD1.flatMap(lambda m: function_x(m)).reduceByKey(lambda c,v :c+v)
> or
> RDD2 = RDD1.flatMap(lambda m: function_x(m)).reduce(lambda c,v :c+v)
> Traceback (most recent call last):
>   File "/home/prateek/Project/textrank.py", line 299, in <module>
>     summaryRDD = sentenceTokensReduceRDD.map(lambda m: get_summary(m)).reduceByKey(lambda c,v :c+v)
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1608, in reduceByKey
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1846, in combineByKey
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1783, in partitionBy
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2455, in _jrdd
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2388, in _wrap_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2374, in _prepare_for_python_RDD
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 460, in dumps
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 704, in dumps
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 148, in dump
>   File "/usr/lib/python3.5/pickle.py", line 408, in dump
>     self.save(obj)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 740, in save_tuple
>     save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 255, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 292, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
>     save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 770, in save_list
>     self._batch_appends(obj)
>   File "/usr/lib/python3.5/pickle.py", line 794, in _batch_appends
>     save(x)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 255, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 292, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
>     save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 770, in save_list
>     self._batch_appends(obj)
>   File "/usr/lib/python3.5/pickle.py", line 794, in _batch_appends
>     save(x)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 255, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 292, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
>     save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 770, in save_list
>     self._batch_appends(obj)
>   File "/usr/lib/python3.5/pickle.py", line 794, in _batch_appends
>     save(x)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 255, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 292, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 725, in save_tuple
>     save(element)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 770, in save_list
>     self._batch_appends(obj)
>   File "/usr/lib/python3.5/pickle.py", line 797, in _batch_appends
>     save(tmp[0])
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 841, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 841, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 249, in save_function
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 297, in save_function_tuple
>   File "/usr/lib/python3.5/pickle.py", line 475, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/lib/python3.5/pickle.py", line 810, in save_dict
>     self._batch_setitems(obj.items())
>   File "/usr/lib/python3.5/pickle.py", line 836, in _batch_setitems
>     save(v)
>   File "/usr/lib/python3.5/pickle.py", line 520, in save
>     self.save_reduce(obj=obj, *rv)
>   File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 565, in save_reduce
> _pickle.PicklingError: args[0] from __newobj__ args has the wrong class
> I tried replacing the cloudpickle code with the version from GitHub, but that started giving the errors "copy_reg not defined" and "copyreg not defined" (for both Python 2.7 and 3.5).



