kehao created SPARK-19627:
-----------------------------

             Summary: PySpark fails when calling a user-defined JVM function inside a transformation
                 Key: SPARK-19627
                 URL: https://issues.apache.org/jira/browse/SPARK-19627
             Project: Spark
          Issue Type: Bug
          Components: Deploy
    Affects Versions: 1.6.1
            Reporter: kehao
             Fix For: 1.6.1
Hi, I have a question: PySpark fails when calling a JVM function that I defined myself. Please see the code below:

from pyspark import SparkConf, SparkContext
from py4j.java_gateway import java_import

if __name__ == "__main__":
    # conf = SparkConf().setAppName("testing")
    # sc = SparkContext(conf=conf)
    sc = SparkContext(appName="Py4jTesting")

    def foo(x):
        java_import(sc._jvm, "Calculate")
        func = sc._jvm.Calculate()
        func.sqAdd(x)

    rdd = sc.parallelize([1, 2, 3])
    result = rdd.map(foo).collect()
    print("$$$$$$$$$$$$$$$$$$$$$$")
    print(result)

It fails with the traceback below. Who can help me?

Traceback (most recent call last):
  File "/home/manager/data/software/mytest/kehao/driver.py", line 19, in <module>
    result = rdd.map(foo).collect()
  File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect
  File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 2379, in _jrdd
  File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 2299, in _prepare_for_python_RDD
  File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 428, in dumps
  File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 646, in dumps
  File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 107, in dump
  File "/usr/lib/python3.4/pickle.py", line 412, in dump
    self.save(obj)
  File "/usr/lib/python3.4/pickle.py", line 479, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.4/pickle.py", line 744, in save_tuple
    save(element)
  File "/usr/lib/python3.4/pickle.py", line 479, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 199, in save_function
  File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 236, in save_function_tuple
  File "/usr/lib/python3.4/pickle.py", line 479, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.4/pickle.py", line 729, in save_tuple
    save(element)
  File "/usr/lib/python3.4/pickle.py", line 479, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.4/pickle.py", line 774, in save_list
    self._batch_appends(obj)
  File "/usr/lib/python3.4/pickle.py", line 801, in _batch_appends
    save(tmp[0])
  File "/usr/lib/python3.4/pickle.py", line 479, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 193, in save_function
  File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 241, in save_function_tuple
  File "/usr/lib/python3.4/pickle.py", line 479, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.4/pickle.py", line 814, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.4/pickle.py", line 840, in _batch_setitems
    save(v)
  File "/usr/lib/python3.4/pickle.py", line 499, in save
    rv = reduce(self.proto)
  File "/home/manager/data/software/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/context.py", line 268, in __getnewargs__
Exception: It appears that you are attempting to reference SparkContext from a
broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063
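Note: as the exception says, sc._jvm is a Py4J gateway into the driver JVM, so it cannot be captured by a function shipped to workers via rdd.map(). A minimal driver-side sketch of a workaround, assuming the reporter's own Calculate class with an sqAdd(int) method is on the driver classpath (e.g. via spark-submit --driver-class-path):

from pyspark import SparkContext
from py4j.java_gateway import java_import

if __name__ == "__main__":
    sc = SparkContext(appName="Py4jTesting")

    # Import and instantiate the user-defined Java class on the driver only.
    # (Calculate and sqAdd come from the report above; they are not part of
    # Spark itself.)
    java_import(sc._jvm, "Calculate")
    calc = sc._jvm.Calculate()

    rdd = sc.parallelize([1, 2, 3])
    # Bring the data to the driver first, then call the JVM method here.
    # Referencing sc._jvm inside rdd.map(...) forces pickling of the
    # SparkContext, which is exactly what raises the SPARK-5063 error.
    result = [calc.sqAdd(x) for x in rdd.collect()]
    print(result)

Collecting to the driver gives up distributed execution, of course; to run the JVM logic on the executors, the Java code would instead have to be invoked through a per-worker mechanism, for example rdd.pipe() to an external program, or by applying the Scala/Java transformation before handing the RDD over to Python.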