> the right way to reach JVM in python

Can you tell us more about what you want to achieve?
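
That said, the AttributeError itself is expected: _jvm is the Py4J gateway,
and the gateway socket only exists in the driver process. Inside rdd.map you
are running on an executor, where SparkContext._active_spark_context is None,
hence 'NoneType' object has no attribute '_jvm'. For a conversion like
Integer.valueOf, plain Python on the workers does the same job. A minimal
sketch, using the same data as your example:

>>> data = sc.parallelize(["123", "234"])
>>> data.map(lambda s: int(s.strip())).collect()
[123, 234]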

If you want to pass some value to workers, you can use a broadcast variable.
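
A rough sketch of that, assuming the same REPL session (the lookup table here
is purely illustrative):

>>> lookup = sc.broadcast({"123": 123, "234": 234})
>>> sc.parallelize(["123", "234"]).map(lambda s: lookup.value[s.strip()]).collect()
[123, 234]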

Cheers

On Mon, Sep 28, 2015 at 10:31 PM, YiZhi Liu <javeli...@gmail.com> wrote:

> Hi Ted,
>
> Thank you for the reply. The sc works at the driver, but how can I
> reach the JVM inside rdd.map?
>
> 2015-09-29 11:26 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> > >>> sc._jvm.java.lang.Integer.valueOf("12")
> > 12
> >
> > FYI
> >
> > On Mon, Sep 28, 2015 at 8:08 PM, YiZhi Liu <javeli...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I'm doing some data processing in PySpark, but I failed to reach the
> >> JVM from the workers. Here is what I did:
> >>
> >> $ bin/pyspark
> >> >>> data = sc.parallelize(["123", "234"])
> >> >>> numbers = data.map(lambda s: SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf(s.strip()))
> >> >>> numbers.collect()
> >>
> >> I got,
> >>
> >> Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
> >>   File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
> >>     process()
> >>   File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
> >>     serializer.dump_stream(func(split_index, iterator), outfile)
> >>   File "/mnt/hgfs/lewis/Workspace/source-codes/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
> >>     vs = list(itertools.islice(iterator, batch))
> >>   File "<stdin>", line 1, in <lambda>
> >> AttributeError: 'NoneType' object has no attribute '_jvm'
> >>
> >> at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
> >> at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
> >> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
> >> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> >> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> >> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> >> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> >> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >> ... 1 more
> >>
> >> While _jvm at the driver end looks fine:
> >>
> >> >>> SparkContext._active_spark_context._jvm.java.lang.Integer.valueOf("123".strip())
> >> 123
> >>
> >> The program is trivial; I just wonder what the right way to reach
> >> the JVM in Python is. Any help would be appreciated.
> >>
> >> Thanks
> >>
> >> --
> >> Yizhi Liu
> >> Senior Software Engineer / Data Mining
> >> www.mvad.com, Shanghai, China
> >>
> >
>
>
>
> --
> Yizhi Liu
> Senior Software Engineer / Data Mining
> www.mvad.com, Shanghai, China
>
