I specifically mean visualisation via ZeppelinContext inside a Spark interpreter (e.g. "z.show(...)"). The visualisation of SparkSQL results inside a SparkSQLInterpreter works fine, even in yarn-cluster mode.
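For reference, a minimal paragraph of the kind that fails (the DataFrame here is illustrative; "z" is the ZeppelinContext instance that the Spark interpreter normally injects):

    %spark
    val df = spark.range(3).toDF("value")   // any small DataFrame will do
    z.show(df)                              // yarn-cluster: "error: not found: value z"

In yarn-client mode the same paragraph renders the interactive table as expected.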
On Thu., Jun 7, 2018 at 2:30 PM, Thomas Bünger <thom.bu...@googlemail.com> wrote:

> Hey Jeff,
>
> I tried your changes and now it works nicely. Thank you very much!
>
> But I still can't use any of the forms and visualizations in yarn-cluster
> mode. I was hoping this had been resolved with the new SparkInterpreter,
> so that I could switch from yarn-client to yarn-cluster mode in 0.8, but
> I'm still getting errors like
> "error: not found: value z"
>
> Was this not in scope of that change? Is this a bug? Or is it a known
> limitation that is also not supported in 0.8?
>
> Best regards,
> Thomas
>
> On Wed., Jun 6, 2018 at 3:28 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> I can confirm that this is a bug, and created
>> https://issues.apache.org/jira/browse/ZEPPELIN-3531
>>
>> Will fix it soon.
>>
>> On Tue., Jun 5, 2018 at 9:01 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> Hmm, it looks like a bug. I will check it tomorrow.
>>>
>>> On Tue., Jun 5, 2018 at 8:56 PM, Thomas Bünger
>>> <thom.bu...@googlemail.com> wrote:
>>>
>>>> $ ls /usr/lib/spark/python/lib
>>>> py4j-0.10.6-src.zip  PY4J_LICENSE.txt  pyspark.zip
>>>>
>>>> So the folder exists and contains both necessary zips. Please note that
>>>> in local or yarn-client mode the files are properly picked up from that
>>>> very same location.
>>>>
>>>> How does yarn-cluster mode work under the hood? Could it be that
>>>> environment variables (like SPARK_HOME) are lost because they are only
>>>> available in my local shell and the Zeppelin daemon process? Do I need
>>>> to tell YARN about SPARK_HOME somehow?
>>>>
>>>> On Tue., Jun 5, 2018 at 2:48 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>>> Could you check whether there's a folder /usr/lib/spark/python/lib ?
>>>>>
>>>>> On Tue., Jun 5, 2018 at 8:45 PM, Thomas Bünger
>>>>> <thom.bu...@googlemail.com> wrote:
>>>>>
>>>>>> sys.env
>>>>>> java.lang.NullPointerException
>>>>>>   at org.apache.zeppelin.spark.NewSparkInterpreter.setupConfForPySpark(NewSparkInterpreter.java:149)
>>>>>>   at org.apache.zeppelin.spark.NewSparkInterpreter.open(NewSparkInterpreter.java:90)
>>>>>>   at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:62)
>>>>>>   at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
>>>>>>   at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
>>>>>>   at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
>>>>>>   at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
>>>>>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>>>>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>   at java.lang.Thread.run(Thread.java:748)
>>>>>>
>>>>>> On Tue., Jun 5, 2018 at 2:41 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>
>>>>>>> Could you paste the full stack trace?
>>>>>>>
>>>>>>> On Tue., Jun 5, 2018 at 8:21 PM, Thomas Bünger
>>>>>>> <thom.bu...@googlemail.com> wrote:
>>>>>>>
>>>>>>>> I've tried 0.8.0-rc4 on my EMR cluster using the preinstalled
>>>>>>>> version of Spark under /usr/lib/spark.
>>>>>>>>
>>>>>>>> This works fine in local or yarn-client mode, but in yarn-cluster
>>>>>>>> mode I just get a
>>>>>>>>
>>>>>>>> java.lang.NullPointerException
>>>>>>>>   at org.apache.zeppelin.spark.NewSparkInterpreter.setupConfForPySpark(NewSparkInterpreter.java:149)
>>>>>>>>
>>>>>>>> It seems to be caused by an unsuccessful search for the py4j
>>>>>>>> libraries. I've made sure that SPARK_HOME is actually set in
>>>>>>>> .bashrc, in zeppelin-env.sh and via the new %spark.conf, but
>>>>>>>> somehow in the remote interpreter something odd is going on.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Thomas
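For reference, the SPARK_HOME setup described in the thread would look roughly like this; the path is the EMR location mentioned above, and this is only a sketch of the configuration as reported, not a verified fix for the yarn-cluster NPE:

    # conf/zeppelin-env.sh on the Zeppelin host
    export SPARK_HOME=/usr/lib/spark

    # or, as mentioned above, per note via the generic configuration
    # paragraph introduced in 0.8 (one "property value" pair per line):
    # %spark.conf
    # SPARK_HOME /usr/lib/spark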