Re: NewSparkInterpreter fails on yarn-cluster

2018-06-05 Thread Jeff Zhang
I can confirm that this is a bug, and created
https://issues.apache.org/jira/browse/ZEPPELIN-3531

Will fix it soon



Re: Difficult paths for Zeppelin and Nginx integration

2018-06-05 Thread Sam Nicholson
> One more thing worth noting: if I pass the public IP address of
> Zeppelin, it fails; but if the private IP is passed, it works. This is
> specifically observed on Azure VM and AWS EC2 instances.

Yes. On AWS and Azure this is to be expected, and desired.

Public IP  ---|--- nginx ---|--- Private IP

Even if the IPs are on the same network interface.
AWS routing and switching is interesting: multiple methods and generations of
patterns, designed not to leak private address space.
Azure has different behavior, but similar results.


On Mon, Jun 4, 2018 at 9:26 PM, Sanket Shah  wrote:

> Thanks Sam and Fabien for sharing the snippets. Fabien's solution didn't
> work, but Sam's worked for me.
>
> location /zeppelin/ {
> proxy_pass http://10.0.1.1:8080/;
> proxy_set_header X-Real-IP $remote_addr;
> proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
> proxy_set_header Host $http_host;
> proxy_set_header X-NginX-Proxy true;
> proxy_redirect off;
> }
>
> location /zeppelin/ws {
> proxy_pass http://10.0.1.1:8080/ws;
> proxy_set_header Host $http_host;
> proxy_set_header X-Real-IP $proxy_protocol_addr;
> proxy_set_header X-Forwarded-For $proxy_protocol_addr;
> proxy_http_version 1.1;
> proxy_set_header Upgrade websocket;
> proxy_set_header Connection upgrade;
> proxy_read_timeout 5s;
> proxy_connect_timeout 5s;
> proxy_redirect off;
> }
>
>
> One more thing worth noting: if I pass the public IP address of
> Zeppelin, it fails; but if the private IP is passed, it works. This is
> specifically observed on Azure VM and AWS EC2 instances.
>
>
> Sanket Tarun Shah - Enterprise Architect
> +91 98793 56075 | sanket.s...@outlook.com
> --
> From: Sam Nicholson
> Sent: 05 June 2018 04:53 AM
> To: users@zeppelin.apache.org
> Subject: Re: Difficult paths for Zeppelin and Nginx integration
>
>
> Here's the zeppelin server block from my current, working config.
> I have changed my DNS domains to "internal" and "external".
> Other than that, it's verbatim.
>
> server {
> listen 443 ssl http2;
> server_name  zeppelin.external;
> ssl_certificate /etc/certs/fullchain.cer;
> ssl_certificate_key /etc/certs/cert.key;
> location / {
> proxy_pass   http://zeppelin.internal:6800;
> proxy_set_header Host$host;
> proxy_set_header X-Real-IP   $proxy_protocol_addr;
> proxy_set_header X-Forwarded-For $proxy_protocol_addr;
> }
> location /ws {
> proxy_pass   http://zeppelin.internal:6800;
> proxy_set_header Host$host;
> proxy_set_header X-Real-IP   $proxy_protocol_addr;
> proxy_set_header X-Forwarded-For $proxy_protocol_addr;
> proxy_http_version 1.1;
> proxy_set_header Upgrade websocket;
> proxy_set_header Connection upgrade;
> proxy_read_timeout 86400;
> }
> }
>
>
> On Mon, Jun 4, 2018 at 12:58 PM, Sanket Shah 
> wrote:
>
> I'm trying to put Nginx in front of Zeppelin. Regular requests are passing
> through, but WebSockets are not working. I followed the Zeppelin guide at
> https://zeppelin.apache.org/docs/0.7.3/security/authentication.html, but
> am having a real tough time getting this going 😞
>
> Below is an excerpt of my configuration:
>
> location /zeppelin/ {
> proxy_pass http://104.211.216.218:8080/;
> proxy_set_header X-Real-IP $remote_addr;
> proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
> proxy_set_header Host $http_host;
> proxy_set_header X-NginX-Proxy true;
> proxy_redirect off;
> }
>
> location /zeppelin/ws {
> proxy_pass http://104.211.216.218:8080/ws;
> proxy_http_version 1.1;
> proxy_set_header Upgrade websocket;
> proxy_set_header Connection upgrade;
> proxy_read_timeout 86400;
> }
>
>
>
>
>
>
>

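An aside on why the `Upgrade websocket` / `Connection upgrade` headers in the configs above are essential: they let the standard RFC 6455 opening handshake pass through the proxy. A minimal Python sketch of the handshake value the server must compute (this is the standard algorithm from the RFC; nothing here is specific to Zeppelin or the hosts in this thread):

```python
import base64
import hashlib

# RFC 6455 handshake: the server proves it understood the WebSocket
# upgrade by hashing the client's Sec-WebSocket-Key with a fixed GUID
# and returning the result in a "101 Switching Protocols" response.
# If a proxy drops the Upgrade/Connection headers, this exchange never
# happens and the client sees a plain HTTP response instead of a 101.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def sec_websocket_accept(client_key: str) -> str:
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# Example key/accept pair from RFC 6455, section 1.3:
print(sec_websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
# → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```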

Re: NewSparkInterpreter fails on yarn-cluster

2018-06-05 Thread Jeff Zhang
Hmm, it looks like a bug. I will check it tomorrow.




Re: NewSparkInterpreter fails on yarn-cluster

2018-06-05 Thread Thomas Bünger
$ ls /usr/lib/spark/python/lib
py4j-0.10.6-src.zip  PY4J_LICENSE.txt  pyspark.zip

So the folder exists and contains both necessary zips. Please note that in
local or yarn-client mode the files are properly picked up from that very
same location.

How does yarn-cluster work under the hood? Could it be that environment
variables (like SPARK_HOME) are lost, because they are only available in my
local shell + zeppelin daemon process? Do I need to tell YARN somehow about
SPARK_HOME?
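One possible answer to the SPARK_HOME question, sketched under the assumption that the interpreter's Spark properties are honored in yarn-cluster mode: Spark on YARN can forward environment variables to the application master container via `spark.yarn.appMasterEnv.<NAME>`. In Zeppelin this might be tried in the interpreter settings or the `%spark.conf` paragraph; the path below is the EMR layout mentioned in this thread, and this workaround is not verified against the bug tracked here:

```properties
# Hedged sketch (untested): forward SPARK_HOME to the YARN application
# master, where the driver runs in yarn-cluster mode.
spark.yarn.appMasterEnv.SPARK_HOME /usr/lib/spark
```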



Re: NewSparkInterpreter fails on yarn-cluster

2018-06-05 Thread Jeff Zhang
Could you check whether there's a folder /usr/lib/spark/python/lib?




Re: NewSparkInterpreter fails on yarn-cluster

2018-06-05 Thread Thomas Bünger
sys.env
java.lang.NullPointerException
    at org.apache.zeppelin.spark.NewSparkInterpreter.setupConfForPySpark(NewSparkInterpreter.java:149)
    at org.apache.zeppelin.spark.NewSparkInterpreter.open(NewSparkInterpreter.java:90)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:62)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)




Re: NewSparkInterpreter fails on yarn-cluster

2018-06-05 Thread Jeff Zhang
Could you paste the full stack trace?




NewSparkInterpreter fails on yarn-cluster

2018-06-05 Thread Thomas Bünger
I've tried the 0.8.0-rc4 on my EMR cluster using the preinstalled version
of Spark under /usr/lib/spark.

This works fine in local or yarn-client mode, but in yarn-cluster mode I
just get a

java.lang.NullPointerException at
org.apache.zeppelin.spark.NewSparkInterpreter.setupConfForPySpark(NewSparkInterpreter.java:149)

This seems to be caused by an unsuccessful search for the py4j libraries.
I've made sure that SPARK_HOME is actually set in .bashrc, in
zeppelin-env.sh, and via the new %spark.conf, but somehow in the remote
interpreter something odd is going on.

Best regards,
 Thomas
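The failure described in this thread is consistent with the interpreter deriving filesystem paths from an unset SPARK_HOME. As an illustration only (this is not Zeppelin's actual code), the lookup that setupConfForPySpark presumably performs can be sketched in Python, failing with a clear message instead of a NullPointerException; the directory layout matches the EMR paths above:

```python
import glob
import os

def find_pyspark_archives(spark_home=None):
    """Locate the py4j and pyspark zips the PySpark interpreter needs.

    Illustrative sketch only -- not Zeppelin's implementation.
    """
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if spark_home is None:
        # In yarn-cluster mode the interpreter runs inside a YARN container,
        # which does not inherit the driver host's shell environment.
        raise RuntimeError("SPARK_HOME is not set in this process")
    lib_dir = os.path.join(spark_home, "python", "lib")
    py4j = sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    pyspark_zip = os.path.join(lib_dir, "pyspark.zip")
    if not py4j or not os.path.exists(pyspark_zip):
        raise RuntimeError("py4j/pyspark archives not found under %s" % lib_dir)
    return py4j[0], pyspark_zip
```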