RE: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-07 Thread Manuel Sopena Ballesteros
Sorry for the stupid question.

How can I use pip? Zeppelin will run pip through the shell interpreter, but my
system-wide Python is 2.6…



Thanks

Manuel



Re: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-07 Thread Jeff Zhang
pip should be available under your Python 3.6.5; you can use that to install
pandas.
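
[Editor's note: a minimal sketch of that suggestion, using the Python path
given later in the thread; whether pip is bundled depends on how the 3.6.5
build was made (ensurepip ships with Python >= 3.4 and can bootstrap it):]

  # install pandas into the 3.6.5 interpreter's own site-packages; no system pip needed
  /tmp/Python-3.6.5/python -m pip install pandas

  # if the build lacks pip, bootstrap it with ensurepip, then rerun the install
  /tmp/Python-3.6.5/python -m ensurepip --upgrade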




RE: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-07 Thread Manuel Sopena Ballesteros
Hi Jeff,

Thank you very much for your quick response. My Zeppelin is deployed using HDP
(the Hortonworks platform), so I already have Spark/YARN integration, and I am
using zeppelin.pyspark.python to tell PySpark to run Python 3.6:

zeppelin.pyspark.python --> /tmp/Python-3.6.5/python

I do have root access to the machine, but the OS is CentOS 6 (the system
Python environment is 2.6), hence pip is not available.

Thank you

Manuel



Re: how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-07 Thread Jeff Zhang
First, I would suggest using Python 2.7 or Python 3.x, because Spark 2.x has
dropped support for Python 2.6.
Second, you need to configure PYSPARK_PYTHON in the Spark interpreter settings
to point to the Python that you installed. (I don't know what you mean by
saying you can't install pandas system-wide.) Do you mean you are not root and
don't have permission to install Python packages?
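
[Editor's note: a sketch of the two usual places to set this, using the Python
path Manuel gives elsewhere in the thread:]

  # option 1: a property in the Spark interpreter settings on Zeppelin's
  # Interpreter page
  PYSPARK_PYTHON=/tmp/Python-3.6.5/python

  # option 2: an export in conf/zeppelin-env.sh, followed by a Zeppelin restart
  export PYSPARK_PYTHON=/tmp/Python-3.6.5/python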





how to load pandas into pyspark (centos 6 with python 2.6)

2018-06-07 Thread Manuel Sopena Ballesteros
Dear Zeppelin community,

I am trying to load pandas into my Zeppelin %spark2.pyspark interpreter. The
system I am using is CentOS 6 with Python 2.6, so I can't install pandas
system-wide through pip as suggested in the documentation.

What can I do if I want to add modules to the %spark2.pyspark interpreter?
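
[Editor's note: once an interpreter Python with pandas is wired up as described
in the replies above, a quick hypothetical paragraph like this confirms which
Python the interpreter runs and that the module imports:]

  %spark2.pyspark
  import sys
  print(sys.executable)  # should point at the Python configured for the interpreter
  import pandas as pd
  print(pd.__version__)  # proves pandas is importable from that environment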

Thank you very much

Manuel Sopena Ballesteros | Big Data Engineer
Garvan Institute of Medical Research
The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, NSW 2010
T: +61 (0)2 9355 5760 | F: +61 (0)2 9295 8507 | E: manuel...@garvan.org.au

NOTICE
Please consider the environment before printing this email. This message and 
any attachments are intended for the addressee named and may contain legally 
privileged/confidential/copyright information. If you are not the intended 
recipient, you should not read, use, disclose, copy or distribute this 
communication. If you have received this message in error please notify us at 
once by return email and then delete both messages. We accept no liability for 
the distribution of viruses or similar in electronic communications. This 
notice should not be removed.


Re: NewSparkInterpreter fails on yarn-cluster

2018-06-07 Thread Jeff Zhang
Hi Thomas,

I tried the latest branch-0.8 and it works for me. Could you try again to
verify it?
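
[Editor's note: the NullPointerException discussed below was filed as
ZEPPELIN-3531 and fixed on branch-0.8. For the related question of making
SPARK_HOME visible to the interpreter process, a sketch using the path from
the thread:]

  # conf/zeppelin-env.sh -- export SPARK_HOME so the Spark interpreter can find
  # $SPARK_HOME/python/lib/py4j-*-src.zip and pyspark.zip at startup
  export SPARK_HOME=/usr/lib/spark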





Re: Zeppelin 0.8

2018-06-07 Thread Jianfeng (Jeff) Zhang

I am doing the release. The latest RC4 was canceled; I will start RC5 in the
next few days.


Best Regards,
Jeff Zhang





Re: Zeppelin 0.8

2018-06-07 Thread Benjamin Kim
Can anyone tell me what the status is for the 0.8 release?

> On May 2, 2018, at 4:43 PM, Jeff Zhang wrote:
> 
> Yes, 0.8 will support Spark 2.3.
> 
> Benjamin Kim <bbuil...@gmail.com> wrote on Thu, May 3, 2018 at 1:59 AM:
> Will Zeppelin 0.8 have Spark 2.3 support?
> 
>> On Apr 30, 2018, at 1:27 AM, Rotem Herzberg wrote:
>> 
>> Thanks
>> 
>> On Mon, Apr 30, 2018 at 11:16 AM, Jeff Zhang wrote:
>> 
>> I am preparing the RC for 0.8
>> 
>> Rotem Herzberg wrote on Mon, Apr 30, 2018 at 3:57 PM:
>> Hi,
>> 
>> What is the release date for Zeppelin 0.8? (support for Spark 2.3)
>> 
>> Thanks,
>> 
>> -- 
>> Rotem Herzberg
>> SW Engineer | GigaSpaces Technologies
>> rotem.herzb...@gigaspaces.com | M +972547718880



Re: Credentials for JDBC

2018-06-07 Thread Benjamin Kim
Hi 종열,

Can you show me how?

Thanks,
Ben


> On Jun 6, 2018, at 10:32 PM, Jongyoul Lee wrote:
> 
> We have a trick to get credential information from the Credentials page. I'll
> look into it.
> 
> On Thu, Jun 7, 2018 at 7:53 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
> I created a JDBC interpreter for AWS Athena, and it passes the access key as
> UID and the secret key as PWD in the URL connection string. Does anyone know
> if I can set up each user to pass their own credentials in, sort of, a
> credentials file or config?
> 
> Thanks,
> Ben
> 
> 
> -- 
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
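
[Editor's note: a sketch of the direction Jongyoul describes, based on
Zeppelin's Credentials page; the interpreter name and URL below are
illustrative, and the exact entity-matching rule should be checked against the
JDBC interpreter docs for your version:]

  # JDBC interpreter settings -- leave the credentials out of the URL
  default.url = jdbc:awsathena://athena.us-east-1.amazonaws.com:443
  default.user =
  default.password =

  # Each user then adds an entry on the Credentials page (/#/credential):
  #   Entity: jdbc.athena   Username: <access key>   Password: <secret key>
  # so the interpreter resolves per-user credentials instead of shared ones.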



Re: NewSparkInterpreter fails on yarn-cluster

2018-06-07 Thread Thomas Bünger
I specifically mean visualisation via ZeppelinContext inside a Spark
interpreter (e.g. "z.show(...)").
The visualisation of SparkSQL results inside a SparkSQLInterpreter works
fine, also in yarn-cluster mode.
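
[Editor's note: for readers unfamiliar with ZeppelinContext, a minimal sketch
of the kind of paragraph being discussed; the variable z is injected by
Zeppelin into Spark interpreter sessions, and in yarn-cluster mode the failure
below is that z is never defined ("error: not found: value z"):]

  %spark2.pyspark
  # build a tiny DataFrame and render it with Zeppelin's table/chart widget
  df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
  z.show(df)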

>>>


Re: NewSparkInterpreter fails on yarn-cluster

2018-06-07 Thread Thomas Bünger
Hey Jeff,

I tried your changes and now it works nicely. Thank you very much!

But I still can't use any of the forms and visualizations in yarn-cluster?
I was hoping that this got resolved with the new SparkInterpreter so that I
can switch from yarn-client to yarn-cluster mode in 0.8, but I'm still
getting errors like
"error: not found: value z"

Was this not in scope of that change? Is this a bug? Or is it a known
limitation that is also not supported in 0.8?

Best regards,
 Thomas

On Wed, Jun 6, 2018 at 03:28, Jeff Zhang wrote:

>
> I can confirm that this is a bug, and created
> https://issues.apache.org/jira/browse/ZEPPELIN-3531
>
> Will fix it soon
>
> Jeff Zhang wrote on Tue, Jun 5, 2018 at 9:01 PM:
>
>>
>> hmm, it looks like a bug. I will check it tomorrow.
>>
>>
>> Thomas Bünger wrote on Tue, Jun 5, 2018 at 8:56 PM:
>>
>>> $ ls /usr/lib/spark/python/lib
>>> py4j-0.10.6-src.zip  PY4J_LICENSE.txt  pyspark.zip
>>>
>>> So the folder exists and contains both necessary zips. Please note that in
>>> local or yarn-client mode the files are properly picked up from that very
>>> same location.
>>>
>>> How does yarn-cluster work under the hood? Could it be that environment
>>> variables (like SPARK_HOME) are lost, because they are only available in my
>>> local shell + zeppelin daemon process? Do I need to tell YARN somehow about
>>> SPARK_HOME?
>>>
>>> On Tue, Jun 5, 2018 at 14:48, Jeff Zhang wrote:
>>>

 Could you check whether there's folder /usr/lib/spark/python/lib ?


 Thomas Bünger wrote on Tue, Jun 5, 2018 at 8:45 PM:

>
> sys.env
> java.lang.NullPointerException at
> org.apache.zeppelin.spark.NewSparkInterpreter.setupConfForPySpark(NewSparkInterpreter.java:149)
> at
> org.apache.zeppelin.spark.NewSparkInterpreter.open(NewSparkInterpreter.java:90)
> at
> org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:62)
> at
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
> at org.apache.zeppelin.scheduler.Job.run(Job.java:188) at
> org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
>
> On Tue, Jun 5, 2018 at 14:41, Jeff Zhang wrote:
>
>> Could you paste the full stack trace?
>>
>>
>> Thomas Bünger 于2018年6月5日周二 下午8:21写道:
>>
>>> I've tried the 0.8.0-rc4 on my EMR cluster using the preinstalled
>>> version of spark under /usr/lib/spark.
>>>
>>> This works fine in local or yarn-client mode, but in yarn-cluster
>>> mode I just get a
>>>
>>> java.lang.NullPointerException at
>>> org.apache.zeppelin.spark.NewSparkInterpreter.setupConfForPySpark(NewSparkInterpreter.java:149)
>>>
>>> Seems to be caused by an unsuccessful search for the py4j libraries.
>>> I've made sure that SPARK_HOME is actually set in .bashrc, in
>>> zeppelin-env.sh and via the new %spark.conf, but somehow in the remote
>>> interpreter, something odd is going on.
>>>
>>> Best regards,
>>>  Thomas
>>>
>>