Re: User Impersonation Configuration

2017-05-10 Thread Prabhjyot Singh
From the exception, it looks like you are on 0.7.1, which has the
above-mentioned patch (https://github.com/apache/zeppelin/pull/1840).

Without ZEPPELIN_IMPERSONATE_CMD and ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER
set, it should work just fine. However, if you want to set up password-less
login, you can use this doc; it was written for Zeppelin 0.6.0, but it
works fine:
https://community.hortonworks.com/content/kbentry/81069/how-to-enable-user-impersonation-for-sh-interprete.html

Also, can you quickly check if you are able to connect to spark with user
impersonation?
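Some context on the "Host key verification failed." error quoted below: the OpenSSH client prints that message when it cannot match the remote host's key against the calling user's known_hosts file. A likely reading, assuming the 0.7.x launcher falls back to `ssh user1@localhost` when impersonation is enabled, is that root (the account running Zeppelin) has never recorded localhost's host key. The Python sketch below checks a known_hosts file for a host name, including OpenSSH's hashed `|1|salt|hash` entries (HMAC-SHA1 of the host name keyed by the salt). The function names and the `/root/.ssh/known_hosts` path are illustrative assumptions; only exact names on the default port are handled, not wildcard patterns or `[host]:port` entries.

```python
import base64
import binascii
import hashlib
import hmac
from pathlib import Path
from typing import Iterable


def _hashed_entry_matches(entry: str, host: str) -> bool:
    """Check an OpenSSH hashed host field, |1|<base64 salt>|<base64 hash>,
    where hash = HMAC-SHA1(key=salt, msg=host)."""
    parts = entry.split("|")  # "|1|salt|hash" -> ["", "1", salt, hash]
    if len(parts) != 4 or parts[0] != "" or parts[1] != "1":
        return False
    try:
        salt = base64.b64decode(parts[2])
        expected = base64.b64decode(parts[3])
    except binascii.Error:
        return False
    digest = hmac.new(salt, host.encode(), hashlib.sha1).digest()
    return hmac.compare_digest(digest, expected)


def known_hosts_has(lines: Iterable[str], host: str = "localhost") -> bool:
    """Return True if a known_hosts line lists `host` (exact name, default port)."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "@")):
            continue  # blank, comment, or @cert-authority/@revoked marker line
        for pattern in line.split()[0].split(","):
            if pattern == host or _hashed_entry_matches(pattern, host):
                return True
    return False


if __name__ == "__main__":
    # Assumption: Zeppelin runs as root, so root's file is the one ssh consults.
    path = Path("/root/.ssh/known_hosts")
    try:
        lines = path.read_text().splitlines()
    except OSError:
        lines = []  # missing or unreadable: treat as "host not known"
    print("localhost in root's known_hosts:", known_hosts_has(lines))
```

If it reports False, running `ssh-keyscan localhost >> /root/.ssh/known_hosts` as root (or making one interactive `ssh user1@localhost` from the root account) is the usual way to record the key; root still needs key-based login to user1, as the Hortonworks article describes.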


On 10 May 2017 at 23:05, Yeshwanth Jagini  wrote:

> Hi Prabhjyot,
> Thanks for your reply.
>
> I am using Zeppelin 0.7.0.
> When I do not specify the impersonation config in zeppelin-env.sh and set it
> only in the interpreter setting,
> it throws the following exception:
>
> ERROR [2017-05-10 17:26:30,551] ({pool-2-thread-3} Job.java[run]:188) - Job failed
> org.apache.zeppelin.interpreter.InterpreterException: Host key verification failed.
>
> at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:143)
> at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.reference(RemoteInterpreterProcess.java:73)
> at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:258)
> at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:423)
> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:106)
> at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:387)
> at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
> at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
> I am running Zeppelin as the root user. Root does not have password-less
> SSH set up, whereas the end web user *user1* does.
>
> How should I proceed now?
>
> Thanks,
> Yeshwanth Jagini
>
> On Wed, May 10, 2017 at 1:45 AM, Prabhjyot Singh <
> prabhjyotsi...@apache.org> wrote:
>
>> Hi Yeshwanth,
>>
>> Which version of Zeppelin are you on?
>>
>> If you are on the latest version, then you don't need to set
>> ZEPPELIN_IMPERSONATE_CMD or ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER.
>> Just enabling the User Impersonation check-box should be sufficient.
>>
>> Can you confirm with `ps aux | grep spark`? This is what I see on my
>> machine:
>>
>> prabhjyotsingh@MACHINE:~/ps-zeppelin/logs$ ps aux | grep spark
>> prabhjyotsingh2496   0.2  3.9  5179540 657660 s000  S12:08PM
>> 0:30.68 
>> /Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/bin/java
>> -cp /Users/prabhjyotsingh/ps-zeppelin/interpreter/spark/*:/Users
>> /prabhjyotsingh/ps-zeppelin/zeppelin-interpreter/target/
>> lib/*:/Users/prabhjyotsingh/ps-zeppelin/zeppelin-
>> interpreter/target/classes/:/Users/prabhjyotsingh/ps-
>> zeppelin/zeppelin-interpreter/target/test-classes/:/Users/pr
>> abhjyotsingh/ps-zeppelin/zeppelin-zengine/target/test-classe
>> s/:/Users/prabhjyotsingh/ps-zeppelin/interpreter/spark/
>> zeppelin-spark_2.10-0.8.0-SNAPSHOT.jar:/Users/prabhjyotsingh/spark-2.0.0-
>> bin-hadoop2.7/conf/:/Users/prabhjyotsingh/spark-2.0.0-bin-hadoop2.7/jars/*
>> -Xmx1g -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///
>> Users/prabhjyotsingh/ps-zeppelin/conf/log4j.properties
>> -Dzeppelin.log.file=/Users/prabhjyotsingh/ps-zeppelin/logs/
>> zeppelin-interpreter-spark-user1-spark-prabhjyotsingh-HW11610.local.log
>> org.apache.spark.deploy.SparkSubmit --conf spark.driver.extraClassPath=:/
>> Users/prabhjyotsingh/ps-zeppelin/interpreter/spark/*:/Users/
>> prabhjyotsingh/ps-zeppelin/zeppelin-interpreter/target/
>> lib/*::/Users/prabhjyotsingh/ps-zeppelin/zeppelin-
>> interpreter/target/classes:/Users/prabhjyotsingh/ps-
>> zeppelin/zeppelin-interpreter/target/test-classes:/Users/
>> prabhjyotsingh/ps-zeppelin/zeppelin-zengine/target/test-
>> classes:/Users/prabhjyotsingh/ps-zeppelin/interpreter/spark/
>> zeppelin-spark_2.10-0.8.0-SNAPSHOT.jar --conf
>> spark.driver.extraJavaOptions= -Dfile.encoding=UTF-8
>> -Dlog4j.configuration=file:///Users/prabhjyotsingh/ps-zeppelin/conf/log4j.properties
>> -Dzeppelin.log.file=/Users/prabhjyotsingh/ps-zeppelin/logs/
>> zeppelin-interpreter-spark-user1-spark-prabhjyotsingh-HW11610.local.log
>> --class 

Modularization of notebooks

2017-05-10 Thread Georg Heiler
How can I modularize a big notebook? Is it possible to import other notebooks?

Can Zeppelin interoperate with a regular SBT-scala project?

http://stackoverflow.com/questions/43796688/zeppelin-load-full-project-external-files


Re: NullPointerException at org.apache.zeppelin.spark.Utils.buildJobGroupId

2017-05-10 Thread Jeff Zhang
It is fixed here
https://github.com/apache/zeppelin/pull/2334



On Wed, May 10, 2017 at 12:46 PM, Ruslan Dautkhanov wrote:

> Has anyone experienced the exception below?
> It started happening intermittently after we upgraded to last week's master
> snapshot of Zeppelin.
> Multiple users have reported the same issue.
>
> java.lang.NullPointerException
> at org.apache.zeppelin.spark.Utils.buildJobGroupId(Utils.java:112)
> at org.apache.zeppelin.spark.SparkZeppelinContext.showData(SparkZeppelinContext.java:100)
> at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:129)
> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:101)
> at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:500)
> at org.apache.zeppelin.scheduler.Job.run(Job.java:181)
> at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
> Thanks,
> Ruslan
>
>


NullPointerException at org.apache.zeppelin.spark.Utils.buildJobGroupId

2017-05-10 Thread Ruslan Dautkhanov
Has anyone experienced the exception below?
It started happening intermittently after we upgraded to last week's master
snapshot of Zeppelin.
Multiple users have reported the same issue.

java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.buildJobGroupId(Utils.java:112)
at org.apache.zeppelin.spark.SparkZeppelinContext.showData(SparkZeppelinContext.java:100)
at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:129)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:101)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:500)
at org.apache.zeppelin.scheduler.Job.run(Job.java:181)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)



Thanks,
Ruslan


Re: User Impersonation Configuration

2017-05-10 Thread Yeshwanth Jagini
Hi Prabhjyot,
Thanks for your reply.

I am using Zeppelin 0.7.0.
When I do not specify the impersonation config in zeppelin-env.sh and set it
only in the interpreter setting,
it throws the following exception:

ERROR [2017-05-10 17:26:30,551] ({pool-2-thread-3} Job.java[run]:188) - Job failed
org.apache.zeppelin.interpreter.InterpreterException: Host key verification failed.

at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:143)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.reference(RemoteInterpreterProcess.java:73)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:258)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:423)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:106)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:387)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)



I am running Zeppelin as the root user. Root does not have password-less
SSH set up, whereas the end web user *user1* does.

How should I proceed now?

Thanks,
Yeshwanth Jagini

On Wed, May 10, 2017 at 1:45 AM, Prabhjyot Singh 
wrote:

> Hi Yeshwanth,
>
> Which version of Zeppelin are you on?
>
> If you are on the latest version, then you don't need to set
> ZEPPELIN_IMPERSONATE_CMD or ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER.
> Just enabling the User Impersonation check-box should be sufficient.
>
> Can you confirm with `ps aux | grep spark`? This is what I see on my
> machine:
>
> prabhjyotsingh@MACHINE:~/ps-zeppelin/logs$ ps aux | grep spark
> prabhjyotsingh2496   0.2  3.9  5179540 657660 s000  S12:08PM
> 0:30.68 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/bin/java
> -cp /Users/prabhjyotsingh/ps-zeppelin/interpreter/spark/*:/
> Users/prabhjyotsingh/ps-zeppelin/zeppelin-interpreter/target/lib/*:/Users/
> prabhjyotsingh/ps-zeppelin/zeppelin-interpreter/target/classes/:/Users/
> prabhjyotsingh/ps-zeppelin/zeppelin-interpreter/target/
> test-classes/:/Users/prabhjyotsingh/ps-zeppelin/
> zeppelin-zengine/target/test-classes/:/Users/prabhjyotsingh/ps-zeppelin/
> interpreter/spark/zeppelin-spark_2.10-0.8.0-SNAPSHOT.jar:
> /Users/prabhjyotsingh/spark-2.0.0-bin-hadoop2.7/conf/:/
> Users/prabhjyotsingh/spark-2.0.0-bin-hadoop2.7/jars/* -Xmx1g
> -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///
> Users/prabhjyotsingh/ps-zeppelin/conf/log4j.properties
> -Dzeppelin.log.file=/Users/prabhjyotsingh/ps-zeppelin/
> logs/zeppelin-interpreter-spark-user1-spark-prabhjyotsingh-HW11610.local.log
> org.apache.spark.deploy.SparkSubmit --conf spark.driver.extraClassPath=:/
> Users/prabhjyotsingh/ps-zeppelin/interpreter/spark/*:/
> Users/prabhjyotsingh/ps-zeppelin/zeppelin-interpreter/
> target/lib/*::/Users/prabhjyotsingh/ps-zeppelin/
> zeppelin-interpreter/target/classes:/Users/prabhjyotsingh/
> ps-zeppelin/zeppelin-interpreter/target/test-
> classes:/Users/prabhjyotsingh/ps-zeppelin/zeppelin-zengine/
> target/test-classes:/Users/prabhjyotsingh/ps-zeppelin/
> interpreter/spark/zeppelin-spark_2.10-0.8.0-SNAPSHOT.jar --conf
> spark.driver.extraJavaOptions= -Dfile.encoding=UTF-8
> -Dlog4j.configuration=file:///Users/prabhjyotsingh/ps-zeppelin/conf/log4j.properties
> -Dzeppelin.log.file=/Users/prabhjyotsingh/ps-zeppelin/
> logs/zeppelin-interpreter-spark-user1-spark-prabhjyotsingh-HW11610.local.log
> --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer 
> *--proxy-user
> user1* /Users/prabhjyotsingh/ps-zeppelin/interpreter/spark/
> zeppelin-spark_2.10-0.8.0-SNAPSHOT.jar 50911
> prabhjyotsingh2508   0.0  0.0  2445100860 s000  S+   12:08PM
> 0:00.00 grep spark
> prabhjyotsingh2495   0.0  0.0  2465144764 s000  S12:08PM
> 0:00.00 /bin/bash /Users/prabhjyotsingh/ps-zeppelin/bin/interpreter.sh -d
> /Users/prabhjyotsingh/ps-zeppelin/interpreter/spark -p 50911 -u user1 -l
> /Users/prabhjyotsingh/ps-zeppelin/local-repo/2CEZC4JXN -g spark
> prabhjyotsingh2484   0.0  0.0  2465144   1368 s000  S12:08PM
> 0:00.01 /bin/bash /Users/prabhjyotsingh/ps-zeppelin/bin/interpreter.sh -d
> /Users/prabhjyotsingh/ps-zeppelin/interpreter/spark -p 50911 -u user1 -l
> 
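The check Prabhjyot describes amounts to finding the SparkSubmit process of the interpreter and confirming that its command line carries `--proxy-user user1`, as in the output above. A minimal Python sketch of that check, assuming each input line is one process command line (for example from `ps -eo args`); it splits on whitespace because ps output carries no shell quoting. The function names and the sample paths are hypothetical.

```python
from typing import Iterable, List, Optional


def proxy_user(cmdline: str) -> Optional[str]:
    """Return the argument that follows --proxy-user in one command line, if any."""
    tokens = cmdline.split()  # ps output has no shell quoting to honour
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "--proxy-user":
            return value
    return None


def spark_submit_proxy_users(ps_lines: Iterable[str]) -> List[Optional[str]]:
    """For every SparkSubmit process line, report its --proxy-user (or None)."""
    return [
        proxy_user(line)
        for line in ps_lines
        if "org.apache.spark.deploy.SparkSubmit" in line
    ]


if __name__ == "__main__":
    # Hypothetical, shortened version of the interpreter command line shown above.
    sample = (
        "java -cp /opt/zeppelin/interpreter/spark/* org.apache.spark.deploy.SparkSubmit "
        "--class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer "
        "--proxy-user user1 /opt/zeppelin/interpreter/spark/zeppelin-spark.jar 50911"
    )
    print(spark_submit_proxy_users([sample, "grep spark"]))
```

Fed the lines of `ps -eo args` on the Zeppelin host, a correctly impersonated Spark interpreter should show up as `user1`; a `None` entry would mean the interpreter was launched without a proxy user.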

Re: Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode

2017-05-10 Thread Sofiane Cherchalli
I've put the CSV on the worker node since the job runs on the worker. I
didn't put the CSV on the master because I believe it doesn't run jobs.

If I put the CSV on the Zeppelin node at the same path as on the worker, it
reads the CSV and writes a _SUCCESS file locally. The job also runs on the
worker but doesn't terminate; the result is saved under a _temporary
directory on the worker.

worker - ls -laRt /data/02.csv/


02.csv/:
total 0
drwxr-xr-x. 3 root root 24 Apr 28 09:55 .
drwxr-xr-x. 3 root root 15 Apr 28 09:55 _temporary
drwxr-xr-x. 3 root root 64 Apr 28 09:55 ..

02.csv/_temporary:
total 0
drwxr-xr-x. 5 root root 106 Apr 28 09:56 0
drwxr-xr-x. 3 root root  15 Apr 28 09:55 .
drwxr-xr-x. 3 root root  24 Apr 28 09:55 ..

02.csv/_temporary/0:
total 0
drwxr-xr-x. 5 root root 106 Apr 28 09:56 .
drwxr-xr-x. 2 root root   6 Apr 28 09:56 _temporary
drwxr-xr-x. 2 root root 129 Apr 28 09:56 task_20170428095632_0005_m_00
drwxr-xr-x. 2 root root 129 Apr 28 09:55 task_20170428095516_0002_m_00
drwxr-xr-x. 3 root root  15 Apr 28 09:55 ..

02.csv/_temporary/0/_temporary:
total 0
drwxr-xr-x. 2 root root   6 Apr 28 09:56 .
drwxr-xr-x. 5 root root 106 Apr 28 09:56 ..

02.csv/_temporary/0/task_20170428095632_0005_m_00:
total 52
drwxr-xr-x. 5 root root   106 Apr 28 09:56 ..
-rw-r--r--. 1 root root   376 Apr 28 09:56 .part-0-e39ebc76-5343-407e-b42e-c33e69b8fd1a.csv.crc
-rw-r--r--. 1 root root 46605 Apr 28 09:56 part-0-e39ebc76-5343-407e-b42e-c33e69b8fd1a.csv
drwxr-xr-x. 2 root root   129 Apr 28 09:56 .

02.csv/_temporary/0/task_20170428095516_0002_m_00:
total 52
drwxr-xr-x. 5 root root   106 Apr 28 09:56 ..
-rw-r--r--. 1 root root   376 Apr 28 09:55 .part-0-c2ac5299-26f6-4b23-a74b-b3dc96464271.csv.crc
-rw-r--r--. 1 root root 46605 Apr 28 09:55 part-0-c2ac5299-26f6-4b23-a74b-b3dc96464271.csv


zeppelin - ls -laRt 02.csv/


02.csv/:
total 12
drwxr-sr-x 2 root 1700 4096 Apr 28 09:56 .
-rw-r--r-- 1 root 1700    8 Apr 28 09:56 ._SUCCESS.crc
-rw-r--r-- 1 root 1700    0 Apr 28 09:56 _SUCCESS
drwxrwsr-x 5 root 1700 4096 Apr 28 09:56 ..
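A plausible explanation for the two listings, assuming Spark's default Hadoop `FileOutputCommitter` (algorithm version 1): each executor commits its task output into `<out>/_temporary/0/task_*` on its own disk, while the job commit, which runs on the driver (the Zeppelin node in client mode), promotes those task directories into `<out>`, deletes `_temporary` and writes `_SUCCESS`. With a `file://` path every machine only sees its own disk, so the pieces end up on different hosts. The toy Python model below simulates this with two in-memory "disks"; it is an illustration, not Spark's code, and every name in it is hypothetical.

```python
from typing import Dict, List

# Toy model: each machine's local disk is a dict of path -> file contents.
Disk = Dict[str, bytes]


def task_commit(executor_disk: Disk, out: str, task_id: str, data: bytes) -> None:
    """Executor side: write into an attempt directory, then 'rename' it to
    <out>/_temporary/0/task_<id>, mimicking a v1-style task commit."""
    part = f"part-{task_id}.csv"
    attempt = f"{out}/_temporary/0/_temporary/attempt_{task_id}/{part}"
    executor_disk[attempt] = data
    executor_disk[f"{out}/_temporary/0/task_{task_id}/{part}"] = executor_disk.pop(attempt)


def job_commit(driver_disk: Disk, out: str) -> List[str]:
    """Driver side: promote committed task files into <out>, delete
    _temporary and write _SUCCESS. Returns the names of promoted files."""
    prefix = f"{out}/_temporary/0/task_"
    promoted = []
    for path in [p for p in driver_disk if p.startswith(prefix)]:
        name = path.rsplit("/", 1)[1]
        driver_disk[f"{out}/{name}"] = driver_disk.pop(path)
        promoted.append(name)
    for path in [p for p in driver_disk if p.startswith(f"{out}/_temporary/")]:
        del driver_disk[path]
    driver_disk[f"{out}/_SUCCESS"] = b""
    return promoted


if __name__ == "__main__":
    worker: Disk = {}    # standalone worker: executors write here
    zeppelin: Disk = {}  # Zeppelin node: the driver runs here in client mode
    task_commit(worker, "/data/02.csv", "0002_m_00", b"a,b\n1,2\n")
    print("promoted by driver:", job_commit(zeppelin, "/data/02.csv"))
    print("worker:", sorted(worker))
    print("zeppelin:", sorted(zeppelin))
```

Run as-is, the driver promotes nothing, the part file stays under the worker's `_temporary/0/task_*`, and only `_SUCCESS` appears on the Zeppelin side, mirroring the listings above. Passing the same dict as both disks, which models the shared mount point Meethu suggests below (or HDFS), lets `job_commit` find and promote the part file.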




On Wed, May 10, 2017 at 14:06, Meethu Mathew
wrote:

> Try putting the CSV at the same path on all the nodes, or on a mount point
> that is accessible from all the nodes.
>
> Regards,
>
>
> Meethu Mathew
>
>
> On Wed, May 10, 2017 at 3:36 PM, Sofiane Cherchalli 
> wrote:
>
>> Yes, I already tested with spark-shell and pyspark, with the same result.
>>
>> Can't I use the Linux filesystem to read a CSV, such as file:///data/file.csv?
>> My understanding is that the job is sent to and interpreted on the worker,
>> isn't it?
>>
>> Thanks.
>>
>> On Tue, May 9, 2017 at 20:23, Jongyoul Lee
>> wrote:
>>
>>> Could you test if it works with spark-shell?
>>>
>>> On Sun, May 7, 2017 at 5:22 PM, Sofiane Cherchalli 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a standalone cluster, one master and one worker, running on
>>>> separate nodes. Zeppelin is running on a separate node too, in client
>>>> mode.
>>>>
>>>> When I run a notebook that reads a CSV file located on the worker
>>>> node with the Spark-CSV package, Zeppelin tries to read the CSV locally and
>>>> fails because the CSV is on the worker node and not on the Zeppelin node.
>>>>
>>>> Is this the expected behavior?
>>>>
>>>> Thanks.

>>>
>>>
>>>
>>> --
>>> 이종열, Jongyoul Lee, 李宗烈
>>> http://madeng.net
>>>
>>
>


Re: Hive Reserve Keyword support

2017-05-10 Thread Dibyendu Bhattacharya
Right. The backticks worked.
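Following Felix's suggestion, the reserved word only needs backticks in the query text, e.g. select x,y,z from mytable where `date` = '2017-04-28'. If such queries are assembled programmatically before being pasted into a %jdbc paragraph, a small Python helper like the one below can quote every identifier. The function names are hypothetical (not part of Zeppelin or the Hive JDBC driver), and the sketch assumes Hive's default quoted-identifier mode, in which a backtick inside a quoted name is written as two backticks.

```python
from datetime import datetime
from typing import Iterable


def quote_hive_identifier(name: str) -> str:
    """Backtick-quote a Hive identifier so reserved words such as `date`
    parse as names; an embedded backtick is written as two backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_date_query(table: str, columns: Iterable[str], date_column: str, day: str) -> str:
    """Rebuild the thread's query with every identifier quoted.

    `day` must be YYYY-MM-DD; strptime rejects anything else (including stray
    quotes), so the value is validated instead of escaped.
    """
    datetime.strptime(day, "%Y-%m-%d")  # raises ValueError on bad input
    cols = ", ".join(quote_hive_identifier(c) for c in columns)
    return (
        f"SELECT {cols} FROM {quote_hive_identifier(table)} "
        f"WHERE {quote_hive_identifier(date_column)} = '{day}'"
    )


if __name__ == "__main__":
    print(build_date_query("mytable", ["x", "y", "z"], "date", "2017-04-28"))
```

Validating the date rather than escaping it keeps the sketch from depending on Hive's string-literal escaping rules.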

On Wed, May 10, 2017 at 8:51 AM, Felix Cheung 
wrote:

> I think you can put backticks around the name date
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
>
> --
> *From:* Jongyoul Lee 
> *Sent:* Tuesday, May 9, 2017 10:33:50 AM
> *To:* users@zeppelin.apache.org
> *Subject:* Re: Hive Reserve Keyword support
>
> If it's possible to pass that property when you create a
> connection, you can pass it by setting it in the interpreter setting.
>
> On Sat, Apr 29, 2017 at 4:25 PM, Dibyendu Bhattacharya <
> dibyendu.bhattach...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a Hive table which has a column named date. When I tried to query
>> it using the Zeppelin %jdbc interpreter, I got the error below.
>>
>>
>> Error while compiling statement: FAILED: ParseException line 1:312 Failed to recognize predicate 'date'. Failed rule: 'identifier' in expression specification
>> class org.apache.hive.service.cli.HiveSQLException
>> org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231)
>> org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217)
>> org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
>> org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:322)
>> org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:408)
>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
>> org.apache.zeppelin.scheduler.Job.run(Job.java:176)
>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
>>
>>
>> My query looks like this :
>>
>> select x,y,z from mytable where date = '2017-04-28'
>>
>> I believe it is failing because date is a reserved keyword. Is there any way
>> I can set hive.support.sql11.reserved.keywords=false in Zeppelin?
>>
>> regards,
>> Dibyendu
>>
>>
>>
>>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>