Re: Where are the interpreter settings stored?

2017-05-31 Thread Sudev A C
Thanks a lot, Ahyoung :)
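
For anyone else running Zeppelin in Docker: since the settings live in
conf/interpreter.json, one way to avoid reconfiguring on every run is to bake a
pre-configured copy of that file into the image at build time. A minimal sketch,
assuming an apache/zeppelin base image with ZEPPELIN_HOME at /zeppelin (the tag
and conf path are assumptions - adjust them to your actual image):

FROM apache/zeppelin:0.7.1
# interpreter.json exported from an already-configured Zeppelin instance;
# copying it into the conf directory makes the settings available at startup.
COPY interpreter.json /zeppelin/conf/interpreter.json

Mounting a host directory over the conf directory (docker run -v
$PWD/conf:/zeppelin/conf ...) achieves the same persistence without rebuilding
the image.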




On Tue, May 30, 2017 at 9:50 PM, Ahyoung Ryu  wrote:

> Hi Sudev,
>
> The interpreter settings are stored in conf/interpreter.json.
>
> Hope it helps,
> Ahyoung
>
> On Tue, May 30, 2017 at 9:18 AM, Sudev A C  wrote:
>
>> Hi,
>>
>>
>> Can anyone help in understanding where the interpreter settings are
>> stored?
>>
>> I'm trying to run Zeppelin in Docker; my issue is that I have to redo all
>> the interpreter configuration on every run of the container. If the settings are
>> stored in and read into Zeppelin from a file, then I can pre-load that file
>> to its location during the Docker build.
>>
>> Has anyone worked around this issue?
>>
>>
>>
>> Thanks
>> Sudev
>>
>
>


-- 
Sudev A C | Senior Data Engineer
sudev...@go-mmt.com | 8089442513

2nd floor, Tower B Divyashree Technopolis Yemalur , Bangalore , Karnataka
560025,India
 




Re: Livy - add external libraries from additional maven repo

2017-05-31 Thread Theofilos Kakantousis

Thanks everyone for the feedback!

Indeed, %dep works only with the Spark interpreter; I mentioned it just to 
show the behavior I expected from the Livy interpreter.
When I set my local maven repo and the "groupId:artifactId:version" in 
the interpreter settings, I can see the dependency (i.e. a jar file) 
being downloaded on the local OS under the "local-repo" directory, but the 
dependency is not shipped with the Spark application in YARN.


Cheers,
Theo

On 2017-05-31 01:19, Ben Vogan wrote:
For what it's worth, I have successfully added jar files and maven 
packages to sessions using Zeppelin & Livy 0.3, although not using 
%dep. In the interpreter settings I set livy.spark.jars for jars 
that are on my HDFS cluster, and livy.spark.jars.packages for 
maven packages, although only against Maven Central and not a local repo.


--Ben

On Tue, May 30, 2017 at 12:36 PM, Felix Cheung wrote:


To add, this might be an issue with Livy.

I'm seeing something similar as well.

If you can get a repro by calling the Livy REST API directly, it
would be worthwhile to follow up with the Livy community separately.



*From:* Felix Cheung
*Sent:* Tuesday, May 30, 2017 11:34:31 AM
*To:* users@zeppelin.apache.org; users@zeppelin.apache.org
*Subject:* Re: Livy - add external libraries from additional maven repo
If I recall, %dep only works with the built-in Spark interpreter
and not the Livy interpreter.

To manage dependencies with Livy you will need to set the Spark conf
through Livy.


*From:* Theofilos Kakantousis
*Sent:* Tuesday, May 30, 2017 9:05:15 AM
*To:* users@zeppelin.apache.org
*Subject:* Livy - add external libraries from additional maven repo
Hi everyone,

I'm using Zeppelin with Livy 0.4 and trying to add external
libraries from an additional maven repo to my application
according to the documentation available here.
The example works fine, but when I set
livy.spark.jars.packages to my library, the interpreter throws an
unresolved dependency error.

I have added the additional maven repository in the interpreter
settings and have also tried setting livy.spark.jars.ivy, but
without luck. However, if I use the Spark interpreter with the
following code, it works fine.

"%dep
z.reset();
z.addRepo("my repo").url("http://myrepo").snapshot
z.load("mygroup:myartifact:myversion");

Has anyone managed to do that with Livy? Thanks!

Cheers,
Theo




--
*BENJAMIN VOGAN* | Data Platform Team Lead








Re: Permission denied: user=zeppelin while using %spark.pyspark interpreter in AWS EMR cluster

2017-05-31 Thread BigData Consultant
How do I give the zeppelin user access to HDFS?
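
For what it's worth, on the HDFS side that usually comes down to giving the
zeppelin user a writable directory or an ACL entry. A rough sketch (run as an
HDFS superuser such as hdfs; the paths and owner/group below are examples, not
the exact fix for this cluster):

# give the zeppelin user its own writable HDFS directory
sudo -u hdfs hdfs dfs -mkdir -p /user/zeppelin
sudo -u hdfs hdfs dfs -chown zeppelin:zeppelin /user/zeppelin

# or grant access to an existing directory via an ACL entry
# (requires dfs.namenode.acls.enabled=true on the NameNode)
sudo -u hdfs hdfs dfs -setfacl -m user:zeppelin:rwx /mnt/tmp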

On Tue, May 30, 2017 at 2:16 AM, Felix Cheung 
wrote:

> Seems to be an issue with HDFS ACLs - does the service user Zeppelin have
> access to your storage?
>
> --
> *From:* BigData Consultant 
> *Sent:* Friday, May 26, 2017 10:56:31 PM
> *To:* d...@zeppelin.apache.org; users@zeppelin.apache.org
> *Subject:* Permission denied: user=zeppelin while using %spark.pyspark
> interpreter in AWS EMR cluster
>
> Hi Team,
>
> I have created a PySpark Structured Streaming program and am trying to execute
> it in the Zeppelin notebook, but I am getting the following error:
>
> Py4JJavaError: An error occurred while calling o191.start.
> : org.apache.hadoop.security.AccessControlException: Permission denied:
> user=zeppelin, access=WRITE,
> inode="/mnt/tmp/temporary-e0cf0f09-a6f4-44d6-9a72-
> 324660085608/metadata":hdfs:hadoop:drwxr-xr-x
>
>
> I am using Zeppelin Notebook Version 0.7.1 in AWS EMR cluster.
>
> Help would be much appreciated.
>
> *Full stacktrace:*
>
>
> Traceback (most recent call last):
> File "/tmp/zeppelin_pyspark-8165971491474576109.py", line 349, in 
> raise Exception(traceback.format_exc())
> Exception: Traceback (most recent call last):
> File "/tmp/zeppelin_pyspark-8165971491474576109.py", line 342, in 
> exec(code)
> File "", line 5, in 
> File "/usr/lib/spark/python/pyspark/sql/streaming.py", line 816, in start
> return self._sq(self._jwrite.start())
> File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py",
> line 1133, in __call__
> answer, self.gateway_client, self.target_id, self.name)
> File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
> return f(*a, **kw)
> File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py",
> line
> 319, in get_return_value
> format(target_id, ".", name), value)
> Py4JJavaError: An error occurred while calling o191.start.
> : org.apache.hadoop.security.AccessControlException: Permission denied:
> user=zeppelin, access=WRITE,
> inode="/mnt/tmp/temporary-e0cf0f09-a6f4-44d6-9a72-
> 324660085608/metadata":hdfs:hadoop:drwxr-xr-x
> at
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> check(FSPermissionChecker.java:320)
> at
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> check(FSPermissionChecker.java:292)
> at
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> checkPermission(FSPermissionChecker.java:213)
> at
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> checkPermission(FSPermissionChecker.java:190)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> checkPermission(FSDirectory.java:1728)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> checkPermission(FSDirectory.java:1712)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(
> FSDirectory.java:1695)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(
> FSNamesystem.java:2515)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.
> startFileInt(FSNamesystem.java:2450)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.
> startFile(FSNamesystem.java:2334)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.
> create(NameNodeRpcServer.java:624)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSi
> deTranslatorPB.create(ClientNamenodeProtocolServerSi
> deTranslatorPB.java:397)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$
> ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.
> java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(
> ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1698)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(
> NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
> DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at
> org.apache.hadoop.ipc.RemoteException.instantiateException(
> RemoteException.java:106)
> at
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(
> RemoteException.java:73)
> at
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(
> DFSOutputStream.java:1653)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$7.
> doCall(DistributedFileS

Re: Permission denied: user=zeppelin while using %spark.pyspark interpreter in AWS EMR cluster

2017-05-31 Thread Trevor Grant
maybe relevant (maybe not)

https://github.com/apache/zeppelin/pull/1323

I had some issues hitting a cloud HDFS instance a while back; you may be able
to hack out a solution relevant to your problem.

tg

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Wed, May 31, 2017 at 12:52 PM, BigData Consultant <
bigdata.consultant2...@gmail.com> wrote:

> How to set the access for zeppelin user to the HDFS?
>
> On Tue, May 30, 2017 at 2:16 AM, Felix Cheung 
> wrote:
>
>> Seems to be with hdfs ACL - does the service user Zeppelin have access to
>> your storage?
>>
>> --
>> *From:* BigData Consultant 
>> *Sent:* Friday, May 26, 2017 10:56:31 PM
>> *To:* d...@zeppelin.apache.org; users@zeppelin.apache.org
>> *Subject:* Permission denied: user=zeppelin while using %spark.pyspark
>> interpreter in AWS EMR cluster
>>
>> Hi Team,
>>
>> I have created pyspark structure streaming program and trying to execute
>> in
>> the Zeppelin notebook, I am getting the following error:
>>
>> Py4JJavaError: An error occurred while calling o191.start.
>> : org.apache.hadoop.security.AccessControlException: Permission denied:
>> user=zeppelin, access=WRITE,
>> inode="/mnt/tmp/temporary-e0cf0f09-a6f4-44d6-9a72-3246600856
>> 08/metadata":hdfs:hadoop:drwxr-xr-x
>>
>>
>> I am using Zeppelin Notebook Version 0.7.1 in AWS EMR cluster.
>>
>> Help would be much appreciated.
>>
>> *Full stacktrace:*
>>
>>
>> Traceback (most recent call last):
>> File "/tmp/zeppelin_pyspark-8165971491474576109.py", line 349, in
>> 
>> raise Exception(traceback.format_exc())
>> Exception: Traceback (most recent call last):
>> File "/tmp/zeppelin_pyspark-8165971491474576109.py", line 342, in
>> 
>> exec(code)
>> File "", line 5, in 
>> File "/usr/lib/spark/python/pyspark/sql/streaming.py", line 816, in start
>> return self._sq(self._jwrite.start())
>> File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gat
>> eway.py",
>> line 1133, in __call__
>> answer, self.gateway_client, self.target_id, self.name)
>> File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
>> return f(*a, **kw)
>> File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py",
>> line
>> 319, in get_return_value
>> format(target_id, ".", name), value)
>> Py4JJavaError: An error occurred while calling o191.start.
>> : org.apache.hadoop.security.AccessControlException: Permission denied:
>> user=zeppelin, access=WRITE,
>> inode="/mnt/tmp/temporary-e0cf0f09-a6f4-44d6-9a72-3246600856
>> 08/metadata":hdfs:hadoop:drwxr-xr-x
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.c
>> heck(FSPermissionChecker.java:320)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.c
>> heck(FSPermissionChecker.java:292)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.c
>> heckPermission(FSPermissionChecker.java:213)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.c
>> heckPermission(FSPermissionChecker.java:190)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPerm
>> ission(FSDirectory.java:1728)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPerm
>> ission(FSDirectory.java:1712)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAnce
>> storAccess(FSDirectory.java:1695)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFil
>> eInternal(FSNamesystem.java:2515)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFil
>> eInt(FSNamesystem.java:2450)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFil
>> e(FSNamesystem.java:2334)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.cre
>> ate(NameNodeRpcServer.java:624)
>> at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServ
>> erSideTranslatorPB.create(ClientNamenodeProtocolServerSideTr
>> anslatorPB.java:397)
>> at
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocol
>> Protos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNam
>> enodeProtocolProtos.java)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcIn
>> voker.call(ProtobufRpcEngine.java:616)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
>> upInformation.java:1698)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> at
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(Native
>> ConstructorAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingConstru

Re: Livy - add external libraries from additional maven repo

2017-05-31 Thread Felix Cheung
How are you setting this:
When setting my local maven repo and the "groupId:artifactId:version" in the 
interpreter settings


From: Theofilos Kakantousis 
Sent: Wednesday, May 31, 2017 1:56:07 AM
To: users@zeppelin.apache.org
Subject: Re: Livy - add external libraries from additional maven repo

Thanks everyone for the feedback!

Indeed %dep works only for Spark interpreter, just mentioned it to show the 
interpreter behavior I expected with Livy.
When setting my local maven repo and the "groupId:artifactId:version" in the 
interpreter settings, I can see the dependency (i.e. a jar file) being 
downloaded to the local OS under "local-repo" directory but the dependency is 
not deployed with the Spark application in YARN.

Cheers,
Theo

On 2017-05-31 01:19, Ben Vogan wrote:
For what it's worth I have successfully added jar files and maven packages to 
sessions using zeppelin & livy 0.3 - although not using %dep.  In the 
interpreter settings I set the livy.spark.jars setting for jars that are on my 
HDFS cluster, and livy.spark.jars.packages for maven packages - although only 
using maven central and not a local repo.

--Ben

On Tue, May 30, 2017 at 12:36 PM, Felix Cheung wrote:
To add, this might be an issue with Livy.

I'm seeing something similar as well.

If you can get a repo with calling the Livy REST API directly it will be 
worthwhile to follow up with the Livy community separately.



From: Felix Cheung
Sent: Tuesday, May 30, 2017 11:34:31 AM
To: users@zeppelin.apache.org; users@zeppelin.apache.org
Subject: Re: Livy - add external libraries from additional maven repo

if I recall, %dep only works with the built in Spark interpreter and not the 
Livy interpreter.

To manage dependency win Livy you will need to set Spark conf with Livy.


From: Theofilos Kakantousis
Sent: Tuesday, May 30, 2017 9:05:15 AM
To: users@zeppelin.apache.org
Subject: Livy - add external libraries from additional maven repo

Hi everyone,

I'm using Zeppelin with Livy 0.4 and trying to add external libraries from an 
additional maven repo to my application according to the documentation 
available here. The example works fine, but when I set livy.spark.jars.packages 
to my library, the interpreter throws an unresolved dependency error.

I have added the additional maven repository in the interpreter settings and 
have also tried setting livy.spark.jars.ivy but without luck. However, if I use 
the Spark interpreter with the following code it works fine.

"%dep
z.reset();
z.addRepo("my repo").url("http://myrepo").snapshot
z.load("mygroup:myartifact:myversion");

Has anyone managed to do that with Livy? Thanks!

Cheers,
Theo



--
BENJAMIN VOGAN | Data Platform Team Lead




Re: Livy - add external libraries from additional maven repo

2017-05-31 Thread Jeff Zhang
Ben's right: for the Livy interpreter, you need to specify
livy.spark.jars.packages.

Specifying an interpreter dependency doesn't work for the Livy interpreter,
because Livy runs in yarn-cluster mode, while interpreter dependencies are
downloaded in the client JVM.
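
Concretely, that means putting the coordinates into the Livy interpreter's
properties rather than into the interpreter dependency list. A sketch of the
relevant properties (the coordinates and URLs are placeholders; whether a
separate repositories property is needed for a non-central repo depends on your
Spark/Livy versions):

# Livy interpreter properties (Interpreter page -> livy)
livy.spark.jars.packages      mygroup:myartifact:myversion
# jars that are already on HDFS can be listed directly
livy.spark.jars               hdfs:///path/to/mylib.jar
# for a repository other than Maven Central, the resolver also needs to know
# about it, e.g. via spark.jars.repositories on Spark versions that support it
livy.spark.jars.repositories  http://myrepo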




Felix Cheung wrote on Thursday, June 1, 2017 at 2:52 AM:

> How are you setting this:
> When setting my local maven repo and the "groupId:artifactId:version" in
> the interpreter settings
>
> --
> *From:* Theofilos Kakantousis 
> *Sent:* Wednesday, May 31, 2017 1:56:07 AM
> *To:* users@zeppelin.apache.org
>
> *Subject:* Re: Livy - add external libraries from additional maven repo
> Thanks everyone for the feedback!
>
> Indeed %dep works only for Spark interpreter, just mentioned it to show
> the interpreter behavior I expected with Livy.
> When setting my local maven repo and the "groupId:artifactId:version" in
> the interpreter settings, I can see the dependency (i.e. a jar file) being
> downloaded to the local OS under "local-repo" directory but the dependency
> is not deployed with the Spark application in YARN.
>
> Cheers,
> Theo
>
> On 2017-05-31 01:19, Ben Vogan wrote:
>
> For what it's worth I have successfully added jar files and maven packages
> to sessions using zeppelin & livy 0.3 - although not using %dep.  In the
> interpreter settings I set the livy.spark.jars setting for jars that are on
> my HDFS cluster, and livy.spark.jars.packages for maven packages -
> although only using maven central and not a local repo.
>
> --Ben
>
> On Tue, May 30, 2017 at 12:36 PM, Felix Cheung 
> wrote:
>
>> To add, this might be an issue with Livy.
>>
>> I'm seeing something similar as well.
>>
>> If you can get a repo with calling the Livy REST API directly it will be
>> worthwhile to follow up with the Livy community separately.
>>
>>
>> --
>> *From:* Felix Cheung 
>> *Sent:* Tuesday, May 30, 2017 11:34:31 AM
>> *To:* users@zeppelin.apache.org; users@zeppelin.apache.org
>> *Subject:* Re: Livy - add external libraries from additional maven repo
>>
>> if I recall, %dep only works with the built in Spark interpreter and not
>> the Livy interpreter.
>>
>> To manage dependency win Livy you will need to set Spark conf with Livy.
>>
>> --
>> *From:* Theofilos Kakantousis 
>> *Sent:* Tuesday, May 30, 2017 9:05:15 AM
>> *To:* users@zeppelin.apache.org
>> *Subject:* Livy - add external libraries from additional maven repo
>>
>> Hi everyone,
>>
>> I'm using Zeppelin with Livy 0.4 and trying to add external libraries
>> from an additional maven repo to my application according to the
>> documentation available here.
>> The example works fine, but when I set the livy.spark.jars.packages to my
>> library the interpreter throws an unresolved dependency error.
>>
>> I have added the additional maven repository in the interpreter settings
>> and have also tried setting livy.spark.jars.ivy but without luck. However,
>> if I use the Spark interpreter with the following code it works fine.
>>
>> "%dep
>> z.reset();
>> z.addRepo("my repo").url("http://myrepo").snapshot
>> z.load("mygroup:myartifact:myversion");
>>
>> Has anyone managed to do that with Livy? Thanks!
>>
>> Cheers,
>> Theo
>>
>
>
>
> --
> *BENJAMIN VOGAN* | Data Platform Team Lead
>
>
>
>


Re: Permission denied: user=zeppelin while using %spark.pyspark interpreter in AWS EMR cluster

2017-05-31 Thread Jeff Zhang
Try setting spark.sql.streaming.checkpointLocation to a folder that the
zeppelin user has write permission on.
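
In the pyspark paragraph that could look roughly like this (a sketch only: the
paths are placeholders, df stands for the streaming DataFrame from your program,
and the location must be writable by the zeppelin user):

# default checkpoint location for all streaming queries in this session
spark.conf.set("spark.sql.streaming.checkpointLocation",
               "hdfs:///user/zeppelin/checkpoints")

# or set it per query when starting the stream
query = (df.writeStream
           .format("parquet")
           .option("checkpointLocation", "hdfs:///user/zeppelin/checkpoints/myquery")
           .option("path", "hdfs:///user/zeppelin/output/myquery")
           .start())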



Trevor Grant wrote on Thursday, June 1, 2017 at 2:00 AM:

> maybe relevant (maybe not)
>
> https://github.com/apache/zeppelin/pull/1323
>
> Had some issues hitting a cloud HDFS instance a while back- you may be
> able to hack a solution out relevant to your problem.
>
> tg
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Wed, May 31, 2017 at 12:52 PM, BigData Consultant <
> bigdata.consultant2...@gmail.com> wrote:
>
>> How to set the access for zeppelin user to the HDFS?
>>
> On Tue, May 30, 2017 at 2:16 AM, Felix Cheung 
>> wrote:
>>
> Seems to be with hdfs ACL - does the service user Zeppelin have access to
>>> your storage?
>>>
>>> --
>>> *From:* BigData Consultant 
>>> *Sent:* Friday, May 26, 2017 10:56:31 PM
>>> *To:* d...@zeppelin.apache.org; users@zeppelin.apache.org
>>> *Subject:* Permission denied: user=zeppelin while using %spark.pyspark
>>> interpreter in AWS EMR cluster
>>>
>>>
>> Hi Team,
>>>
>>> I have created pyspark structure streaming program and trying to execute
>>> in
>>> the Zeppelin notebook, I am getting the following error:
>>>
>>> Py4JJavaError: An error occurred while calling o191.start.
>>> : org.apache.hadoop.security.AccessControlException: Permission denied:
>>> user=zeppelin, access=WRITE,
>>>
>>
>>> inode="/mnt/tmp/temporary-e0cf0f09-a6f4-44d6-9a72-324660085608/metadata":hdfs:hadoop:drwxr-xr-x
>>>
>>
>>>
>>>
>>> I am using Zeppelin Notebook Version 0.7.1 in AWS EMR cluster.
>>>
>>> Help would be much appreciated.
>>>
>>> *Full stacktrace:*
>>>
>>
>>>
>>> Traceback (most recent call last):
>>> File "/tmp/zeppelin_pyspark-8165971491474576109.py", line 349, in
>>> 
>>> raise Exception(traceback.format_exc())
>>> Exception: Traceback (most recent call last):
>>> File "/tmp/zeppelin_pyspark-8165971491474576109.py", line 342, in
>>> 
>>> exec(code)
>>> File "", line 5, in 
>>> File "/usr/lib/spark/python/pyspark/sql/streaming.py", line 816, in start
>>> return self._sq(self._jwrite.start())
>>> File
>>> "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py",
>>> line 1133, in __call__
>>> answer, self.gateway_client, self.target_id, self.name)
>>> File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
>>> return f(*a, **kw)
>>> File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py",
>>> line
>>> 319, in get_return_value
>>> format(target_id, ".", name), value)
>>> Py4JJavaError: An error occurred while calling o191.start.
>>> : org.apache.hadoop.security.AccessControlException: Permission denied:
>>> user=zeppelin, access=WRITE,
>>>
>>
>>> inode="/mnt/tmp/temporary-e0cf0f09-a6f4-44d6-9a72-324660085608/metadata":hdfs:hadoop:drwxr-xr-x
>>> at
>>>
>>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:320)
>>> at
>>>
>>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
>>> at
>>>
>>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
>>> at
>>>
>>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
>>> at
>>>
>>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1728)
>>> at
>>>
>>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1712)
>>> at
>>>
>>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1695)
>>> at
>>>
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2515)
>>> at
>>>
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2450)
>>> at
>>>
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2334)
>>> at
>>>
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:624)
>>> at
>>>
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
>>> at
>>>
>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>> at
>>>
>>
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>>>
>>
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>> at
>>>
>>
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.ja

Task not serializable error when I try to cache the spark sql table

2017-05-31 Thread shyla deshpande
Hello all,

I am getting an org.apache.spark.SparkException: Task not serializable error
when I try to cache the Spark SQL table. I am using a UDF on a column of the
table and want to cache the resulting table. I can execute the paragraph
successfully when there is no caching.

Please help! Thanks

UDF :
def fn1(res: String): Int = {
  100
}
 spark.udf.register("fn1", fn1(_: String): Int)


   spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "k", "table" -> "t"))
  .load
  .createOrReplaceTempView("t1")


 val df1 = spark.sql("SELECT  col1, col2, fn1(col3)   from t1" )

 df1.createOrReplaceTempView("t2")

   spark.catalog.cacheTable("t2")
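
One thing worth trying here (a sketch of a common workaround, not a confirmed
fix for this exact case): register the UDF as a function value rather than a
method reference, so the closure does not capture the enclosing wrapper object
that Zeppelin's REPL compiles the paragraph into - a frequent cause of "Task
not serializable":

// a function literal with no outer references does not capture `this`,
// unlike fn1(_: String), which closes over the paragraph's wrapper object
val fn1Udf = (res: String) => 100
spark.udf.register("fn1", fn1Udf)

val df1 = spark.sql("SELECT col1, col2, fn1(col3) from t1")
df1.createOrReplaceTempView("t2")
spark.catalog.cacheTable("t2")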