Re: spark-shell with different username

2016-04-02 Thread Matt Tenenbaum
Hi Mich. I certainly should have included that info in my original message
(sorry!): it's a Mac, running OS X (10.11.3).

Cheers
-mt

On Fri, Apr 1, 2016 at 11:16 PM, Mich Talebzadeh wrote:

> Matt,
>
> What OS are you using on your laptop? Sounds like Ubuntu or something?
>
> Thanks
>
> Dr Mich Talebzadeh

Re: spark-shell with different username

2016-04-02 Thread Sebastian YEPES FERNANDEZ
Matt, have you tried using the parameter --proxy-user matt?
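
Something along these lines might work (an untested sketch: I'm assuming the
target account should be matt.tenenbaum, and --proxy-user only helps if the
cluster's hadoop.proxyuser settings allow the submitting user to impersonate
that account):

[matt@laptop ~]$ HADOOP_CONF_DIR=path/to/hadoop/conf \
    SPARK_HOME=spark-1.6.0-bin-hadoop2.6 \
    spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client \
    --proxy-user matt.tenenbaum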

On Apr 2, 2016 8:17 AM, "Mich Talebzadeh" wrote:

> Matt,
>
> What OS are you using on your laptop? Sounds like Ubuntu or something?
>
> Thanks
>
> Dr Mich Talebzadeh

Re: spark-shell with different username

2016-04-02 Thread Mich Talebzadeh
Matt,

What OS are you using on your laptop? Sounds like Ubuntu or something?

Thanks

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com




spark-shell with different username

2016-04-01 Thread Matt Tenenbaum
Hello all —

tl;dr: I’m having trouble running spark-shell from my laptop (or another
machine not affiliated with the cluster), and I think the problem boils down
to usernames. Can I convince Spark/Scala that I’m someone other than $USER?

A bit of background: our cluster is CDH 5.4.8, installed with Cloudera
Manager 5.5. We use LDAP, and my login on all hadoop-affiliated machines
(including the gateway boxes we use for running scheduled work) is
‘matt.tenenbaum’. When I run spark-shell on one of those machines,
everything is fine:

[matt.tenenbaum@remote-machine ~]$ HADOOP_CONF_DIR=/etc/hadoop/conf \
    SPARK_HOME=spark-1.6.0-bin-hadoop2.6 \
    spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client

Everything starts up correctly: I get a Scala prompt, the SparkContext and
SQL context are initialized, and I’m off to the races:

16/04/01 23:27:00 INFO session.SessionState: Created local directory:
/tmp/35b58974-dad5-43c6-9864-43815d101ca0_resources
16/04/01 23:27:00 INFO session.SessionState: Created HDFS directory:
/tmp/hive/matt.tenenbaum/35b58974-dad5-43c6-9864-43815d101ca0
16/04/01 23:27:00 INFO session.SessionState: Created local directory:
/tmp/matt.tenenbaum/35b58974-dad5-43c6-9864-43815d101ca0
16/04/01 23:27:00 INFO session.SessionState: Created HDFS directory:
/tmp/hive/matt.tenenbaum/35b58974-dad5-43c6-9864-43815d101ca0/_tmp_space.db
16/04/01 23:27:00 INFO repl.SparkILoop: Created sql context (with Hive
support)..
SQL context available as sqlContext.

scala> 1 + 41
res0: Int = 42

scala> sc
res1: org.apache.spark.SparkContext = org.apache.spark.SparkContext@4e9bd2c8
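
For what it's worth, a quick way to double-check which user Spark and Hadoop
think they are acting as, right from that shell (a sketch; as far as I know,
sc.sparkUser and Hadoop's UserGroupInformation are the relevant hooks):

scala> sc.sparkUser   // the user the SparkContext was started as
scala> import org.apache.hadoop.security.UserGroupInformation
scala> UserGroupInformation.getCurrentUser().getShortUserName()   // the user HDFS calls run as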

I am running Spark 1.6 from a downloaded tgz file, rather than the spark-shell
that CDH makes available on the cluster. I can copy that tgz to my laptop,
grab a copy of the cluster configuration, and in a perfect world then run
everything the same way:

[matt@laptop ~]$ HADOOP_CONF_DIR=path/to/hadoop/conf \
    SPARK_HOME=spark-1.6.0-bin-hadoop2.6 \
    spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client

Notice that two things are different:

   1. My local username on my laptop is ‘matt’, which does not match my
      name on the remote machine.
   2. The Hadoop configs live somewhere other than /etc/hadoop/conf.

Alas, #1 proves fatal because of cluster permissions (there is no
/user/matt/ in HDFS, and ‘matt’ is not a valid LDAP user). In the
initialization logging output, I can see it fail in the expected way:

16/04/01 16:37:19 INFO yarn.Client: Setting up container launch
context for our AM
16/04/01 16:37:19 INFO yarn.Client: Setting up the launch environment
for our AM container
16/04/01 16:37:19 INFO yarn.Client: Preparing resources for our AM container
16/04/01 16:37:20 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
16/04/01 16:37:21 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: Permission denied:
user=matt, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at (... etc ...)
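
As an aside, the same mismatch is visible outside Spark; a sketch, assuming
the Hadoop client binaries are also on the laptop (they are not part of the
Spark tgz) and pointed at the same configuration:

[matt@laptop ~]$ HADOOP_CONF_DIR=path/to/hadoop/conf hadoop fs -ls /user/matt
# fails with 'No such file or directory': there is no home directory for 'matt'
[matt@laptop ~]$ HADOOP_CONF_DIR=path/to/hadoop/conf hadoop fs -ls /user/matt.tenenbaum
# should list the contents of my real home directory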

Fine. In other circumstances I’ve told Hadoop explicitly who I am by
setting HADOOP_USER_NAME. Maybe that works here?

[matt@laptop ~]$ HADOOP_USER_NAME=matt.tenenbaum \
    HADOOP_CONF_DIR=soma-conf \
    SPARK_HOME=spark-1.6.0-bin-hadoop2.6 \
    spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client

Eventually that fails too, but not for the same reason. Setting
HADOOP_USER_NAME gets initialization past the access-control problems, and I
can see it request a new application from the cluster:

16/04/01 16:43:08 INFO yarn.Client: Will allocate AM container, with
896 MB memory including 384 MB overhead
16/04/01 16:43:08 INFO yarn.Client: Setting up container launch
context for our AM
16/04/01 16:43:08 INFO yarn.Client: Setting up the launch environment
for our AM container
16/04/01 16:43:08 INFO yarn.Client: Preparing resources for our AM container
... [resource uploads happen here] ...
16/04/01 16:46:16 INFO spark.SecurityManager: Changing view acls to:
matt,matt.tenenbaum
16/04/01 16:46:16 INFO spark.SecurityManager: Changing modify acls to:
matt,matt.tenenbaum
16/04/01 16:46:16 INFO spark.SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view
permissions: Set(matt, matt.tenenbaum); users with modify permissions:
Set(matt, matt.tenenbaum)
16/04/01 16:46:16 INFO yarn.Client: Submitting application 30965 to
ResourceManager
16/04/01 16:46:16 INFO impl.YarnClientImpl: Submitted application
application_1451332794331_30965
16/04/01 16:46:17 INFO yarn.Client: Application report for
application_1451332794331_30965 (state: ACCEPTED)
16/04/01