To answer my own question, in case someone else runs into this: the spark user 
needs to be in the same group on the namenode host, and HDFS caches that group 
information for what seems like at least an hour. It magically started working 
on its own, presumably once that cache expired.
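
If anyone else hits this and doesn't want to wait out the cache, the stock HDFS 
admin tools look like the right lever (I found these after the fact, so treat 
them as a pointer rather than a tested recipe): hdfs groups shows which groups 
the namenode actually resolves for a user, and hdfs dfsadmin 
-refreshUserToGroupsMappings makes the namenode reload its user-to-group 
mappings immediately. The cache lifetime itself is supposed to be controlled by 
hadoop.security.groups.cache.secs.

    # Check which groups the namenode resolves for the spark user
    hdfs groups spark

    # Force the namenode to reload user-to-group mappings
    # (run as the HDFS superuser)
    hdfs dfsadmin -refreshUserToGroupsMappings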

Greg

From: Greg <greg.h...@rackspace.com>
Date: Tuesday, September 9, 2014 2:30 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: spark on yarn history server + hdfs permissions issue

I am running Spark on YARN with the HDP 2.1 technical preview. I'm having 
trouble getting the Spark history server permission to read the Spark event 
logs from HDFS. Both sides are configured to write and read logs from:

hdfs:///apps/spark/events
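
For reference, the relevant settings look roughly like this (property names as 
in the Spark docs; depending on the Spark version, the history server may take 
the log directory as a start-up argument rather than reading 
spark.history.fs.logDirectory):

    # spark-defaults.conf, on the job-submission side
    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs:///apps/spark/events

    # and on the history server side
    spark.history.fs.logDirectory    hdfs:///apps/spark/events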

The history server is running as user spark, and the jobs are running as user 
lavaqe. Both users are in the hdfs group on all the nodes in the cluster.

That root log folder is world-writable, but owned by the spark user:

drwxrwxrwx   - spark hdfs          0 2014-09-09 18:19 /apps/spark/events
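
For completeness, that folder was created along these lines (a reconstruction 
on my part, but the ownership and mode match the listing above):

    # Run as the HDFS superuser
    hdfs dfs -mkdir -p /apps/spark/events
    hdfs dfs -chown spark:hdfs /apps/spark/events
    hdfs dfs -chmod 777 /apps/spark/events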

All good so far.  Spark jobs create subfolders and put their event logs in 
there just fine.  The problem is that the history server, running as the spark 
user, cannot read those logs.  They're written as the user that initiates the 
job, but still in the same hdfs group:

drwxrwx---   - lavaqe hdfs          0 2014-09-09 19:24 
/apps/spark/events/spark-pi-1410290714996

The files are group readable/writable, but this is the error I get:

Permission denied: user=spark, access=READ_EXECUTE, 
inode="/apps/spark/events/spark-pi-1410290714996":lavaqe:hdfs:drwxrwx---

So, two questions, I guess:

1. Do group permissions just plain not work in HDFS, or am I missing something?
2. Is there a way to tell Spark to write its event logs with more permissive 
permissions so the history server can read them? (The only stopgap I've found 
so far is sketched below.)
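
For anyone who lands on this thread with the same problem: the manual 
workaround is to open the generated logs up after the fact. Something like 
this should do it (Hadoop's chmod accepts the usual Linux-style symbolic 
modes; capital X adds the execute bit on directories without marking plain 
files executable):

    # Stopgap: make existing event logs world-readable so the
    # history server can traverse the directories and read the files
    hdfs dfs -chmod -R o+rX /apps/spark/events

That has to be re-run for every new job, so it's a band-aid, not a fix.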

Greg
