To answer my own question, in case someone else runs into this: the spark user
needs to be in the same group on the namenode itself, and HDFS caches that group
information for what seems like at least an hour. It magically started working
on its own.
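For anyone hitting the same wall: the namenode resolves group membership on its own host and caches it (the `hadoop.security.groups.cache.secs` setting, 300 seconds by default, though it seemed longer here). A couple of standard Hadoop CLI commands that would have saved me the wait — shown as a sketch, since they obviously need a running cluster:

```
# Ask the namenode which groups it currently thinks a user is in
# (this reflects the namenode's cached mapping, not /etc/group on your client):
hdfs groups spark

# Force the namenode to drop its cached user-to-group mappings
# (must be run as the HDFS superuser):
hdfs dfsadmin -refreshUserToGroupsMappings
```

After the refresh, the new group membership should take effect immediately instead of waiting for the cache to expire.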
Greg
From: Greg <greg.h...@rackspace.com>
Date: Tuesday, September 9, 2014 2:30 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: spark on yarn history server + hdfs permissions issue
I am running Spark on YARN with the HDP 2.1 technical preview. I'm having
trouble getting the Spark history server the permissions it needs to read the
Spark event logs from HDFS. Both sides are configured to write/read logs from:
hdfs:///apps/spark/events
The history server runs as user spark; the jobs run as user lavaqe. Both users
are in the hdfs group on all the nodes in the cluster. That root logs folder is
world-writable, but owned by the spark user:
drwxrwxrwx - spark hdfs 0 2014-09-09 18:19 /apps/spark/events
All good so far. Spark jobs create subfolders and put their event logs in
there just fine. The problem is that the history server, running as the spark
user, cannot read those logs. They're written as the user that initiates the
job, but still in the same hdfs group:
drwxrwx--- - lavaqe hdfs 0 2014-09-09 19:24
/apps/spark/events/spark-pi-1410290714996
The files are group readable/writable, but this is the error I get:
Permission denied: user=spark, access=READ_EXECUTE,
inode=/apps/spark/events/spark-pi-1410290714996:lavaqe:hdfs:drwxrwx---
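For context, HDFS uses POSIX-style owner/group/other checks on each inode. A rough sketch of that check (the function name and simplified model are mine — real HDFS also consults the namenode's cached user-to-groups mapping, superuser status, and ACLs) shows why group membership is the whole story here:

```python
def hdfs_access_allowed(user, user_groups, owner, group, mode, want):
    """Simplified HDFS-style permission check on one inode.

    mode -- int like 0o770 (i.e. drwxrwx---)
    want -- bitmask: 4 = read, 2 = write, 1 = execute
    """
    if user == owner:
        bits = (mode >> 6) & 0o7   # owner bits
    elif group in user_groups:
        bits = (mode >> 3) & 0o7   # group bits
    else:
        bits = mode & 0o7          # other bits
    return (bits & want) == want

# READ_EXECUTE, as in the error above:
READ_EXECUTE = 0o4 | 0o1

# If the namenode sees spark in the hdfs group, the group bits (rwx) apply:
assert hdfs_access_allowed("spark", {"hdfs"}, "lavaqe", "hdfs", 0o770, READ_EXECUTE)

# If the namenode's cached mapping does NOT include hdfs, the "other"
# bits (---) apply, and you get exactly this Permission denied error:
assert not hdfs_access_allowed("spark", set(), "lavaqe", "hdfs", 0o770, READ_EXECUTE)
```

So a denial on a drwxrwx--- directory, despite both users apparently sharing the group, points at the namenode's view of group membership rather than the mode bits.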
So, two questions, I guess:
1. Do group permissions just plain not work in hdfs or am I missing something?
2. Is there a way to tell Spark to log with more permissive permissions so the
history server can read the generated logs?
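For reference, this is how the two sides are pointed at the same directory — a spark-defaults.conf fragment along these lines (the keys are the standard Spark ones; the path matches the listing above):

```
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///apps/spark/events
spark.history.fs.logDirectory    hdfs:///apps/spark/events
```

So the directory config agrees on both sides; the failure is purely the permission check.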
Greg