[jira] [Commented] (SPARK-7791) Set user for executors in standalone-mode

2015-08-03 Thread Niels Becker (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652660#comment-14652660
 ] 

Niels Becker commented on SPARK-7791:
-

We ended up using your workaround. But since our Spark slaves run 
inside Docker containers and GlusterFS is mounted on the host machine, we were 
able to mount only the appropriate user folders into the Docker container by 
setting {{spark.mesos.executor.docker.volumes}}. This way Spark cannot write to 
other users' directories. 
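For reference, a minimal sketch of such a volume mapping (the image name and host path below are hypothetical; the property value format is host_path:container_path[:ro|rw]):

{code}
# spark-defaults.conf -- mount only the submitting user's GlusterFS folder
# into the executor container, so the executor cannot see other users' data
spark.mesos.executor.docker.image    our-spark:1.4.1
spark.mesos.executor.docker.volumes  /data/sparkuser:/data/sparkuser:rw
{code}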

 Set user for executors in standalone-mode
 -

 Key: SPARK-7791
 URL: https://issues.apache.org/jira/browse/SPARK-7791
 Project: Spark
  Issue Type: Wish
  Components: Spark Core
Reporter: Tomasz Früboes

 I'm opening this following a discussion in 
 https://www.mail-archive.com/user@spark.apache.org/msg28633.html
  Our setup was as follows: Spark (1.3.1, prebuilt for Hadoop 2.6, also 2.4) 
 was installed in standalone mode and started manually from the root 
 account. Everything worked properly apart from operations such as
 rdd.saveAsPickleFile(ofile)
 which ended with the exception:
 py4j.protocol.Py4JJavaError: An error occurred while calling o27.save.
 : java.io.IOException: Failed to rename 
 DeprecatedRawLocalFileStatus{path=file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/_temporary/0/task_201505191540_0009_r_01/part-r-2.parquet;
  isDirectory=false; length=534; replication=1; blocksize=33554432; 
 modification_time=1432042832000; access_time=0; owner=; group=; 
 permission=rw-rw-rw-; isSymlink=false} to 
 file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/part-r-2.parquet
  at 
 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
 (files created in _temporary were owned by user root). It would be great if 
 Spark could set the user for the executor in standalone mode as well. Setting 
 SPARK_USER has no effect here.
 BTW, it may be a good idea to add a warning (e.g. during Spark startup) 
 that running from the root account is not a very healthy idea. E.g. mapping this 
 function 
 def test(x):
     f = open('/etc/testTMF.txt', 'w')
     return 0
 on an RDD creates a file in /etc/ (surprisingly, calls like f.Write(text) end 
 with an exception)
 Thanks,
   Tomasz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7791) Set user for executors in standalone-mode

2015-07-31 Thread Niels Becker (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649753#comment-14649753
 ] 

Niels Becker commented on SPARK-7791:
-

I ran into the same problem saving a dataframe as parquet.
Our environment:
- Ubuntu 14
- Spark 1.4.1 prebuilt for Hadoop 2.6
- GlusterFS 3.7
- Mesos 0.23.0
- Docker 1.7.1

Start _pyspark_ as _sparkuser_ and load some data into a dataframe {{df}}. Then 
run {{df.write.format("parquet").save("/data/test/wikipedia_test.parquet")}}.
_/data_ is a GlusterFS volume on each node.
_/data/test_ permissions:
{code}
# owner: sparkuser
# group: sparkuser
# flags: -s-
user::rwx
group::rwx
other::r-x
default:user::rwx
default:group::rwx
default:other::r-x
{code}

Tomasz described a workaround in 
[https://www.mail-archive.com/user@spark.apache.org/msg28820.html] but that 
does not work for us.
The interesting thing is that {{*.gz.parquet}} files have {noformat}root:sparkuser 
-rw-r--r--{noformat} permissions,
but {{*.gz.parquet.crc}} files have {noformat}root:sparkuser 
-rw-rw-r--{noformat} permissions, as they should have.
This suggests that Spark does not use the default file permissions, at least for 
parquet files.
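The 644-vs-664 split matches what a process-wide umask of 022 produces for newly created files (0666 & ~umask). A small standalone Python sketch, independent of Spark, illustrating the effect:

```python
import os
import stat
import tempfile

def create_with_umask(path, mask):
    """Create a file under the given umask and return its permission bits."""
    old = os.umask(mask)
    try:
        with open(path, "w") as f:
            f.write("x")
    finally:
        os.umask(old)  # restore the previous umask
    return stat.S_IMODE(os.stat(path).st_mode)

with tempfile.TemporaryDirectory() as d:
    # umask 022 turns the default 0666 into 0644 (rw-r--r--)
    print(oct(create_with_umask(os.path.join(d, "a"), 0o022)))
    # umask 002 keeps group write: 0664 (rw-rw-r--)
    print(oct(create_with_umask(os.path.join(d, "b"), 0o002)))
```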

I can confirm that setting {{SPARK_USER}} to either {{root}} or {{sparkuser}} 
has no effect.
Running pyspark as root works.

I assume that all Spark tasks are executed as root and override the default 
file permissions, but do not change the user.
So after the job is done, the driver tries to rename the files to their final 
destination but fails for lack of permissions.



[jira] [Commented] (SPARK-7791) Set user for executors in standalone-mode

2015-07-31 Thread Tomasz Früboes (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649921#comment-14649921
 ] 

Tomasz Früboes commented on SPARK-7791:
---

For us, the final solution was to leave standalone mode and set up YARN (and 
then use Spark on YARN). YARN can be configured to use the LinuxContainerExecutor, 
which can be configured to set the proper user IDs. No idea how Mesos works, but 
maybe something similar can be done?
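For reference, a minimal sketch of the YARN side of this (the executor class name is from the Hadoop documentation; treat the exact properties as something to verify against your Hadoop version):

{code}
<!-- yarn-site.xml: run containers through the LinuxContainerExecutor,
     which launches each container as the submitting user -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <!-- in non-secure mode, do not force containers onto a single local user -->
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
  <value>false</value>
</property>
{code}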
