[jira] [Commented] (SPARK-7791) Set user for executors in standalone-mode
[ https://issues.apache.org/jira/browse/SPARK-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652660#comment-14652660 ] Niels Becker commented on SPARK-7791:

We ended up using your workaround. But since our Spark slaves run inside Docker containers and GlusterFS is mounted on the host machine, we were able to mount only the relevant user folders into the Docker container by setting {{spark.mesos.executor.docker.volumes}}. This way Spark cannot write to other users' folders.

Set user for executors in standalone-mode

Key: SPARK-7791
URL: https://issues.apache.org/jira/browse/SPARK-7791
Project: Spark
Issue Type: Wish
Components: Spark Core
Reporter: Tomasz Früboes

I'm opening this following a discussion in https://www.mail-archive.com/user@spark.apache.org/msg28633.html

Our setup was the following: Spark (1.3.1, prebuilt for Hadoop 2.6, also 2.4) was installed in standalone mode and started manually from the root account. Everything worked properly apart from operations such as {{rdd.saveAsPickleFile(ofile)}}, which end with an exception:

{noformat}
py4j.protocol.Py4JJavaError: An error occurred while calling o27.save.
: java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/_temporary/0/task_201505191540_0009_r_01/part-r-2.parquet; isDirectory=false; length=534; replication=1; blocksize=33554432; modification_time=1432042832000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/part-r-2.parquet
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
{noformat}

(Files created in {{_temporary}} were owned by user root.) It would be great if Spark could set the user for the executor also in standalone mode. Setting {{SPARK_USER}} has no effect here.

BTW, it may be a good idea to add a warning (e.g. during Spark startup) that running from the root account is not a healthy idea. E.g. mapping this function over an RDD:

{code}
def test(x):
    f = open('/etc/testTMF.txt', 'w')
    return 0
{code}

creates a file in /etc/ (surprisingly, calls like {{f.write(text)}} end with an exception).

Thanks, Tomasz

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
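The per-user volume mount described in the comment above could look roughly like the following spark-submit invocation. This is a sketch under assumptions: the host path, Docker image name, master URL, and job script are hypothetical; {{spark.mesos.executor.docker.volumes}} takes a comma-separated list of {{host_path:container_path:mode}} entries.

```shell
# Hypothetical sketch: mount only sparkuser's GlusterFS folder into the
# executor's Docker container, so executors cannot touch other users' data.
# Paths, image name, and master URL below are illustrative, not from the issue.
spark-submit \
  --master mesos://mesos-master:5050 \
  --conf spark.mesos.executor.docker.image=my-spark-image:1.4.1 \
  --conf spark.mesos.executor.docker.volumes=/data/users/sparkuser:/data/users/sparkuser:rw \
  my_job.py
```

Because only {{/data/users/sparkuser}} exists inside the container, any attempt by an executor to write under another user's folder fails at the filesystem level, regardless of which user the executor process runs as.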
[jira] [Commented] (SPARK-7791) Set user for executors in standalone-mode
[ https://issues.apache.org/jira/browse/SPARK-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649753#comment-14649753 ] Niels Becker commented on SPARK-7791:

I ran into the same problem saving a DataFrame as Parquet. Our environment:
- Ubuntu 14
- Spark 1.4.1 prebuilt for Hadoop 2.6
- GlusterFS 3.7
- Mesos 0.23.0
- Docker 1.7.1

Start _pyspark_ as _sparkuser_ and load some data into a DataFrame {{df}}. Then run {{df.write.format("parquet").save("/data/test/wikipedia_test.parquet")}}.

_/data_ is a GlusterFS volume on each node. _/data/test_ permissions:
{code}
# owner: sparkuser
# group: sparkuser
# flags: -s-
user::rwx
group::rwx
other::r-x
default:user::rwx
default:group::rwx
default:other::r-x
{code}

Tomasz described a workaround in [https://www.mail-archive.com/user@spark.apache.org/msg28820.html], but it does not work for us.

The interesting thing is that {{*.gz.parquet}} files have {noformat}root:sparkuser -rw-r--r--{noformat} permissions, but {{*.gz.parquet.crc}} files have {noformat}root:sparkuser -rw-rw-r--{noformat} permissions, as they should. This suggests that Spark does not use the default file permissions, at least for Parquet files.

I can confirm that setting {{SPARK_USER}} to either {{root}} or {{sparkuser}} has no effect. Running pyspark as root works. I assume that all Spark tasks are executed as root and override the default file permissions, but do not change the user. So after the job is done, the driver tries to rename the files to their final destination and fails for lack of permissions.
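A quick way to check the diagnosis above — which user actually owns the files a committer leaves under {{_temporary}} — is to walk the output directory and print owner and mode for every file. A minimal, self-contained sketch (the directory layout below only imitates a committer's {{_temporary}} tree; it is not produced by Spark):

```python
import os
import pwd
import stat
import tempfile

def describe_tree(path):
    """Return {relative_path: (owner, mode_string)} for every file under path."""
    info = {}
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            st = os.stat(full)
            owner = pwd.getpwuid(st.st_uid).pw_name
            info[os.path.relpath(full, path)] = (owner, stat.filemode(st.st_mode))
    return info

# Imitate a FileOutputCommitter layout: a task attempt file under _temporary.
out = tempfile.mkdtemp()
os.makedirs(os.path.join(out, "_temporary", "0", "task_0000"))
open(os.path.join(out, "_temporary", "0", "task_0000", "part-r-00001.parquet"), "w").close()

for rel, (owner, mode) in sorted(describe_tree(out).items()):
    print(rel, owner, mode)
```

Run against a real Spark output directory instead of the temp tree, this would have shown the {{root}}-owned part files directly, making the rename failure in the driver easy to explain.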
[jira] [Commented] (SPARK-7791) Set user for executors in standalone-mode
[ https://issues.apache.org/jira/browse/SPARK-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649921#comment-14649921 ] Tomasz Früboes commented on SPARK-7791:

For us the final solution was to leave standalone mode and set up YARN (and then use Spark on YARN). YARN can be configured to use the LinuxContainerExecutor, which can be set up to run containers under the proper user ids. No idea how Mesos works, but maybe something similar can be done?
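For reference, switching YARN to the LinuxContainerExecutor mentioned above is primarily a {{yarn-site.xml}} change (it also requires the setuid {{container-executor}} binary and its {{container-executor.cfg}} to be installed with the right ownership, which is not shown here). A minimal sketch; the group name {{hadoop}} is an assumption for the cluster in question:

```xml
<!-- yarn-site.xml: run containers as the submitting user -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <!-- Group that owns the setuid container-executor binary;
       "hadoop" is an illustrative assumption, not from the issue. -->
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>
```

With this in place, output files are created by the job owner's uid, so the driver-side rename in {{FileOutputCommitter.mergePaths}} no longer hits the root-owned {{_temporary}} files that triggered this issue.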