You could try setting `SPARK_USER` to the user your workers run as. I
couldn't find many references to this variable, but at least YARN and Mesos
take it into account when spawning executors, and chances are that
standalone mode does as well.
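As a rough sketch (the `SPARK_USER` variable itself is real, but the user
name and the choice of `conf/spark-env.sh` as the place to set it are my
assumptions, not something I've verified for standalone mode):

```shell
# Hypothetical: in conf/spark-env.sh on each worker, before (re)starting it,
# set SPARK_USER to the account that should own files the executors write.
export SPARK_USER=tfruboes

# Quick sanity check that the variable is visible to spawned processes:
echo "SPARK_USER is set to: $SPARK_USER"
```

Whether the standalone executor launcher actually honors it is exactly the
open question, so this is worth a quick test before relying on it.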

iulian

On Wed, May 20, 2015 at 9:29 AM, Tomasz Fruboes <tomasz.frub...@fuw.edu.pl>
wrote:

> Hi,
>
>  thanks for the answer. The rights are
>
> drwxr-xr-x 3 tfruboes all 5632 05-19 15:40 test19EE/
>
>  I have tried setting the rights to 777 for this directory prior to
> execution. This does not get propagated down the chain, i.e. the directory
> created as a result of the "save" call (namesAndAges.parquet2 in the path
> in the dump [1] below) is created with drwxr-xr-x rights (owned by the
> user submitting the job, i.e. tfruboes). The temp directories created inside
>
> namesAndAges.parquet2/_temporary/0/
>
> (e.g. task_201505200920_0009_r_000001) are owned by root, again with
> drwxr-xr-x access rights.
>
>  Cheers,
>   Tomasz
>
> W dniu 19.05.2015 o 23:56, Davies Liu pisze:
>
>> It surprises me. Could you list the owner information of
>> /mnt/lustre/bigdata/med_home/tmp/test19EE/ ?
>>
>> On Tue, May 19, 2015 at 8:15 AM, Tomasz Fruboes
>> <tomasz.frub...@fuw.edu.pl> wrote:
>>
>>> Dear Experts,
>>>
>>>   we have a Spark cluster (standalone mode) in which the master and
>>> workers are started from the root account. Everything runs correctly up
>>> to the point where we try operations such as
>>>
>>>      dataFrame.select("name", "age").save(ofile, "parquet")
>>>
>>> or
>>>
>>>      rdd.saveAsPickleFile(ofile)
>>>
>>> where ofile is a path on a network-exported filesystem visible on all
>>> nodes (Lustre in our case; I expect the effect on NFS would be similar).
>>>
>>>   Unsurprisingly, the temp files created on the workers are owned by
>>> root, which then leads to a crash (see [1] below). Is there a solution
>>> or workaround for this (e.g. controlling the file creation mode of the
>>> temporary files)?
>>>
>>> Cheers,
>>>   Tomasz
>>>
>>>
>>> PS I've tried to google this problem and found a couple of similar
>>> reports, but no clear answer/solution.
>>>
>>> PS2 For completeness: running the master/workers as a regular user
>>> solves the problem only for that user. For other users submitting to
>>> this master, the result is shown in [2] below.
>>>
>>>
>>> [0] Cluster details:
>>> Master/workers: CentOS 6.5
>>> Spark 1.3.1 prebuilt for Hadoop 2.4 (same behaviour with the 2.6 build)
>>>
>>>
>>> [1]
>>> ##################################################################
>>>   File "/mnt/home/tfruboes/2015.05.SparkLocal/spark-1.3.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>>> py4j.protocol.Py4JJavaError: An error occurred while calling o27.save.
>>> : java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/_temporary/0/task_201505191540_0009_r_000001/part-r-00002.parquet; isDirectory=false; length=534; replication=1; blocksize=33554432; modification_time=1432042832000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/part-r-00002.parquet
>>>         at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
>>>         at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
>>>         at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
>>>         at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:43)
>>>         at org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:690)
>>>         at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:129)
>>>         at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:240)
>>>         at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1196)
>>>         at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1181)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>>         at py4j.Gateway.invoke(Gateway.java:259)
>>>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>>>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>         at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> ##################################################################
>>>
>>>
>>>
>>> [2]
>>> ##################################################################
>>> 15/05/19 14:45:19 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 3,
>>> wn23023.cis.gov.pl): java.io.IOException: Mkdirs failed to create
>>>
>>> file:/mnt/lustre/bigdata/med_home/tmp/test18/namesAndAges.parquet2/_temporary/0/_temporary/attempt_201505191445_0009_r_000000_0
>>>          at
>>>
>>> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
>>>          at
>>>
>>> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
>>>          at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>>>          at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>>>          at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
>>>          at
>>> parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:154)
>>>          at
>>>
>>> parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:279)
>>>          at
>>>
>>> parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
>>>          at
>>> org.apache.spark.sql.parquet.ParquetRelation2.org
>>> $apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:667)
>>>          at
>>>
>>> org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
>>>          at
>>>
>>> org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
>>>          at
>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>>          at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>>          at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>>>          at
>>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>          at
>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>          at java.lang.Thread.run(Thread.java:745)
>>> ##################################################################
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>


-- 
Iulian Dragos
Reactive Apps on the JVM
www.typesafe.com
