Hi,

 thanks for the answer. The rights are:

drwxr-xr-x 3 tfruboes all 5632 05-19 15:40 test19EE/

I have tried setting the rights to 777 for this directory prior to execution. This does not get propagated down the chain, i.e. the directory created as a result of the "save" call (namesAndAges.parquet2 in the path in the dump [1] below) is created with drwxr-xr-x rights (owned by the user submitting the job, i.e. tfruboes). The temp directories created inside

namesAndAges.parquet2/_temporary/0/

(e.g. task_201505200920_0009_r_000001) are owned by root, again with drwxr-xr-x access rights.
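For illustration, the drwxr-xr-x pattern is exactly what a process running with the common default umask of 022 produces when it creates a directory. A minimal Python sketch (the paths here are hypothetical temp dirs, not the Lustre ones above):

```python
import os
import stat
import tempfile

# Simulate a process running with the common default umask of 022.
os.umask(0o022)

# Even asking for 0o777 at creation time gets masked down to 0o755,
# i.e. drwxr-xr-x: group and other can list the directory but not
# create or rename entries inside it.
base = tempfile.mkdtemp()
target = os.path.join(base, "demo")
os.mkdir(target, 0o777)

mode = stat.S_IMODE(os.stat(target).st_mode)
print(oct(mode))  # 0o755
```

This is why a root-owned directory with these bits blocks the commit/rename step when it is attempted by a different user.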

 Cheers,
  Tomasz

On 19.05.2015 at 23:56, Davies Liu wrote:
That surprises me. Could you list the owner information of
/mnt/lustre/bigdata/med_home/tmp/test19EE/ ?

On Tue, May 19, 2015 at 8:15 AM, Tomasz Fruboes
<tomasz.frub...@fuw.edu.pl> wrote:
Dear Experts,

  we have a Spark cluster (standalone mode) in which the master and workers are
started from the root account. Everything runs correctly up to the point where
we try operations such as

     dataFrame.select("name", "age").save(ofile, "parquet")

or

     rdd.saveAsPickleFile(ofile)

where ofile is a path on a network-exported filesystem (visible on all
nodes; in our case this is Lustre, and I guess the effect would be similar on NFS).

  Unsurprisingly, temp files created on the workers are owned by root, which then
leads to a crash (see [1] below). Is there a solution or workaround for this
(e.g. controlling the file creation mode of the temporary files)?
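One possible direction, shown only as a sketch: if the process that creates the temporary files could be made to run with a permissive umask, the directories it creates would be writable by other users. Whether Spark executors actually inherit a umask set this way depends on how the workers are launched; the snippet below just demonstrates the mechanism in plain Python, using hypothetical temp paths:

```python
import os
import stat
import tempfile

# Relax the umask so newly created directories keep all requested bits.
old = os.umask(0o000)

base = tempfile.mkdtemp()
shared = os.path.join(base, "open-dir")
os.mkdir(shared, 0o777)  # with umask 000 this stays drwxrwxrwx

mode = stat.S_IMODE(os.stat(shared).st_mode)
print(oct(mode))  # 0o777

os.umask(old)  # restore the previous umask
```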

Cheers,
  Tomasz


PS: I've tried to google this problem; there are a couple of similar reports, but no
clear answer/solution.

PS2: For completeness: running the master/workers as a regular user solves the
problem only for that user. For other users submitting to this master,
the result is given in [2] below.


[0] Cluster details:
Master/workers: CentOS 6.5
Spark 1.3.1, prebuilt for Hadoop 2.4 (same behaviour with the 2.6 build)


[1]
##################################################################
    File
"/mnt/home/tfruboes/2015.05.SparkLocal/spark-1.3.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o27.save.
: java.io.IOException: Failed to rename
DeprecatedRawLocalFileStatus{path=file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/_temporary/0/task_201505191540_0009_r_000001/part-r-00002.parquet;
isDirectory=false; length=534; replication=1; blocksize=33554432;
modification_time=1432042832000; access_time=0; owner=; group=;
permission=rw-rw-rw-; isSymlink=false} to
file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/part-r-00002.parquet
         at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
         at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
         at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
         at
parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:43)
         at
org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:690)
         at
org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:129)
         at
org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:240)
         at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1196)
         at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1181)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
         at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:606)
         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
         at
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
         at py4j.Gateway.invoke(Gateway.java:259)
         at
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
         at py4j.commands.CallCommand.execute(CallCommand.java:79)
         at py4j.GatewayConnection.run(GatewayConnection.java:207)
         at java.lang.Thread.run(Thread.java:745)
##################################################################



[2]
##################################################################
15/05/19 14:45:19 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 3,
wn23023.cis.gov.pl): java.io.IOException: Mkdirs failed to create
file:/mnt/lustre/bigdata/med_home/tmp/test18/namesAndAges.parquet2/_temporary/0/_temporary/attempt_201505191445_0009_r_000000_0
         at
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
         at
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
         at
parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:154)
         at
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:279)
         at
parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
         at
org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:667)
         at
org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
         at
org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
         at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
         at org.apache.spark.scheduler.Task.run(Task.scala:64)
         at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
         at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
         at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
         at java.lang.Thread.run(Thread.java:745)
##################################################################

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org


