Can you please post the line of code that is doing the df.write command?

On Tue, Sep 20, 2016 at 9:29 AM, Sankar Mittapally <sankar.mittapa...@creditvidya.com> wrote:
> Hey Kevin,
>
> It is an empty directory. Spark is able to write the part files into it,
> but we get the error above while it is merging those part files.
>
> Regards
>
> On Tue, Sep 20, 2016 at 7:46 PM, Kevin Mellott <kevin.r.mell...@gmail.com> wrote:
>
>> Have you checked to see if any files already exist at
>> /nfspartition/sankar/banking_l1_v2.csv? If so, you will need to delete
>> them before attempting to save your DataFrame to that location.
>> Alternatively, you may be able to set the "mode" option of the df.write
>> operation to "overwrite", depending on the version of Spark you are
>> running.
>>
>> *ERROR (from log)*
>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]: it still exists.
>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]: it still exists.
>>
>> *df.write documentation*
>> http://spark.apache.org/docs/latest/api/R/write.df.html
>>
>> Thanks,
>> Kevin
>>
>> On Tue, Sep 20, 2016 at 12:16 AM, sankarmittapally <sankar.mittapa...@creditvidya.com> wrote:
>>
>>> We have set up a Spark cluster on NFS shared storage. There are no
>>> permission issues with the NFS storage; all of the users are able to
>>> write to it. When I run the write.df command in SparkR, I get the
>>> error below. Can someone please help me fix this issue?
>>>
>>> 16/09/17 08:03:28 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
>>> java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv;
>>> isDirectory=false; length=436486316; replication=1; blocksize=33554432;
>>> modification_time=1474099400000; access_time=0; owner=; group=;
>>> permission=rw-rw-rw-; isSymlink=false}
>>> to file:/nfspartition/sankar/banking_l1_v2.csv/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv
>>> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:371)
>>> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384)
>>> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:326)
>>> at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>>> at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
>>> at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
>>> at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:487)
>>> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>>> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
>>> at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
>>> at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
>>> at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>> at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>> at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
>>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>>> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>>> at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>>> at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>>> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>>> at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>>> at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>>> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>>> at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]: it still exists.
>>> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir[/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]: it still exists.
>>> 16/09/17 08:03:28 ERROR DefaultWriterContainer: Job job_201609170803_0000 aborted.
>>> 16/09/17 08:03:28 ERROR RBackendHandler: save on 625 failed
>>> Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
>>> org.apache.spark.SparkException: Job aborted.
>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>>> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>>> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.doE
>>>
>>> --
>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/write-df-is-failing-on-Spark-Cluster-tp27761.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>
>
> --
> Regards
>
> Sankar Mittapally
> Senior Software Engineer
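
For anyone hitting the same rename failure: below is a minimal SparkR sketch of the two fixes Kevin suggests above, i.e. clearing the target directory before writing, or letting Spark replace it via the "mode" argument. It assumes Spark 2.x with an active SparkR session and a SparkDataFrame named df; the original poster's actual write.df call was never shared in this thread, so the path and options here are illustrative only.

    # Assumed setup: sparkR.session() has already been called and df is a
    # SparkDataFrame. Output path taken from the log above; adjust as needed.
    out_path <- "/nfspartition/sankar/banking_l1_v2.csv"

    # Fix 1: remove any leftover output first (including _temporary/ debris
    # left behind by a previously failed job), then write normally.
    if (dir.exists(out_path)) {
      unlink(out_path, recursive = TRUE)
    }
    write.df(df, path = out_path, source = "csv")

    # Fix 2 (alternative): ask Spark to replace existing output itself.
    # write.df's mode argument defaults to "error"; "overwrite" deletes the
    # existing path before writing.
    write.df(df, path = out_path, source = "csv", mode = "overwrite")

Note that neither fix changes the underlying commit behavior; if the rename still fails on the NFS mount after the target directory is clean, the problem likely lies with the storage layer rather than with existing files.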