[ https://issues.apache.org/jira/browse/SPARK-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729921#comment-16729921 ]

Tarun Parmar edited comment on SPARK-17572 at 12/27/18 10:51 PM:
-----------------------------------------------------------------

I am facing a similar issue; my Spark + Hadoop version is the same as Sankar's. I am 
using Spark with RStudio (without Hadoop) to generate parquet files and store 
them on a local/NFS mount.

What I noticed is that the _temporary directory is owned by my user ID, but the '0' 
directory inside _temporary is owned by root, which is probably why the delete is 
failing.
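
A minimal sketch (mine, not from the original report) of how that ownership mismatch can be confirmed from R; the output path below is hypothetical and should point at the directory of the failing write:

{code}
# Hedged sketch: check who owns the committer's staging directories.
# "out_dir" is a hypothetical path; replace it with the directory being written.
out_dir <- "/nfspartition/sankar/banking_l1_v2.csv"
staging <- c(file.path(out_dir, "_temporary"),
             file.path(out_dir, "_temporary", "0"))
# On Unix, file.info() reports the owning user and group of each path.
file.info(staging)[, c("uname", "grname", "mode")]
{code}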

I already checked with RStudio; they don't think it is an issue with the 
sparklyr package.
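
Since the job aborts while FileOutputCommitter merges the task output into the final directory (see the stack trace below), one possible mitigation is to switch the committer to algorithm version 2, which moves files at task commit rather than in the job-commit merge. This is only an assumption on my side, not a fix confirmed in this ticket; a hedged sparklyr sketch:

{code}
# Hedged sketch (sparklyr): use FileOutputCommitter algorithm v2 so task output
# is renamed at task commit rather than during the final job-commit merge.
# Untested against this ticket; the master URL is hypothetical.
library(sparklyr)
conf <- spark_config()
conf[["spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version"]] <- "2"
sc <- spark_connect(master = "local", config = conf)
{code}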

 

 



> Write.df is failing on spark cluster
> ------------------------------------
>
>                 Key: SPARK-17572
>                 URL: https://issues.apache.org/jira/browse/SPARK-17572
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.0.0
>            Reporter: Sankar Mittapally
>            Priority: Major
>
> Hi,
> We have a Spark cluster with four nodes; all four nodes share an NFS partition 
> (there is no HDFS), and we have the same uid on all servers. When we try to 
> write data, we get the following exceptions. I am not sure whether this is an 
> error, and I am not sure whether I will lose data in the output.
> The command I am using to save the data:
> {code}
> saveDF(banking_l1_1,"banking_l1_v2.csv",source="csv",mode="append",schema="true")
> {code}
> {noformat}
> 16/09/17 08:03:28 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
> java.io.IOException: Failed to rename 
> DeprecatedRawLocalFileStatus{path=file:/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv;
>  isDirectory=false; length=436486316; replication=1; blocksize=33554432; 
> modification_time=1474099400000; access_time=0; owner=; group=; 
> permission=rw-rw-rw-; isSymlink=false} to 
> file:/nfspartition/sankar/banking_l1_v2.csv/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv
>     at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:371)
>     at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384)
>     at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:326)
>     at 
> org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222)
>     at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144)
>     at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>     at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>     at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>     at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>     at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>     at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>     at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>     at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>     at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>     at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>     at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>     at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
>     at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
>     at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:487)
>     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
>     at 
> org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
>     at 
> org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
>     at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>     at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>     at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>     at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>     at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>     at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>     at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>     at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>     at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>     at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>     at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>     at java.lang.Thread.run(Thread.java:745)
> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir 
> [/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/.part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv.crc]:
>  it still exists.
> 16/09/17 08:03:28 WARN FileUtil: Failed to delete file or dir 
> [/nfspartition/sankar/banking_l1_v2.csv/_temporary/0/task_201609170802_0013_m_000000/part-r-00000-46a7f178-2490-444e-9110-510978eaaecb.csv]:
>  it still exists.
> 16/09/17 08:03:28 ERROR DefaultWriterContainer: Job job_201609170803_0000 
> aborted.
> 16/09/17 08:03:28 ERROR RBackendHandler: save on 625 failed
> Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) : 
>   org.apache.spark.SparkException: Job aborted.
>     at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
>     at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>     at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>     at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>     at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>     at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>     at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.doE
> {noformat}
> Thanks
> Sankar


