Talha Azaz created HADOOP-16994:
-----------------------------------

             Summary: hadoop output to ftp gives rename error on FileOutputCommitter.mergePaths
                 Key: HADOOP-16994
                 URL: https://issues.apache.org/jira/browse/HADOOP-16994
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs
            Reporter: Talha Azaz


I'm running Spark on Kubernetes in cluster mode, reading data from a database 
and writing it out in Parquet format to an FTP server through the Hadoop FTP 
filesystem. When a task completes, the committer tries to rename 
/sensor_values/1585353600000/_temporary/0/_temporary/attempt_20200414075519_0000_m_000021_21/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet
to 
/sensor_values/1585353600000/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet
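
For context, the job itself is a plain JDBC read followed by a Parquet write 
straight to an ftp:// URI — a rough sketch of what I'm running (the DB URL, 
credentials, and paths below are placeholders, not the real values):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

// Rough reproduction of the failing job; connection details and paths
// are placeholders, not the actual values from my cluster.
public class FtpParquetWrite {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("ftp-parquet-write")
        .getOrCreate();

    // Read the source table over JDBC (stand-in for the real DB source).
    Dataset<Row> df = spark.read()
        .format("jdbc")
        .option("url", "jdbc:postgresql://db-host/sensors")
        .option("dbtable", "sensor_values")
        .load();

    // Writing straight to ftp:// routes through Hadoop's FTPFileSystem;
    // the task-commit rename described above is where it fails.
    df.write()
        .mode(SaveMode.Overwrite)
        .parquet("ftp://user:pass@host/sensor_values/1585353600000");

    spark.stop();
  }
}
```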

But that rename fails with the following error:

```
Lost task 21.0 in stage 0.0 (TID 21, 10.233.90.137, executor 3): org.apache.spark.SparkException: Task failed while writing rows.
 at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:257)
 at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
 at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
 at org.apache.spark.scheduler.Task.run(Task.scala:123)
 at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Cannot rename source: ftp://user:pass@host/sensor_values/1585353600000/_temporary/0/_temporary/attempt_20200414075519_0000_m_000021_21/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet to ftp://user:pass@host/sensor_values/1585353600000/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet -only same directory renames are supported
 at org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:674)
 at org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:613)
 at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:472)
 at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:486)
 at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:597)
 at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:560)
 at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
 at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:77)
 at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:225)
 at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:78)
 at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
 at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242)
 at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
 at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248)
 ... 10 more
```
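
Looking at FTPFileSystem.rename, the failure appears to come from a 
parent-directory guard. Below is my paraphrase of that check (simplified, not 
the verbatim hadoop-common source; the example paths are abbreviated) — the 
committer's merge step is exactly the cross-directory move it rejects:

```java
import java.io.IOException;
import org.apache.hadoop.fs.Path;

// Simplified paraphrase of the guard inside FTPFileSystem.rename()
// (not the verbatim hadoop-common source).
public class SameDirRenameCheck {
  static void checkRename(Path absoluteSrc, Path absoluteDst) throws IOException {
    String parentSrc = absoluteSrc.getParent().toUri().toString();
    String parentDst = absoluteDst.getParent().toUri().toString();
    // The FTP filesystem renames by file name within a single directory,
    // so any rename whose source and destination parents differ is rejected.
    if (!parentSrc.equals(parentDst)) {
      throw new IOException("Cannot rename source: " + absoluteSrc + " to "
          + absoluteDst + " -only same directory renames are supported");
    }
  }

  public static void main(String[] args) throws IOException {
    // FileOutputCommitter.mergePaths moves task output up out of _temporary,
    // which is precisely such a cross-directory rename (paths abbreviated).
    checkRename(
        new Path("/sensor_values/1585353600000/_temporary/0/_temporary/attempt_x/part-00021.snappy.parquet"),
        new Path("/sensor_values/1585353600000/part-00021.snappy.parquet"));
  }
}
```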

I have done the same thing on the Azure filesystem using the same Spark and 
Hadoop implementation, and did not hit this problem. 
Is there any Hadoop or Spark configuration that needs to be changed, or are 
cross-directory renames simply not supported by the Hadoop FTP filesystem?
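
In case it helps, the workaround I'm trying in the meantime is to let the 
committer run against a local path and then push the finished files to the FTP 
server myself with Apache Commons Net — a minimal sketch (host, credentials, 
and paths are placeholders):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;

// Workaround sketch: commit the Parquet output to a local path first,
// then upload the finished files over FTP in a single pass.
// Host, credentials, and paths below are placeholders.
public class FtpUpload {
  public static void main(String[] args) throws IOException {
    FTPClient ftp = new FTPClient();
    ftp.connect("host");
    try {
      if (!ftp.login("user", "pass")) {
        throw new IOException("FTP login failed");
      }
      ftp.setFileType(FTP.BINARY_FILE_TYPE); // Parquet files are binary
      ftp.enterLocalPassiveMode();
      try (InputStream in =
          new FileInputStream("/tmp/out/part-00021.snappy.parquet")) {
        // storeFile uploads directly into the remote directory; no
        // cross-directory rename is issued, so the FTPFileSystem
        // limitation never comes into play.
        if (!ftp.storeFile("/sensor_values/1585353600000/part-00021.snappy.parquet", in)) {
          throw new IOException("Upload failed: " + ftp.getReplyString());
        }
      }
      ftp.logout();
    } finally {
      ftp.disconnect();
    }
  }
}
```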
Thanks a lot!!


