Bang Xiao created SPARK-26332:
---------------------------------

             Summary: Spark sql write orc table on viewFS throws exception
                 Key: SPARK-26332
                 URL: https://issues.apache.org/jira/browse/SPARK-26332
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.1
            Reporter: Bang Xiao
Writing an ORC table on a viewFs location with Spark SQL throws the following exception:

{code:java}
Task failed while writing rows.
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid
	at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:634)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:2103)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:2120)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:352)
	at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168)
	at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157)
	at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2413)
	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:86)
	at org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:149)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:392)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1414)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
	... 8 more
	Suppressed: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid
		at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:634)
		at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:2103)
		at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:2120)
		at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2425)
		at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:106)
		at org.apache.spark.sql.hive.execution.HiveOutputWriter.close(HiveFileFormat.scala:154)
		at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.releaseResources(FileFormatWriter.scala:405)
		at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$1.apply$mcV$sp(FileFormatWriter.scala:275)
		at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1423)
		... 9 more
{code}

The exception can be reproduced with the following statements:

{code:java}
spark-sql> CREATE EXTERNAL TABLE test_orc(test_id INT, test_age INT, test_rank INT) STORED AS ORC LOCATION 'viewfs://nsX/user/hive/warehouse/ultraman_tmp.db/test_orc';
spark-sql> CREATE TABLE source(id INT, age INT, rank INT);
spark-sql> INSERT INTO source VALUES(1,1,1);
spark-sql> INSERT OVERWRITE TABLE test_orc SELECT * FROM source;
{code}

This is related to HIVE-10790 (https://issues.apache.org/jira/browse/HIVE-10790), which was resolved in Hive 2.0.0, while Spark SQL still depends on hive-1.2.1-Spark2.
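For context, the stack trace shows Hive 1.2's WriterImpl.getStream() calling the deprecated no-argument FileSystem#getDefaultReplication(), which ViewFileSystem rejects because a federated viewfs root has no single default replication; the Path-qualified overload resolves the mount point and delegates to the underlying file system. Below is a minimal, hypothetical sketch (the class name is made up, and it assumes a viewfs mount table for nsX is configured in core-site.xml) illustrating the difference, not the actual Hive code:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical illustration, not Hive code. Assumes fs.viewfs.mounttable.nsX.link.*
// entries are configured so that viewfs://nsX/... resolves to a real file system.
public class ViewFsReplicationSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path table = new Path("viewfs://nsX/user/hive/warehouse/ultraman_tmp.db/test_orc");
    FileSystem fs = table.getFileSystem(conf); // resolves to ViewFileSystem

    // Path-qualified overload: ViewFileSystem resolves the mount point and
    // delegates to the target file system, so this call succeeds.
    short viaPath = fs.getDefaultReplication(table);
    System.out.println("getDefaultReplication(path) = " + viaPath);

    // Deprecated no-argument overload, the call made from Hive 1.2's
    // WriterImpl.getStream(): ViewFileSystem throws
    // NotInMountpointException("getDefaultReplication on empty path is invalid").
    short viaRoot = fs.getDefaultReplication();
    System.out.println("never reached: " + viaRoot);
  }
}
{code}

HIVE-10790 fixed this on the Hive side, presumably by passing the actual file path to these lookups, so picking up a Hive release that contains the fix (2.0.0+) or backporting it to the bundled hive-1.2.1-Spark2 should avoid the exception.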