[ https://issues.apache.org/jira/browse/SPARK-31675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108073#comment-17108073 ]
Wenchen Fan commented on SPARK-31675:
-------------------------------------

This is not a new bug in 3.0, and shouldn't be marked as blocker. I'm changing it to major.

> Fail to insert data to a table with remote location which causes by hive
> encryption check
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-31675
>                 URL: https://issues.apache.org/jira/browse/SPARK-31675
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.6, 3.0.0, 3.1.0
>            Reporter: Kent Yao
>            Priority: Blocker
>
> Before the fix https://issues.apache.org/jira/browse/HIVE-14380 in Hive
> 2.2.0, when moving files from the staging dir to the final table dir, Hive
> does an encryption check on the srcPaths and destPaths:
> {code:java}
> // Some comments here
> if (!isSrcLocal) {
>   // For NOT local src file, rename the file
>   if (hdfsEncryptionShim != null
>       && (hdfsEncryptionShim.isPathEncrypted(srcf) || hdfsEncryptionShim.isPathEncrypted(destf))
>       && !hdfsEncryptionShim.arePathsOnSameEncryptionZone(srcf, destf)) {
>     LOG.info("Copying source " + srcf + " to " + destf
>         + " because HDFS encryption zones are different.");
>     success = FileUtils.copy(srcf.getFileSystem(conf), srcf,
>         destf.getFileSystem(conf), destf,
>         true,    // delete source
>         replace, // overwrite destination
>         conf);
>   } else {
> {code}
> The hdfsEncryptionShim instance holds a global FileSystem instance belonging
> to the default file system. This causes failures when checking a path that
> belongs to a remote file system.
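The failure mode described above can be sketched with a minimal, self-contained analogue of the scheme/authority comparison that Hadoop's `FileSystem.checkPath` performs (the class `WrongFsCheck` and its simplified logic are assumptions for illustration, not Hadoop's actual code): a file system handle bound to the default cluster rejects any path whose authority names a different cluster, which is exactly what happens when the cached default-FS shim is asked about a path on `hdfs://cluster2`.

```java
import java.net.URI;
import java.util.Objects;

// Simplified, illustrative analogue of the scheme/authority check that
// org.apache.hadoop.fs.FileSystem applies to paths. A handle bound to one
// cluster ("hdfs://cluster1") rejects fully-qualified paths on another.
public class WrongFsCheck {
    private final URI fsUri;

    public WrongFsCheck(String uri) {
        this.fsUri = URI.create(uri);
    }

    // Throws the same style of "Wrong FS" IllegalArgumentException seen in
    // the report when the path's scheme or authority differ from this FS's.
    public void checkPath(String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null
                && !(Objects.equals(p.getScheme(), fsUri.getScheme())
                     && Objects.equals(p.getAuthority(), fsUri.getAuthority()))) {
            throw new IllegalArgumentException(
                "Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        // The default FS is cluster1, as in the reported Spark SQL job.
        WrongFsCheck defaultFs = new WrongFsCheck("hdfs://cluster1");
        defaultFs.checkPath("hdfs://cluster1/user/warehouse/ok"); // passes
        // A path on the remote cluster2 trips the check.
        try {
            defaultFs.checkPath("hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This also suggests why resolving the file system from the path itself (as the quoted Hive code does elsewhere with `srcf.getFileSystem(conf)`) avoids the problem, whereas a shim holding one cached default-FS instance cannot handle paths on a remote cluster.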
> For example, given a table described as follows:
> {code:sql}
> key                 int     NULL
>
> # Detailed Table Information
> Database            bdms_hzyaoqin_test_2
> Table               abc
> Owner               bdms_hzyaoqin
> Created Time        Mon May 11 15:14:15 CST 2020
> Last Access         Thu Jan 01 08:00:00 CST 1970
> Created By          Spark 2.4.3
> Type                MANAGED
> Provider            hive
> Table Properties    [transient_lastDdlTime=1589181255]
> Location            hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc
> Serde Library       org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat         org.apache.hadoop.mapred.TextInputFormat
> OutputFormat        org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Storage Properties  [serialization.format=1]
> Partition Provider  Catalog
> Time taken: 0.224 seconds, Fetched 18 row(s)
> {code}
> The table abc belongs to the remote HDFS 'hdfs://cluster2'. When we run the
> command below via a Spark SQL job whose default FS is 'hdfs://cluster1':
> {code:sql}
> insert into bdms_hzyaoqin_test_2.abc values(1);
> {code}
> it fails with:
> {code:java}
> Error in query: java.lang.IllegalArgumentException: Wrong FS:
> hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-10000/part-00000-badf2a31-ab36-4b60-82a1-0848774e4af5-c000,
> expected: hdfs://cluster1
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)