[jira] [Assigned] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21067: Assignee: Apache Spark > Thrift Server - CTAS fail with Unable to move source > > > Key: SPARK-21067 > URL: https://issues.apache.org/jira/browse/SPARK-21067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0, 2.4.0, 2.4.3 > Environment: Yarn > Hive MetaStore > HDFS (HA) >Reporter: Dominic Ricard >Assignee: Apache Spark >Priority: Major > Attachments: SPARK-21067.patch > > > After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS > would fail, sometimes... > Most of the time, the CTAS would work only once, after starting the thrift > server. After that, dropping the table and re-issuing the same CTAS would > fail with the following message (Sometime, it fails right away, sometime it > work for a long period of time): > {noformat} > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > We have already found the following Jira > (https://issues.apache.org/jira/browse/SPARK-11021) which state that the > {{hive.exec.stagingdir}} had to be added in order for Spark to be able to > handle CREATE TABLE properly as of 2.0. As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. > Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to desti > nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.Dataset.(Dataset.scala:185) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.
[jira] [Assigned] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21067: Assignee: (was: Apache Spark) > Thrift Server - CTAS fail with Unable to move source > > > Key: SPARK-21067 > URL: https://issues.apache.org/jira/browse/SPARK-21067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0, 2.4.0, 2.4.3 > Environment: Yarn > Hive MetaStore > HDFS (HA) >Reporter: Dominic Ricard >Priority: Major > Attachments: SPARK-21067.patch > > > After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS > would fail, sometimes... > Most of the time, the CTAS would work only once, after starting the thrift > server. After that, dropping the table and re-issuing the same CTAS would > fail with the following message (Sometime, it fails right away, sometime it > work for a long period of time): > {noformat} > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > We have already found the following Jira > (https://issues.apache.org/jira/browse/SPARK-11021) which state that the > {{hive.exec.stagingdir}} had to be added in order for Spark to be able to > handle CREATE TABLE properly as of 2.0. As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. > Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to desti > nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.Dataset.(Dataset.scala:185) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) > at or