[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2019-06-17 Thread Dominic Ricard (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865592#comment-16865592
 ] 

Dominic Ricard commented on SPARK-21067:


In 2.4, the problem also affect Parquet tables creation... So we had to move 
all our CTAS to a different process which does not use the thrift server...

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0, 2.4.0, 2.4.3
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>Priority: Major
> Attachments: SPARK-21067.patch
>
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> 

[jira] [Updated] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2019-06-17 Thread Dominic Ricard (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Ricard updated SPARK-21067:
---
Affects Version/s: 2.4.3

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0, 2.4.0, 2.4.3
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>Priority: Major
> Attachments: SPARK-21067.patch
>
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at org.apache.spark.sql.Dataset.(Dataset.scala:185)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
> at 

[jira] [Updated] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2019-03-11 Thread Dominic Ricard (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Ricard updated SPARK-21067:
---
Affects Version/s: 2.4.0

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0, 2.4.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>Priority: Major
> Attachments: SPARK-21067.patch
>
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at org.apache.spark.sql.Dataset.(Dataset.scala:185)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
> at 

[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-09-08 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158802#comment-16158802
 ] 

Dominic Ricard commented on SPARK-21067:


[~zhangxin0112zx] Our solution was to migrate the CTAS code to use Parquet... 
CTAS for Hive Tables is Broken when using the Thrift server.

Still looking forward for a fix to this issue...

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-11 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080503#comment-16080503
 ] 

Dominic Ricard edited comment on SPARK-21067 at 7/11/17 8:00 PM:
-

Interesting findings today.

- Open *{color:#59afe1}Beeline Session 1{color}*
-- Create Table 1 (Success)
- Open *{color:#14892c}Beeline Session 2{color}*
-- Create Table 2 (Success)
- Close *{color:#59afe1}Beeline Session 1{color}*
- Create Table 3 in *{color:#14892c}Beeline Session 2{color}* 
({color:#d04437}FAIL{color})

So, it seems like the problem occurs after the 1st session is closed. What does 
the Thrift server do when a session is closed that could cause this issue?

Looking at the Hive Metastore logs, I notice that the same SQL query (CREATE 
TABLE ...) translate to different actions between the 1st and later sessions:

Session 1:
{noformat}
...
PERFLOG method=create_table_with_environment_context 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler
...
PERFLOG method=alter_table_with_cascade 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler
{noformat}

Session 2:
{noformat}
...
PERFLOG method=create_table_with_environment_context 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
...
PERFLOG method=drop_table_with_environment_context 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler
{noformat}


was (Author: dricard):
Interesting findings today.

- Open *{color:#59afe1}Beeline Session 1{color}*
-- Create Table 1 (Success)
- Open *{color:#14892c}Beeline Session 2{color}*
-- Create Table 2 (Success)
- Close *{color:#59afe1}Beeline Session 1{color}*
- Create Table 3 in *{color:#14892c}Beeline Session 2{color}* 
({color:#d04437}FAIL{color})

So, it seems like the problem occurs after the 1st session is closed. What does 
the Thrift server do when a session is closed that could cause this issue?

Looking at the Hive Metastore logs, I notice that the same SQL query (CREATE 
TABLE ...) translate to different actions between the 1st and later sessions:

Session 1:
{noformat}
PERFLOG method=alter_table_with_cascade 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler
{noformat}

Session 2:
{noformat}
PERFLOG method=drop_table_with_environment_context 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler
{noformat}

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-11 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082441#comment-16082441
 ] 

Dominic Ricard edited comment on SPARK-21067 at 7/11/17 7:16 PM:
-

[~smilegator] Hey Xiao, I see that you have made many contribution around 
CREATE TABLE issues. Could you shed some light on this issue? We are either 
looking for a fix or for a property to set hive.default.fileformat in Spark 2 
to have it use parquet instead of textfile, since the issue is not present when 
the fileformat is set to "parquet". 

Looks like your Pull Request to add spark.sql.default.fileformat 
(https://issues.apache.org/jira/browse/SPARK-16825) did not go in and that 
would have solve the issue with default store / fileformat...

Thanks.

FWD: [~cloud_fan], you might also know how this can be resolved...


was (Author: dricard):
[~smilegator] Hey Xiao, I see that you have made many contribution around 
CREATE TABLE issues. Could you shed some light on this issue? We are either 
looking for a fix or for a property to set hive.default.fileformat in Spark 2 
to have it use parquet instead of textfile, since the issue is not present when 
the fileformat is set to "parquet". 

Looks like your Pull Request to add spark.sql.default.fileformat 
(https://issues.apache.org/jira/browse/SPARK-16825) did not go in and that 
would have solve the issue with default store / fileformat...

Thanks.

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> 

[jira] [Updated] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-11 Thread Dominic Ricard (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Ricard updated SPARK-21067:
---
Affects Version/s: 2.2.0

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at org.apache.spark.sql.Dataset.(Dataset.scala:185)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
> at 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-11 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082441#comment-16082441
 ] 

Dominic Ricard edited comment on SPARK-21067 at 7/11/17 5:15 PM:
-

[~smilegator] Hey Xiao, I see that you have made many contribution around 
CREATE TABLE issues. Could you shed some light on this issue? We are either 
looking for a fix or for a property to set hive.default.fileformat in Spark 2 
to have it use parquet instead of textfile, since the issue is not present when 
the fileformat is set to "parquet". 

Looks like your Pull Request to add spark.sql.default.fileformat 
(https://issues.apache.org/jira/browse/SPARK-16825) did not go in and that 
would have solve the issue with default store / fileformat...

Thanks.


was (Author: dricard):
[~smilegator] Hey Xiao, I see that you have made many contribution around 
CREATE TABLE issues. Could you shed some light on this issue? We are either 
looking for a fix or for a property to set hive.default.fileformat in Spark 2. 
Looks like your Pull Request to add spark.sql.default.fileformat 
(https://issues.apache.org/jira/browse/SPARK-16825) did not go in...

Thanks.

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
>

[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-11 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082441#comment-16082441
 ] 

Dominic Ricard commented on SPARK-21067:


[~smilegator] Hey Xiao, I see that you have made many contribution around 
CREATE TABLE issues. Could you shed some light on this issue? We are either 
looking for a fix or for a property to set hive.default.fileformat in Spark 2. 
Looks like your Pull Request to add spark.sql.default.fileformat 
(https://issues.apache.org/jira/browse/SPARK-16825) did not go in...

Thanks.

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-10 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080503#comment-16080503
 ] 

Dominic Ricard edited comment on SPARK-21067 at 7/10/17 9:04 PM:
-

Interesting findings today.

- Open *{color:#59afe1}Beeline Session 1{color}*
-- Create Table 1 (Success)
- Open *{color:#14892c}Beeline Session 2{color}*
-- Create Table 2 (Success)
- Close *{color:#59afe1}Beeline Session 1{color}*
- Create Table 3 in *{color:#14892c}Beeline Session 2{color}* 
({color:#d04437}FAIL{color})

So, it seems like the problem occurs after the 1st session is closed. What does 
the Thrift server do when a session is closed that could cause this issue?

Looking at the Hive Metastore logs, I notice that the same SQL query (CREATE 
TABLE ...) translate to different actions between the 1st and later sessions:

Session 1:
{noformat}
PERFLOG method=alter_table_with_cascade 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler
{noformat}

Session 2:
{noformat}
PERFLOG method=drop_table_with_environment_context 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler
{noformat}


was (Author: dricard):
Interesting findings today.

- Open *{color:#59afe1}Beeline Session 1{color}*
-- Create Table 1 (Success)
- Open *{color:#14892c}Beeline Session 2{color}*
-- Create Table 2 (Success)
- Close *{color:#59afe1}Beeline Session 1{color}*
- Create Table 3 in *{color:#14892c}Beeline Session 2{color}* 
({color:#d04437}FAIL{color})

So, it seems like the problem occurs after the 1st session is closed. What does 
the Thrift server do when a session is closed that could cause this issue?

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-10 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080503#comment-16080503
 ] 

Dominic Ricard edited comment on SPARK-21067 at 7/10/17 3:43 PM:
-

Interesting findings today.

- Open *{color:#59afe1}Beeline Session 1{color}*
-- Create Table 1 (Success)
- Open *{color:#14892c}Beeline Session 2{color}*
-- Create Table 2 (Success)
- Close *{color:#59afe1}Beeline Session 1{color}*
- Create Table 3 in *{color:#14892c}Beeline Session 2{color}* 
({color:#d04437}FAIL{color})

So, it seems like the problem occurs after the 1st session is closed. What does 
the Thrift server do when a session is closed that could cause this issue?


was (Author: dricard):
Interesting findings today.

- Open Session 1
-- Create Table 1 (Success)
- Open Session 2
-- Create Table 2 (Success)
- Close Session 1
- Create Table 3 in Session 2 (FAIL)

So, it seems like the problem occurs after the 1st session is closed. What does 
the Thrift server do when a session is closed that could cause this issue?

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> 

[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-10 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080503#comment-16080503
 ] 

Dominic Ricard commented on SPARK-21067:


Interesting findings today.

- Open Session 1
-- Create Table 1 (Success)
- Open Session 2
-- Create Table 2 (Success)
- Close Session 1
- Create Table 3 in Session 2 (FAIL)

So, it seems like the problem occurs after the 1st session is closed. What does 
the Thrift server do when a session is closed that could cause this issue?

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)

[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-10 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080399#comment-16080399
 ] 

Dominic Ricard commented on SPARK-21067:


[~q79969786] Using the setting proposed, I get the same error on my 2nd session 
of Beeline:

{noformat}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://nameservice1/user/hive/warehouse/staging/.hive-staging_hive_2017-07-10_14-17-43_389_2840765500912655498-3/-ext-1/part-0
 to destination 
hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
(state=,code=0)
{noformat}

This is what makes this very frustrating. The 1st session seems to work without 
issues. After closing and restarting Beeline, the same queries gives out the 
error...

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-06-27 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064833#comment-16064833
 ] 

Dominic Ricard edited comment on SPARK-21067 at 6/27/17 1:31 PM:
-

[~q79969786], yes. As stated in the description, ours is set to 
"/tmp/hive-staging/\{user.name\}"


was (Author: dricard):
[~q79969786], yes. As stated in the description, ours is set to 
"/tmp/hive-staging/{user.name}"

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> 

[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-06-27 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064833#comment-16064833
 ] 

Dominic Ricard commented on SPARK-21067:


[~q79969786], yes. As stated in the description, ours is set to 
"/tmp/hive-staging/{user.name}"

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at org.apache.spark.sql.Dataset.(Dataset.scala:185)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
> 

[jira] [Updated] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-06-14 Thread Dominic Ricard (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Ricard updated SPARK-21067:
---
Description: 
After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
would fail, sometimes...

Most of the time, the CTAS would work only once, after starting the thrift 
server. After that, dropping the table and re-issuing the same CTAS would fail 
with the following message (Sometime, it fails right away, sometime it work for 
a long period of time):

{noformat}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
 to destination 
hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
(state=,code=0)
{noformat}

We have already found the following Jira 
(https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
{{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
ours set to "/tmp/hive-staging/\{user.name\}"

Same issue with INSERT statements:
{noformat}
CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
dricard.test SELECT 1;
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
 to destination 
hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
(state=,code=0)
{noformat}

This worked fine in 1.6.2, which we currently run in our Production Environment 
but since 2.0+, we haven't been able to CREATE TABLE consistently on the 
cluster.

SQL to reproduce issue:
{noformat}
DROP SCHEMA IF EXISTS dricard CASCADE; 
CREATE SCHEMA dricard; 
CREATE TABLE dricard.test (col1 int); 
INSERT INTO TABLE dricard.test SELECT 1; 
SELECT * from dricard.test; 
DROP TABLE dricard.test; 
CREATE TABLE dricard.test AS select 1 as `col1`;
SELECT * from dricard.test
{noformat}

Thrift server usually fails at INSERT...

Tried the same procedure in a spark context using spark.sql() and didn't 
encounter the same issue.

Full stack Trace:
{noformat}
17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
executing query, currentState RUNNING,
org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
 to desti
nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
at org.apache.spark.sql.Dataset.(Dataset.scala:185)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:231)
at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174)
at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
at java.security.AccessController.doPrivileged(Native Method)
at 

[jira] [Updated] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-06-14 Thread Dominic Ricard (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Ricard updated SPARK-21067:
---
Description: 
After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
would fail, sometimes...

Most of the time, the CTAS would work only once, after starting the thrift 
server. After that, dropping the table and re-issuing the same CTAS would fail 
with the following message (Sometime, it fails right away, sometime it work for 
a long period of time):

{noformat}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
 to destination 
hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
(state=,code=0)
{noformat}

We have already found the following Jira 
(https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
{{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
ours set to "/tmp/hive-staging/\{user.name\}"

Same issue with INSERT statements:
{noformat}
CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
dricard.test SELECT 1;
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
 to destination 
hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
(state=,code=0)
{noformat}

This worked fine in 1.6.2, which we currently run in our Production Environment 
but since 2.0+, we haven't been able to CREATE TABLE consistently on the 
cluster.

SQL to reproduce issue:
{noformat}
DROP SCHEMA IF EXISTS dricard CASCADE; 
CREATE SCHEMA dricard; 
CREATE TABLE dricard.test (col1 int); 
INSERT INTO TABLE dricard.test SELECT 1; 
SELECT * from dricard.test; 
DROP TABLE dricard.test; 
CREATE TABLE dricard.test AS select 1 as `col1`;
SELECT * from dricard.test
{noformat}

Thrift server usually fails at INSERT...

Tried the same procedure in a spark context using spark.sql() and didn't 
encounter the same issue.

  was:
After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
would fail, sometimes...

Most of the time, the CTAS would work only once, after starting the thrift 
server. After that, dropping the table and re-issuing the same CTAS would fail 
with the following message (Sometime, it fails right away, sometime it work for 
a long period of time):

{noformat}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
 to destination 
hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
(state=,code=0)
{noformat}

We have already found the following Jira 
(https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
{{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
ours set to "/tmp/hive-staging/\{user.name\}"

Same issue with INSERT statements:
{noformat}
CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
dricard.test SELECT 1;
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
 to destination 
hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
(state=,code=0)
{noformat}

This worked fine in 1.6.2, which we currently run in our Production Environment 
but since 2.0+, we haven't been able to CREATE TABLE consistently on the 
cluster.



> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> 

[jira] [Updated] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-06-13 Thread Dominic Ricard (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Ricard updated SPARK-21067:
---
Description: 
After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
would fail, sometimes...

Most of the time, the CTAS would work only once, after starting the thrift 
server. After that, dropping the table and re-issuing the same CTAS would fail 
with the following message (Sometime, it fails right away, sometime it work for 
a long period of time):

{noformat}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
 to destination 
hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
(state=,code=0)
{noformat}

We have already found the following Jira 
(https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
{{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
ours set to "/tmp/hive-staging/\{user.name\}"

Same issue with INSERT statements:
{noformat}
CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
dricard.test SELECT 1;
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
 to destination 
hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
(state=,code=0)
{noformat}

This worked fine in 1.6.2, which we currently run in our Production Environment 
but since 2.0+, we haven't been able to CREATE TABLE consistently on the 
cluster.


  was:
After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
would fail, sometimes...

Most of the time, the CTAS would work only once after starting the thrift 
server. After that, dropping the table and re-issuing the same CTAS would fail 
with the following message (Sometime, it fails right away, sometime it work for 
a long period of time):

{noformat}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
 to destination 
hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
(state=,code=0)
{noformat}

We have already found the following Jira 
(https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
{{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
ours set to "/tmp/hive-staging/\{user.name\}"

Same issue with INSERT statements:
{noformat}
CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
dricard.test SELECT 1;
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
 to destination 
hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
(state=,code=0)
{noformat}

This worked fine in 1.6.2, which we currently run in our Production Environment 
but since 2.0+, we haven't been able to CREATE TABLE consistently on the 
cluster.



> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for 

[jira] [Created] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-06-12 Thread Dominic Ricard (JIRA)
Dominic Ricard created SPARK-21067:
--

 Summary: Thrift Server - CTAS fail with Unable to move source
 Key: SPARK-21067
 URL: https://issues.apache.org/jira/browse/SPARK-21067
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.1
 Environment: Yarn
Hive MetaStore
HDFS (HA)
Reporter: Dominic Ricard


After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
would fail, sometimes...

Most of the time, the CTAS would work only once after starting the thrift 
server. After that, dropping the table and re-issuing the same CTAS would fail 
with the following message (Sometime, it fails right away, sometime it work for 
a long period of time):

{noformat}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
 to destination 
hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
(state=,code=0)
{noformat}

We have already found the following Jira 
(https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
{{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
ours set to "/tmp/hive-staging/\{user.name\}"

Same issue with INSERT statements:
{noformat}
CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
dricard.test SELECT 1;
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
 to destination 
hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
(state=,code=0)
{noformat}

This worked fine in 1.6.2, which we currently run in our Production Environment 
but since 2.0+, we haven't been able to CREATE TABLE consistently on the 
cluster.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20581) Using AVG or SUM on a INT/BIGINT column with fraction operator will yield BIGINT instead of DOUBLE

2017-05-03 Thread Dominic Ricard (JIRA)
Dominic Ricard created SPARK-20581:
--

 Summary: Using AVG or SUM on a INT/BIGINT column with fraction 
operator will yield BIGINT instead of DOUBLE
 Key: SPARK-20581
 URL: https://issues.apache.org/jira/browse/SPARK-20581
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.2
Reporter: Dominic Ricard


We stumbled on this multiple times and every time we are baffled by the 
behavior of AVG and SUM.

Given the following SQL (Executed through Thrift):
{noformat}
SELECT SUM(col/2) FROM
(SELECT 3 as `col`) t
{noformat}

The result will be "1", when the expected and accurate result is 1.5

Here's the explain plan:
{noformat}
== Physical Plan == 
TungstenAggregate(key=[], functions=[(sum(cast((cast(col#1519342 as double) / 
2.0) as bigint)),mode=Final,isDistinct=false)], output=[_c0#1519344L])
+- TungstenExchange SinglePartition, None   
   +- TungstenAggregate(key=[], functions=[(sum(cast((cast(col#1519342 as 
double) / 2.0) as bigint)),mode=Partial,isDistinct=false)], 
output=[sum#1519347L])
  +- Project [3 AS col#1519342] 
 +- Scan OneRowRelation[]   
{noformat}

Why the extra cast to BIGINT?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17495) Hive hash implementation

2017-04-04 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955714#comment-15955714
 ] 

Dominic Ricard commented on SPARK-17495:


[~tejasp] We use murmur3 hash internally in some of our data pipelines, 
non-SQL, and I would like to know if the goal of this task to expose a new UDF 
(ex: murmur3()), similar to md5() and hash()?

I believe that would be the best approach to preserve compatibility with 
previously generated data and queries using hash().



> Hive hash implementation
> 
>
> Key: SPARK-17495
> URL: https://issues.apache.org/jira/browse/SPARK-17495
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Tejas Patil
>Assignee: Tejas Patil
>Priority: Minor
> Fix For: 2.2.0
>
>
> Spark internally uses Murmur3Hash for partitioning. This is different from 
> the one used by Hive. For queries which use bucketing this leads to different 
> results if one tries the same query on both engines. For us, we want users to 
> have backward compatibility to that one can switch parts of applications 
> across the engines without observing regressions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12818) Implement Bloom filter and count-min sketch in DataFrames

2017-03-30 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949828#comment-15949828
 ] 

Dominic Ricard commented on SPARK-12818:


[~lian cheng] Is there plan to extend support to SQL? Being able to save 
sketches and sum them to get count distincts would be awesome.

> Implement Bloom filter and count-min sketch in DataFrames
> -
>
> Key: SPARK-12818
> URL: https://issues.apache.org/jira/browse/SPARK-12818
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Cheng Lian
> Fix For: 2.0.0
>
> Attachments: BloomFilterandCount-MinSketchinSpark2.0.pdf
>
>
> This ticket tracks implementing Bloom filter and count-min sketch support in 
> DataFrames. Please see the attached design doc for more information.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12835) StackOverflowError when aggregating over column from window function

2017-02-07 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856229#comment-15856229
 ] 

Dominic Ricard commented on SPARK-12835:


I ran into a the same issue today when trying to do:

_SQL_
{noformat}
select
  sum(row_number() OVER (partition by column1 order by column2))
from
  (select 1 as `column1`, 1 as `column2`) t
{noformat}

The actual use case is a bit more evolved than this but this is the minimum 
required to reproduce the issue.

The workaround I found was to calculate the row_number in a subselect then do 
the sum in the select:
{noformat}
select sum (tt.row_num) from 
(select
  row_number() OVER (partition by column1 order by column2) as `row_num`
from
  (select 1 as `column1`, 1 as `column2`) t) tt
{noformat}


> StackOverflowError when aggregating over column from window function
> 
>
> Key: SPARK-12835
> URL: https://issues.apache.org/jira/browse/SPARK-12835
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.6.0
>Reporter: Kalle Jepsen
>
> I am encountering a StackoverflowError with a very long traceback, when I try 
> to directly aggregate on a column created by a window function.
> E.g. I am trying to determine the average timespan between dates in a 
> Dataframe column by using a window-function:
> {code}
> from pyspark import SparkContext
> from pyspark.sql import HiveContext, Window, functions
> from datetime import datetime
> sc = SparkContext()
> sq = HiveContext(sc)
> data = [
> [datetime(2014,1,1)],
> [datetime(2014,2,1)],
> [datetime(2014,3,1)],
> [datetime(2014,3,6)],
> [datetime(2014,8,23)],
> [datetime(2014,10,1)],
> ]
> df = sq.createDataFrame(data, schema=['ts'])
> ts = functions.col('ts')
>
> w = Window.orderBy(ts)
> diff = functions.datediff(
> ts,
> functions.lag(ts, count=1).over(w)
> )
> avg_diff = functions.avg(diff)
> {code}
> While {{df.select(diff.alias('diff')).show()}} correctly renders as
> {noformat}
> ++
> |diff|
> ++
> |null|
> |  31|
> |  28|
> |   5|
> | 170|
> |  39|
> ++
> {noformat}
> doing {code}
> df.select(avg_diff).show()
> {code} throws a {{java.lang.StackOverflowError}}.
> When I say
> {code}
> df2 = df.select(diff.alias('diff'))
> df2.select(functions.avg('diff'))
> {code}
> however, there's no error.
> Am I wrong to assume that the above should work?
> I've already described the same in [this question on 
> stackoverflow.com|http://stackoverflow.com/questions/34793999/averaging-over-window-function-leads-to-stackoverflowerror].



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16441) Spark application hang when dynamic allocation is enabled

2017-02-05 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853300#comment-15853300
 ] 

Dominic Ricard commented on SPARK-16441:


Could this be fixed by either SPARK-16533 or SPARK-14209?

> Spark application hang when dynamic allocation is enabled
> -
>
> Key: SPARK-16441
> URL: https://issues.apache.org/jira/browse/SPARK-16441
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.2, 2.0.0
> Environment: hadoop 2.7.2  spark1.6.2
>Reporter: cen yuhai
>
> spark application are waiting for rpc response all the time and spark 
> listener are blocked by dynamic allocation. Executors can not connect to 
> driver and lost.
> "spark-dynamic-executor-allocation" #239 daemon prio=5 os_prio=0 
> tid=0x7fa304438000 nid=0xcec6 waiting on condition [0x7fa2b81e4000]
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00070fdb94f8> (a 
> scala.concurrent.impl.Promise$CompletionLatch)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>   at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>   at 
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>   at scala.concurrent.Await$.result(package.scala:107)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.doRequestTotalExecutors(YarnSchedulerBackend.scala:59)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:436)
>   - locked <0x828a8960> (a 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend)
>   at 
> org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1438)
>   at 
> org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:359)
>   at 
> org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:310)
>   - locked <0x880e6308> (a 
> org.apache.spark.ExecutorAllocationManager)
>   at 
> org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:264)
>   - locked <0x880e6308> (a 
> org.apache.spark.ExecutorAllocationManager)
>   at 
> org.apache.spark.ExecutorAllocationManager$$anon$2.run(ExecutorAllocationManager.scala:223)
> "SparkListenerBus" #161 daemon prio=5 os_prio=0 tid=0x7fa3053be000 
> nid=0xcec9 waiting for monitor entry [0x7fa2b3dfc000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:618)
>   - waiting to lock <0x880e6308> (a 
> org.apache.spark.ExecutorAllocationManager)
>   at 
> org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
>   at 
> org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
>   at 
> org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
>   at 
> org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
>   at 
> org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
>   at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
>   at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
>   at 
> org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
>   at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>   at 
> 

[jira] [Resolved] (SPARK-14666) Using DISTINCT on a UDF (like CONCAT) is not supported

2016-08-08 Thread Dominic Ricard (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Ricard resolved SPARK-14666.

   Resolution: Fixed
Fix Version/s: 2.0.0

> Using DISTINCT on a UDF (like CONCAT) is not supported
> --
>
> Key: SPARK-14666
> URL: https://issues.apache.org/jira/browse/SPARK-14666
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Dominic Ricard
>Priority: Minor
> Fix For: 2.0.0
>
>
> The following query fails with:
> {noformat}
> Java::JavaSql::SQLException: org.apache.spark.sql.AnalysisException: cannot 
> resolve 'column_1' given input columns: [_c0]; line # pos ##
> {noformat}
> Query:
> {noformat}
> select
>   distinct concat(column_1, ' : ', column_2)
> from
>   table
> order by
>   concat(column_1, ' : ', column_2);
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14666) Using DISTINCT on a UDF (like CONCAT) is not supported

2016-08-08 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412243#comment-15412243
 ] 

Dominic Ricard commented on SPARK-14666:


It does indeed work in Spark 2.0. Thanks.

> Using DISTINCT on a UDF (like CONCAT) is not supported
> --
>
> Key: SPARK-14666
> URL: https://issues.apache.org/jira/browse/SPARK-14666
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Dominic Ricard
>Priority: Minor
>
> The following query fails with:
> {noformat}
> Java::JavaSql::SQLException: org.apache.spark.sql.AnalysisException: cannot 
> resolve 'column_1' given input columns: [_c0]; line # pos ##
> {noformat}
> Query:
> {noformat}
> select
>   distinct concat(column_1, ' : ', column_2)
> from
>   table
> order by
>   concat(column_1, ' : ', column_2);
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-14666) Using DISTINCT on a UDF (like CONCAT) is not supported

2016-04-15 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243455#comment-15243455
 ] 

Dominic Ricard edited comment on SPARK-14666 at 4/15/16 7:33 PM:
-

It appears that the order by clause is expecting the column name. The proper 
query looks like this:

{noformat}
select
  distinct concat(column_1, ' : ', column_2) as `column1`
from
  table
order by
  `column1`;
{noformat}

Why isn't {{concat(column_1, ' : ', column2)}} converted to the {{_c0}} alias 
in {{order by}} when used in conjunction with {{distinct}}?


was (Author: dricard):
It appears that the order by clause is expecting the column name. The proper 
query looks like this:

{noformat}
select
  distinct concat(column_1, ' : ', column_2) as `column1`
from
  table
order by
  `column1`;
{noformat}

Why isn't concat(column_1, ' : ', column2) converted to the _c0 alias in order 
by when used in conjunction with distinct?

> Using DISTINCT on a UDF (like CONCAT) is not supported
> --
>
> Key: SPARK-14666
> URL: https://issues.apache.org/jira/browse/SPARK-14666
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Dominic Ricard
>Priority: Minor
>
> The following query fails with:
> {noformat}
> Java::JavaSql::SQLException: org.apache.spark.sql.AnalysisException: cannot 
> resolve 'column_1' given input columns: [_c0]; line # pos ##
> {noformat}
> Query:
> {noformat}
> select
>   distinct concat(column_1, ' : ', column_2)
> from
>   table
> order by
>   concat(column_1, ' : ', column_2);
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-14666) Using DISTINCT on a UDF (like CONCAT) is not supported

2016-04-15 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243455#comment-15243455
 ] 

Dominic Ricard edited comment on SPARK-14666 at 4/15/16 7:33 PM:
-

It appears that the order by clause is expecting the column name. The proper 
query looks like this:

{noformat}
select
  distinct concat(column_1, ' : ', column_2) as `column1`
from
  table
order by
  `column1`;
{noformat}

Why isn't concat(column_1, ' : ', column2) converted to the _c0 alias in order 
by when used in conjunction with distinct?


was (Author: dricard):
It appears that the order by clause is expecting the column name. The proper 
query looks like this:

{noformat}
select
  distinct concat(column_1, ' : ', column_2) as `column1`
from
  table
order by
  `column1`;
{noformat}

> Using DISTINCT on a UDF (like CONCAT) is not supported
> --
>
> Key: SPARK-14666
> URL: https://issues.apache.org/jira/browse/SPARK-14666
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Dominic Ricard
>Priority: Minor
>
> The following query fails with:
> {noformat}
> Java::JavaSql::SQLException: org.apache.spark.sql.AnalysisException: cannot 
> resolve 'column_1' given input columns: [_c0]; line # pos ##
> {noformat}
> Query:
> {noformat}
> select
>   distinct concat(column_1, ' : ', column_2)
> from
>   table
> order by
>   concat(column_1, ' : ', column_2);
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14666) Using DISTINCT on a UDF (like CONCAT) is not supported

2016-04-15 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243455#comment-15243455
 ] 

Dominic Ricard commented on SPARK-14666:


It appears that the order by clause is expecting the column name. The proper 
query looks like this:

{noformat}
select
  distinct concat(column_1, ' : ', column_2) as `column1`
from
  table
order by
  `column1`;
{noformat}

> Using DISTINCT on a UDF (like CONCAT) is not supported
> --
>
> Key: SPARK-14666
> URL: https://issues.apache.org/jira/browse/SPARK-14666
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Dominic Ricard
>Priority: Minor
>
> The following query fails with:
> {noformat}
> Java::JavaSql::SQLException: org.apache.spark.sql.AnalysisException: cannot 
> resolve 'column_1' given input columns: [_c0]; line # pos ##
> {noformat}
> Query:
> {noformat}
> select
>   distinct concat(column_1, ' : ', column_2)
> from
>   table
> order by
>   concat(column_1, ' : ', column_2);
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14666) Using DISTINCT on a UDF (like CONCAT) is not supported

2016-04-15 Thread Dominic Ricard (JIRA)
Dominic Ricard created SPARK-14666:
--

 Summary: Using DISTINCT on a UDF (like CONCAT) is not supported
 Key: SPARK-14666
 URL: https://issues.apache.org/jira/browse/SPARK-14666
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.1
Reporter: Dominic Ricard
Priority: Minor


The following query fails with:
{noformat}
Java::JavaSql::SQLException: org.apache.spark.sql.AnalysisException: cannot 
resolve 'column_1' given input columns: [_c0]; line # pos ##
{noformat}

Query:
{noformat}
select
  distinct concat(column_1, ' : ', column_2)
from
  table
order by
  concat(column_1, ' : ', column_2);
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6964) Support Cancellation in the Thrift Server

2015-11-20 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018079#comment-15018079
 ] 

Dominic Ricard commented on SPARK-6964:
---

I don't know if my expectations are wrong but this does not seem to be fixed. 
Since 1.5.0 (1.5.1, 1.5.2), I have re-tested this in our spark cluster and 
spark still continues execution of a query even though the client has gone away 
(beeline CTRL-C of a long running query).

Am I reading this wrong?

Thanks.



> Support Cancellation in the Thrift Server
> -
>
> Key: SPARK-6964
> URL: https://issues.apache.org/jira/browse/SPARK-6964
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Dong Wang
>Priority: Critical
> Fix For: 1.5.0
>
>
> There is already a hook in {{ExecuteStatementOperation}}, we just need to 
> connect it to the job group cancellation support we already have and make 
> sure the various drivers support it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10865) [Spark SQL] [UDF] the ceil/ceiling function got wrong return value type

2015-11-11 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000404#comment-15000404
 ] 

Dominic Ricard commented on SPARK-10865:


Is it possible for this fix to be included in the 1.5.2 release? This is 
causing major issues with some BI Tool that apply transformation and expect an 
INT from Ceil.

Thanks

> [Spark SQL] [UDF] the ceil/ceiling function got wrong return value type
> ---
>
> Key: SPARK-10865
> URL: https://issues.apache.org/jira/browse/SPARK-10865
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yi Zhou
>Assignee: Cheng Hao
> Fix For: 1.6.0
>
>
> As per ceil/ceiling definition,it should get BIGINT return value
> -ceil(DOUBLE a), ceiling(DOUBLE a)
> -Returns the minimum BIGINT value that is equal to or greater than a.
> But in current Spark implementation, it got wrong value type.
> e.g., 
> select ceil(2642.12) from udf_test_web_sales limit 1;
> 2643.0
> In hive implementation, it got return value type like below:
> hive> select ceil(2642.12) from udf_test_web_sales limit 1;
> OK
> 2643



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11103) Filter applied on Merged Parquet shema with new column fail with (java.lang.IllegalArgumentException: Column [column_name] was not found in schema!)

2015-10-15 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958841#comment-14958841
 ] 

Dominic Ricard commented on SPARK-11103:


Setting the property {{spark.sql.parquet.filterPushdown}} to {{false}} fixed 
the issue.

Knowing all this, does this indicate a bug in the filter2 implementation of the 
Parquet library? Maybe this issue should be moved to the Parquet project for 
someone to look at...



> Filter applied on Merged Parquet shema with new column fail with 
> (java.lang.IllegalArgumentException: Column [column_name] was not found in 
> schema!)
> 
>
> Key: SPARK-11103
> URL: https://issues.apache.org/jira/browse/SPARK-11103
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Dominic Ricard
>
> When evolving a schema in parquet files, spark properly expose all columns 
> found in the different parquet files but when trying to query the data, it is 
> not possible to apply a filter on a column that is not present in all files.
> To reproduce:
> *SQL:*
> {noformat}
> create table `table1` STORED AS PARQUET LOCATION 
> 'hdfs://:/path/to/table/id=1/' as select 1 as `col1`;
> create table `table2` STORED AS PARQUET LOCATION 
> 'hdfs://:/path/to/table/id=2/' as select 1 as `col1`, 2 as 
> `col2`;
> create table `table3` USING org.apache.spark.sql.parquet OPTIONS (path 
> "hdfs://:/path/to/table");
> select col1 from `table3` where col2 = 2;
> {noformat}
> The last select will output the following Stack Trace:
> {noformat}
> An error occurred when executing the SQL command:
> select col1 from `table3` where col2 = 2
> [Simba][HiveJDBCDriver](500051) ERROR processing query/statement. Error Code: 
> 0, SQL state: TStatus(statusCode:ERROR_STATUS, 
> infoMessages:[*org.apache.hive.service.cli.HiveSQLException:org.apache.spark.SparkException:
>  Job aborted due to stage failure: Task 0 in stage 7212.0 failed 4 times, 
> most recent failure: Lost task 0.3 in stage 7212.0 (TID 138449, 
> 208.92.52.88): java.lang.IllegalArgumentException: Column [col2] was not 
> found in schema!
>   at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:190)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:178)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:160)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:94)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:59)
>   at 
> org.apache.parquet.filter2.predicate.Operators$Eq.accept(Operators.java:180)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:64)
>   at 
> org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:59)
>   at 
> org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:40)
>   at 
> org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:126)
>   at 
> org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:46)
>   at 
> org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:160)
>   at 
> org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
>   at 
> org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.(SqlNewHadoopRDD.scala:155)
>   at 
> org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:120)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at 
> 

[jira] [Comment Edited] (SPARK-11103) Filter applied on Merged Parquet shema with new column fail with (java.lang.IllegalArgumentException: Column [column_name] was not found in schema!)

2015-10-14 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957317#comment-14957317
 ] 

Dominic Ricard edited comment on SPARK-11103 at 10/14/15 5:24 PM:
--

Strangely enough, this works perfectly:
{noformat}
select 
  col1
from
  `table3`
where
  (CASE WHEN col2 = 2 THEN true ELSE false END) = true;
{noformat}

And returns only the row that contains col2 = 2


was (Author: dricard):
Strangely enough, this works perfectly:
{noformat}
select col1 from `table3` where CASE WHEN col2 = 2 THEN true ELSE false END = 
true;
{noformat}

And returns only the row that contains col2 = 2

> Filter applied on Merged Parquet shema with new column fail with 
> (java.lang.IllegalArgumentException: Column [column_name] was not found in 
> schema!)
> 
>
> Key: SPARK-11103
> URL: https://issues.apache.org/jira/browse/SPARK-11103
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Dominic Ricard
>
> When evolving a schema in parquet files, spark properly expose all columns 
> found in the different parquet files but when trying to query the data, it is 
> not possible to apply a filter on a column that is not present in all files.
> To reproduce:
> *SQL:*
> {noformat}
> create table `table1` STORED AS PARQUET LOCATION 
> 'hdfs://:/path/to/table/id=1/' as select 1 as `col1`;
> create table `table2` STORED AS PARQUET LOCATION 
> 'hdfs://:/path/to/table/id=2/' as select 1 as `col1`, 2 as 
> `col2`;
> create table `table3` USING org.apache.spark.sql.parquet OPTIONS (path 
> "hdfs://:/path/to/table");
> select col1 from `table3` where col2 = 2;
> {noformat}
> The last select will output the following Stack Trace:
> {noformat}
> An error occurred when executing the SQL command:
> select col1 from `table3` where col2 = 2
> [Simba][HiveJDBCDriver](500051) ERROR processing query/statement. Error Code: 
> 0, SQL state: TStatus(statusCode:ERROR_STATUS, 
> infoMessages:[*org.apache.hive.service.cli.HiveSQLException:org.apache.spark.SparkException:
>  Job aborted due to stage failure: Task 0 in stage 7212.0 failed 4 times, 
> most recent failure: Lost task 0.3 in stage 7212.0 (TID 138449, 
> 208.92.52.88): java.lang.IllegalArgumentException: Column [col2] was not 
> found in schema!
>   at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:190)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:178)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:160)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:94)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:59)
>   at 
> org.apache.parquet.filter2.predicate.Operators$Eq.accept(Operators.java:180)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:64)
>   at 
> org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:59)
>   at 
> org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:40)
>   at 
> org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:126)
>   at 
> org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:46)
>   at 
> org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:160)
>   at 
> org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
>   at 
> org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.(SqlNewHadoopRDD.scala:155)
>   at 
> org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:120)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>

[jira] [Commented] (SPARK-11103) Filter applied on Merged Parquet shema with new column fail with (java.lang.IllegalArgumentException: Column [column_name] was not found in schema!)

2015-10-14 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957317#comment-14957317
 ] 

Dominic Ricard commented on SPARK-11103:


Strangely enough, this works perfectly:
{noformat}
select col1 from `table3` where CASE WHEN col2 = 2 THEN true ELSE false END = 
true;
{noformat}

And returns only the row that contains col2 = 2

> Filter applied on Merged Parquet shema with new column fail with 
> (java.lang.IllegalArgumentException: Column [column_name] was not found in 
> schema!)
> 
>
> Key: SPARK-11103
> URL: https://issues.apache.org/jira/browse/SPARK-11103
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Dominic Ricard
>
> When evolving a schema in parquet files, spark properly expose all columns 
> found in the different parquet files but when trying to query the data, it is 
> not possible to apply a filter on a column that is not present in all files.
> To reproduce:
> *SQL:*
> {noformat}
> create table `table1` STORED AS PARQUET LOCATION 
> 'hdfs://:/path/to/table/id=1/' as select 1 as `col1`;
> create table `table2` STORED AS PARQUET LOCATION 
> 'hdfs://:/path/to/table/id=2/' as select 1 as `col1`, 2 as 
> `col2`;
> create table `table3` USING org.apache.spark.sql.parquet OPTIONS (path 
> "hdfs://:/path/to/table");
> select col1 from `table3` where col2 = 2;
> {noformat}
> The last select will output the following Stack Trace:
> {noformat}
> An error occurred when executing the SQL command:
> select col1 from `table3` where col2 = 2
> [Simba][HiveJDBCDriver](500051) ERROR processing query/statement. Error Code: 
> 0, SQL state: TStatus(statusCode:ERROR_STATUS, 
> infoMessages:[*org.apache.hive.service.cli.HiveSQLException:org.apache.spark.SparkException:
>  Job aborted due to stage failure: Task 0 in stage 7212.0 failed 4 times, 
> most recent failure: Lost task 0.3 in stage 7212.0 (TID 138449, 
> 208.92.52.88): java.lang.IllegalArgumentException: Column [col2] was not 
> found in schema!
>   at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:190)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:178)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:160)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:94)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:59)
>   at 
> org.apache.parquet.filter2.predicate.Operators$Eq.accept(Operators.java:180)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:64)
>   at 
> org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:59)
>   at 
> org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:40)
>   at 
> org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:126)
>   at 
> org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:46)
>   at 
> org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:160)
>   at 
> org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
>   at 
> org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.(SqlNewHadoopRDD.scala:155)
>   at 
> org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:120)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at 

[jira] [Created] (SPARK-11103) Filter applied on Merged Parquet shema with new column fail with (java.lang.IllegalArgumentException: Column [column_name] was not found in schema!)

2015-10-14 Thread Dominic Ricard (JIRA)
Dominic Ricard created SPARK-11103:
--

 Summary: Filter applied on Merged Parquet shema with new column 
fail with (java.lang.IllegalArgumentException: Column [column_name] was not 
found in schema!)
 Key: SPARK-11103
 URL: https://issues.apache.org/jira/browse/SPARK-11103
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.1
Reporter: Dominic Ricard


When evolving a schema in parquet files, spark properly expose all columns 
found in the different parquet files but when trying to query the data, it is 
not possible to apply a filter on a column that is not present in all files.

To reproduce:
*SQL:*
{noformat}
create table `table1` STORED AS PARQUET LOCATION 
'hdfs://:/path/to/table/id=1/' as select 1 as `col1`;
create table `table2` STORED AS PARQUET LOCATION 
'hdfs://:/path/to/table/id=2/' as select 1 as `col1`, 2 as `col2`;
create table `table3` USING org.apache.spark.sql.parquet OPTIONS (path 
"hdfs://:/path/to/table");
select col1 from `table3` where col2 = 2;
{noformat}

The last select will output the following Stack Trace:
{noformat}
An error occurred when executing the SQL command:
select col1 from `table3` where col2 = 2

[Simba][HiveJDBCDriver](500051) ERROR processing query/statement. Error Code: 
0, SQL state: TStatus(statusCode:ERROR_STATUS, 
infoMessages:[*org.apache.hive.service.cli.HiveSQLException:org.apache.spark.SparkException:
 Job aborted due to stage failure: Task 0 in stage 7212.0 failed 4 times, most 
recent failure: Lost task 0.3 in stage 7212.0 (TID 138449, 208.92.52.88): 
java.lang.IllegalArgumentException: Column [col2] was not found in schema!
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.getColumnDescriptor(SchemaCompatibilityValidator.java:190)
at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:178)
at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:160)
at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:94)
at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:59)
at 
org.apache.parquet.filter2.predicate.Operators$Eq.accept(Operators.java:180)
at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:64)
at 
org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:59)
at 
org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:40)
at 
org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:126)
at 
org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:46)
at 
org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:160)
at 
org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
at 
org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.(SqlNewHadoopRDD.scala:155)
at 
org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:120)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace::26:25,