[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2019-03-11 Thread Mori[A]rty (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789589#comment-16789589
 ] 

Mori[A]rty edited comment on SPARK-21067 at 3/11/19 2:04 PM:
-

Spark 2.4.0 still has the same problem. And I want to elaborate a bit more 
about how this bug happens.
  
 To reproduce this bug, we have to set hive.server2.enable.doAs to true in 
hive-site.xml. It causes SparkThriftServer to execute 

SparkExecuteStatementOperation with 
org.apache.hive.service.cli.session.HiveSessionImplwithUGI#sessionUgi.

When the first time a CTAS statement is executed, 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs is initialized using ugi 
from HiveSessionImplwithUGI#sessionUgi. 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs will be closed when 
session closes.

Since HiveClientImpl is shared among SparkThriftServer and 
HiveClientImpl#state#hdfsEncryptionShim won't be initialized again, subsequent  
usage of HiveClientImpl#state#hdfsEncryptionShim will throw 
"java.io.IOException: Filesystem closed".

Here is a simpler way to reproduce:

1. Open Session1 with beeline and execute SQL: 
{code:java}
CREATE TABLE tbl (i int);
INSERT INTO tbl SELECT 1;{code}
2. Close Session1

3. Open Session2 with beeline and execute SQL: 
{code:java}
INSERT INTO tbl SELECT 1;
{code}
4. Get exception in Session2:
{code:java}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://xxx/user/hive/warehouse/tbl/.hive-staging_hive_2019-03-11_20-22-53_627_4903214085803350096-2/-ext-1/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000
 to destination 
hdfs://xxx/user/hive/warehouse/tbl/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000;
 (state=,code=0){code}
By the way, [~xinxianyin]'s patch works for me and I think it is the simplest 
to solve this issue.


was (Author: moriarty279):
Spark 2.4.0 still has the same problem. And I want to elaborate a bit more 
about how this bug happened.
  
 To reproduce this bug, we have to set hive.server2.enable.doAs to true in 
hive-site.xml. It causes SparkThriftServer to execute 

SparkExecuteStatementOperation with 
org.apache.hive.service.cli.session.HiveSessionImplwithUGI#sessionUgi.

When the first time a CTAS statement is executed, 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs is initialized using ugi 
from HiveSessionImplwithUGI#sessionUgi. 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs will be closed when 
session closes.

Since HiveClientImpl is shared among SparkThriftServer and 
HiveClientImpl#state#hdfsEncryptionShim won't be initialized again, subsequent  
usage of HiveClientImpl#state#hdfsEncryptionShim will throw 
"java.io.IOException: Filesystem closed".

Here are the most simple steps to reproduce:

1. Open Session1 with beeline and execute SQL: 
{code:java}
CREATE TABLE tbl (i int);
INSERT INTO tbl SELECT 1;{code}
2. Close Session1

3. Open Session2 with beeline and execute SQL: 
{code:java}
INSERT INTO tbl SELECT 1;
{code}
4. Get exception in Session2:
{code:java}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://xxx/user/hive/warehouse/tbl/.hive-staging_hive_2019-03-11_20-22-53_627_4903214085803350096-2/-ext-1/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000
 to destination 
hdfs://xxx/user/hive/warehouse/tbl/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000;
 (state=,code=0)
{code}

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>Priority: Major
> Attachments: SPARK-21067.patch
>
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2019-03-11 Thread Mori[A]rty (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789589#comment-16789589
 ] 

Mori[A]rty edited comment on SPARK-21067 at 3/11/19 1:57 PM:
-

Spark 2.4.0 still has the same problem. And I want to elaborate a bit more 
about how this bug happened.
  
 To reproduce this bug, we have to set hive.server2.enable.doAs to true in 
hive-site.xml. It causes SparkThriftServer to execute 

SparkExecuteStatementOperation with 
org.apache.hive.service.cli.session.HiveSessionImplwithUGI#sessionUgi.

When the first time a CTAS statement is executed, 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs is initialized using ugi 
from HiveSessionImplwithUGI#sessionUgi. 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs will be closed when 
session closes.

Since HiveClientImpl is shared among SparkThriftServer and 
HiveClientImpl#state#hdfsEncryptionShim won't be initialized again, subsequent  
usage of HiveClientImpl#state#hdfsEncryptionShim will throw 
"java.io.IOException: Filesystem closed".

Here are the most simple steps to reproduce:

1. Open Session1 with beeline and execute SQL: 
{code:java}
CREATE TABLE tbl (i int);
INSERT INTO tbl SELECT 1;{code}
2. Close Session1

3. Open Session2 with beeline and execute SQL: 
{code:java}
INSERT INTO tbl SELECT 1;
{code}
4. Get exception in Session2:
{code:java}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://xxx/user/hive/warehouse/tbl/.hive-staging_hive_2019-03-11_20-22-53_627_4903214085803350096-2/-ext-1/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000
 to destination 
hdfs://xxx/user/hive/warehouse/tbl/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000;
 (state=,code=0)
{code}


was (Author: moriarty279):
Spark 2.4.0 still has the same problem. And I want to elaborate a bit more 
about how this bug happened.
 
To reproduce this bug, we have to set hive.server2.enable.doAs to true in 
hive-site.xml. It causes SparkThriftServer to execute 

SparkExecuteStatementOperation with 
org.apache.hive.service.cli.session.HiveSessionImplwithUGI#sessionUgi.

When the first time a CTAS statement is executed, 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs is initialized using ugi 
from HiveSessionImplwithUGI#sessionUgi. 
HiveClientImpl#state#hdfsEncryptionShim#hdfsAdmin#dfs will be closed when 
session closes.

Since HiveClientImpl is shared among SparkThriftServer and 
HiveClientImpl#state#hdfsEncryptionShim won't be initialized again. Subsequent  
usage of HiveClientImpl#state#hdfsEncryptionShim will throw 
"java.io.IOException: Filesystem closed".

Here are the most simple steps to reproduce:

1. Open Session1 with beeline and execute SQL: 
{code:java}
CREATE TABLE tbl (i int);
INSERT INTO tbl select 1;{code}
2. Close Session1

3. Open Session2 with beeline and execute SQL: 
{code:java}
INSERT INTO tbl select 1;
{code}
4. Get exception in Session2:
{code:java}
Error: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://xxx/user/hive/warehouse/tbl/.hive-staging_hive_2019-03-11_20-22-53_627_4903214085803350096-2/-ext-1/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000
 to destination 
hdfs://xxx/user/hive/warehouse/tbl/part-0-b6277b02-c70f-4784-bc99-1201ac6e6364-c000;
 (state=,code=0)
{code}

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>Priority: Major
> Attachments: SPARK-21067.patch
>
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-11-02 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235127#comment-16235127
 ] 

xinzhang edited comment on SPARK-21067 at 11/2/17 7:23 AM:
---

[~dricard]
Please check issue here link and try .
[https://issues.apache.org/jira/browse/SPARK-21725]


was (Author: zhangxin0112zx):
[~dricard]
Please say issue here link and try .
[https://issues.apache.org/jira/browse/SPARK-21725]

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>Priority: Major
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-09-18 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169797#comment-16169797
 ] 

xinzhang edited comment on SPARK-21067 at 9/18/17 9:35 AM:
---

I try to solve it by coding the source code by myself. It is too complex to me. 
Hope the community or anyone could give a hand and fix it. [~rxin]



was (Author: zhangxin0112zx):
i try to solve it by coding the source code by myself. It is too complex to me. 
Hope the community or anyone could give a hand and fix it. [~rxin]


> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-09-10 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160550#comment-16160550
 ] 

xinzhang edited comment on SPARK-21067 at 9/11/17 2:02 AM:
---

[~dricard]

Thanks for your reply.
So do we . Use the parquet . But another pro is when u use sql like "insert 
overwrite table a partition(pt='2') select" . 
It will also cause the thriftserver fail . Do you happen to have the same 
problem?
Only happend with the table which use partitions . this all right when use 
parquet without partition. "insert overwrite table a  select"


was (Author: zhangxin0112zx):
[~dricard]

Thanks for your reply.
So do we . Use the parquet . But another pro is when u use sql like "insert 
overwrite table a partition(pt='2') select" . 
It will also cause the thriftserver fail . Do you happen to have the same 
problem?

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-08-14 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108303#comment-16108303
 ] 

xinzhang edited comment on SPARK-21067 at 8/15/17 1:09 AM:
---

hi .
 I use the parquet to avoid the issuse about create table as.
It appear in insert overwrite table (partition). I could not find any ways to 
avoid this issuse ?Any suggests will be great helpful.
https://issues.apache.org/jira/browse/SPARK-21725


was (Author: zhangxin0112zx):
hi .
 I use the parquet to avoid the issuse about create table as.
It appear in insert overwrite table (partition). I could not find any ways to 
avoid this issuse ?Any suggests will great helpful.
https://issues.apache.org/jira/browse/SPARK-21725

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-08-14 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108303#comment-16108303
 ] 

xinzhang edited comment on SPARK-21067 at 8/15/17 1:07 AM:
---

hi .
 I use the parquet to avoid the issuse about create table as.
It appear in insert overwrite table (partition). I could not find any ways to 
avoid this issuse ?Any suggests will great helpful.
https://issues.apache.org/jira/browse/SPARK-21725


was (Author: zhangxin0112zx):
hi srowen.could u  consider about this .give some suggest.[~srowen]


> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-08-14 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119310#comment-16119310
 ] 

xinzhang edited comment on SPARK-21067 at 8/15/17 1:04 AM:
---

hi [~smilegator]
Can you push this BUG repair? 
In my consideration, this is a very big obstacle for us when we go to 
popularization and use SparkSQL(thriftserver).


was (Author: zhangxin0112zx):
hi [~cloud_fan] 
Can you push this BUG repair? 
In my consideration, this is a very big obstacle for us when we go to 
popularization and use SparkSQL(thriftserver).

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-08-08 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119310#comment-16119310
 ] 

xinzhang edited comment on SPARK-21067 at 8/9/17 2:04 AM:
--

hi [~cloud_fan] 
Can you push this BUG repair? 
In my consideration, this is a very big obstacle for us when we go to 
popularization and use SparkSQL(thriftserver).


was (Author: zhangxin0112zx):
hi [~cloud_fan] 
Can you push this BUG repair? 
In my consideration, this is a very big obstacle for us when we go to use 
SparkSQL(thriftserver).

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-08-08 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119310#comment-16119310
 ] 

xinzhang edited comment on SPARK-21067 at 8/9/17 2:03 AM:
--

hi [~cloud_fan] 
Can you push this BUG repair? 
In my consideration, this is a very big obstacle for us when we go to use 
SparkSQL(thriftserver).


was (Author: zhangxin0112zx):
hi [~cloud_fan] 
Can you push this BUG repair? 
In my consideration, this is a very big obstacle for us when we go to use 
Spark(thriftserver).

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-08-04 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108303#comment-16108303
 ] 

xinzhang edited comment on SPARK-21067 at 8/5/17 1:13 AM:
--

hi srowen.could u  consider about this .give some suggest.[~srowen]



was (Author: zhangxin0112zx):
hi guoxiaolongzte.could u  consider about this .give some 
suggest.[~guoxiaolongzte]


> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-29 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106057#comment-16106057
 ] 

xinzhang edited comment on SPARK-21067 at 7/29/17 7:09 AM:
---

hi.Reynold Xin  i am looking forwad to your reply [~rxin]


was (Author: zhangxin0112zx):
hi.Reynold Xin  i am looking forwad to your reply

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-27 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103202#comment-16103202
 ] 

xinzhang edited comment on SPARK-21067 at 7/28/17 1:06 AM:
---

same here.
problem  reappeared in Spark 2.1.0 thriftserver :

Open Beeline Session 1
Create Table 1 (Success)
Open Beeline Session 2
Create Table 2 (Success)
Close Beeline Session 1
Create Table 3 in Beeline Session 2 (FAIL)

use parquet,  the issue is not present .

[~cloud_fan]


was (Author: zhangxin0112zx):
same here.
problem  reappeared in Spark 2.1.0 thriftserver :

Open Beeline Session 1
Create Table 1 (Success)
Open Beeline Session 2
Create Table 2 (Success)
Close Beeline Session 1
Create Table 3 in Beeline Session 2 (FAIL)

use parquet,  the issue is not present .

@Wenchen Fan

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-27 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103202#comment-16103202
 ] 

xinzhang edited comment on SPARK-21067 at 7/27/17 1:27 PM:
---

same here.
problem  reappeared in Spark 2.1.0 thriftserver :

Open Beeline Session 1
Create Table 1 (Success)
Open Beeline Session 2
Create Table 2 (Success)
Close Beeline Session 1
Create Table 3 in Beeline Session 2 (FAIL)

use parquet,  the issue is not present .

@Wenchen Fan


was (Author: zhangxin0112zx):
same here.
problem  reappeared in Spark 2.1.0 thriftserver :

Open Beeline Session 1
Create Table 1 (Success)
Open Beeline Session 2
Create Table 2 (Success)
Close Beeline Session 1
Create Table 3 in Beeline Session 2 (FAIL)

use parquet,  the issue is not present .

Wenchen Fan

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-11 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080503#comment-16080503
 ] 

Dominic Ricard edited comment on SPARK-21067 at 7/11/17 8:00 PM:
-

Interesting findings today.

- Open *{color:#59afe1}Beeline Session 1{color}*
-- Create Table 1 (Success)
- Open *{color:#14892c}Beeline Session 2{color}*
-- Create Table 2 (Success)
- Close *{color:#59afe1}Beeline Session 1{color}*
- Create Table 3 in *{color:#14892c}Beeline Session 2{color}* 
({color:#d04437}FAIL{color})

So, it seems like the problem occurs after the 1st session is closed. What does 
the Thrift server do when a session is closed that could cause this issue?

Looking at the Hive Metastore logs, I notice that the same SQL query (CREATE 
TABLE ...) translate to different actions between the 1st and later sessions:

Session 1:
{noformat}
...
PERFLOG method=create_table_with_environment_context 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler
...
PERFLOG method=alter_table_with_cascade 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler
{noformat}

Session 2:
{noformat}
...
PERFLOG method=create_table_with_environment_context 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
...
PERFLOG method=drop_table_with_environment_context 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler
{noformat}


was (Author: dricard):
Interesting findings today.

- Open *{color:#59afe1}Beeline Session 1{color}*
-- Create Table 1 (Success)
- Open *{color:#14892c}Beeline Session 2{color}*
-- Create Table 2 (Success)
- Close *{color:#59afe1}Beeline Session 1{color}*
- Create Table 3 in *{color:#14892c}Beeline Session 2{color}* 
({color:#d04437}FAIL{color})

So, it seems like the problem occurs after the 1st session is closed. What does 
the Thrift server do when a session is closed that could cause this issue?

Looking at the Hive Metastore logs, I notice that the same SQL query (CREATE 
TABLE ...) translate to different actions between the 1st and later sessions:

Session 1:
{noformat}
PERFLOG method=alter_table_with_cascade 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler
{noformat}

Session 2:
{noformat}
PERFLOG method=drop_table_with_environment_context 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler
{noformat}

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-11 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082441#comment-16082441
 ] 

Dominic Ricard edited comment on SPARK-21067 at 7/11/17 7:16 PM:
-

[~smilegator] Hey Xiao, I see that you have made many contribution around 
CREATE TABLE issues. Could you shed some light on this issue? We are either 
looking for a fix or for a property to set hive.default.fileformat in Spark 2 
to have it use parquet instead of textfile, since the issue is not present when 
the fileformat is set to "parquet". 

Looks like your Pull Request to add spark.sql.default.fileformat 
(https://issues.apache.org/jira/browse/SPARK-16825) did not go in and that 
would have solve the issue with default store / fileformat...

Thanks.

FWD: [~cloud_fan], you might also know how this can be resolved...


was (Author: dricard):
[~smilegator] Hey Xiao, I see that you have made many contribution around 
CREATE TABLE issues. Could you shed some light on this issue? We are either 
looking for a fix or for a property to set hive.default.fileformat in Spark 2 
to have it use parquet instead of textfile, since the issue is not present when 
the fileformat is set to "parquet". 

Looks like your Pull Request to add spark.sql.default.fileformat 
(https://issues.apache.org/jira/browse/SPARK-16825) did not go in and that 
would have solve the issue with default store / fileformat...

Thanks.

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-11 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082441#comment-16082441
 ] 

Dominic Ricard edited comment on SPARK-21067 at 7/11/17 5:15 PM:
-

[~smilegator] Hey Xiao, I see that you have made many contribution around 
CREATE TABLE issues. Could you shed some light on this issue? We are either 
looking for a fix or for a property to set hive.default.fileformat in Spark 2 
to have it use parquet instead of textfile, since the issue is not present when 
the fileformat is set to "parquet". 

Looks like your Pull Request to add spark.sql.default.fileformat 
(https://issues.apache.org/jira/browse/SPARK-16825) did not go in and that 
would have solve the issue with default store / fileformat...

Thanks.


was (Author: dricard):
[~smilegator] Hey Xiao, I see that you have made many contribution around 
CREATE TABLE issues. Could you shed some light on this issue? We are either 
looking for a fix or for a property to set hive.default.fileformat in Spark 2. 
Looks like your Pull Request to add spark.sql.default.fileformat 
(https://issues.apache.org/jira/browse/SPARK-16825) did not go in...

Thanks.

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
>

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-10 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080503#comment-16080503
 ] 

Dominic Ricard edited comment on SPARK-21067 at 7/10/17 9:04 PM:
-

Interesting findings today.

- Open *{color:#59afe1}Beeline Session 1{color}*
-- Create Table 1 (Success)
- Open *{color:#14892c}Beeline Session 2{color}*
-- Create Table 2 (Success)
- Close *{color:#59afe1}Beeline Session 1{color}*
- Create Table 3 in *{color:#14892c}Beeline Session 2{color}* 
({color:#d04437}FAIL{color})

So, it seems like the problem occurs after the 1st session is closed. What does 
the Thrift server do when a session is closed that could cause this issue?

Looking at the Hive Metastore logs, I notice that the same SQL query (CREATE 
TABLE ...) translate to different actions between the 1st and later sessions:

Session 1:
{noformat}
PERFLOG method=alter_table_with_cascade 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler
{noformat}

Session 2:
{noformat}
PERFLOG method=drop_table_with_environment_context 
from=org.apache.hadoop.hive.metastore.RetryingHMSHandler
{noformat}


was (Author: dricard):
Interesting findings today.

- Open *{color:#59afe1}Beeline Session 1{color}*
-- Create Table 1 (Success)
- Open *{color:#14892c}Beeline Session 2{color}*
-- Create Table 2 (Success)
- Close *{color:#59afe1}Beeline Session 1{color}*
- Create Table 3 in *{color:#14892c}Beeline Session 2{color}* 
({color:#d04437}FAIL{color})

So, it seems like the problem occurs after the 1st session is closed. What does 
the Thrift server do when a session is closed that could cause this issue?

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-10 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080503#comment-16080503
 ] 

Dominic Ricard edited comment on SPARK-21067 at 7/10/17 3:43 PM:
-

Interesting findings today.

- Open *{color:#59afe1}Beeline Session 1{color}*
-- Create Table 1 (Success)
- Open *{color:#14892c}Beeline Session 2{color}*
-- Create Table 2 (Success)
- Close *{color:#59afe1}Beeline Session 1{color}*
- Create Table 3 in *{color:#14892c}Beeline Session 2{color}* 
({color:#d04437}FAIL{color})

So, it seems like the problem occurs after the 1st session is closed. What does 
the Thrift server do when a session is closed that could cause this issue?


was (Author: dricard):
Interesting findings today.

- Open Session 1
-- Create Table 1 (Success)
- Open Session 2
-- Create Table 2 (Success)
- Close Session 1
- Create Table 3 in Session 2 (FAIL)

So, it seems like the problem occurs after the 1st session is closed. What does 
the Thrift server do when a session is closed that could cause this issue?

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-06-27 Thread Dominic Ricard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064833#comment-16064833
 ] 

Dominic Ricard edited comment on SPARK-21067 at 6/27/17 1:31 PM:
-

[~q79969786], yes. As stated in the description, ours is set to 
"/tmp/hive-staging/\{user.name\}"


was (Author: dricard):
[~q79969786], yes. As stated in the description, ours is set to 
"/tmp/hive-staging/{user.name}"

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
>