[jira] [Comment Edited] (SPARK-12497) thriftServer does not support semicolon in sql

2021-02-01 Thread xinzhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276864#comment-17276864
 ] 

xinzhang edited comment on SPARK-12497 at 2/2/21, 5:44 AM:
---

[~kabhwan]

Sorry for the mixed-up tests.

Please recheck the new test:
 # It works with Spark 3.0.0. (BTW: the semicolon is also fine in beeline.)
 # It is still a bug with Spark 2.4.7.

[root@actuatorx-dispatcher-172-25-48-173 spark]# env|grep spark
 SPARK_HOME=/opt/spark/spark-bin
 
PATH=/root/perl5/bin:/opt/scala/scala-bin//bin:/opt/spark/spark-bin/bin:172.25.52.34:/opt/hive/hive-bin/bin/:172.31.10.86:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/swosbf/bin:/usr/local/swosbf/bin/system:/usr/java/jdk/bin:/usr/bin:/usr/sbin:/bin:/sbin:/usr/X11R6/bin:/root/bin
 PWD=/opt/spark
 [root@actuatorx-dispatcher-172-25-48-173 spark]# ll
 total 4
 -rw-r--r-- 1 root root 646 Feb 1 17:44 derby.log
 drwxr-xr-x 5 root root 133 Feb 1 17:44 metastore_db
 drwxr-xr-x 14 root root 255 Sep 22 13:57 spark-2.3.0-bin-hadoop2.6
 drwxr-xr-x 14 1000 1000 240 Feb 2 13:32 spark-2.4.7-bin-hadoop2.6
 drwxr-xr-x 14 root root 240 Feb 2 13:26 spark-3.0.0-bin-hadoop2.7
 lrwxrwxrwx 1 root root 25 Feb 1 15:42 spark-bin -> spark-2.4.7-bin-hadoop2.6
 [root@actuatorx-dispatcher-172-25-48-173 spark]# jps
 3348544 RunJar
 3354564 Jps
 3354234 RunJar
 984853 JarLauncher
 [root@actuatorx-dispatcher-172-25-48-173 spark]# sh 
spark-bin/sbin/start-thriftserver.sh 
 starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to 
/opt/spark/spark-bin/logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-actuatorx-dispatcher-172-25-48-173.out

[root@actuatorx-dispatcher-172-25-48-173 spark]# jps
3362650 Jps
984853 JarLauncher
3355197 SparkSubmit
3362444 RunJar
 [root@actuatorx-dispatcher-172-25-48-173 spark]# netstat -anp|grep 3355197
 tcp 0 0 172.25.48.173:21120 0.0.0.0:* LISTEN 3355197/java 
 tcp 0 0 0.0.0.0:4040 0.0.0.0:* LISTEN 3355197/java 
 tcp 0 0 172.25.48.173:22219 0.0.0.0:* LISTEN 3355197/java 
 tcp 0 0 0.0.0.0:50031 0.0.0.0:* LISTEN 3355197/java 
 tcp 0 0 172.25.48.173:51797 172.25.48.231:6033 ESTABLISHED 3355197/java 
 tcp 0 0 172.25.48.173:51795 172.25.48.231:6033 ESTABLISHED 3355197/java 
 tcp 0 0 172.25.48.173:51787 172.25.48.231:6033 ESTABLISHED 3355197/java 
 tcp 0 0 172.25.48.173:51789 172.25.48.231:6033 ESTABLISHED 3355197/java 
 unix 3 [ ] STREAM CONNECTED 534110569 3355197/java 
 unix 3 [ ] STREAM CONNECTED 534110568 3355197/java 
 unix 2 [ ] STREAM CONNECTED 534050562 3355197/java 
 unix 2 [ ] STREAM CONNECTED 534110572 3355197/java 
 [root@actuatorx-dispatcher-172-25-48-173 spark]# 
/opt/spark/spark-bin/bin/beeline -u jdbc:hive2://172.25.48.173:50031/tools -n 
tools 
 Connecting to jdbc:hive2://172.25.48.173:50031/tools
 21/02/02 13:38:57 INFO jdbc.Utils: Supplied authorities: 172.25.48.173:50031
 21/02/02 13:38:57 INFO jdbc.Utils: Resolved authority: 172.25.48.173:50031
 21/02/02 13:38:57 INFO jdbc.HiveConnection: Will try to open client transport 
with JDBC Uri: jdbc:hive2://172.25.48.173:50031/tools
 Connected to: Spark SQL (version 2.4.7)
 Driver: Hive JDBC (version 1.2.1.spark2)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 Beeline version 1.2.1.spark2 by Apache Hive
 0: jdbc:hive2://172.25.48.173:50031/tools> select '\;';
 Error: org.apache.spark.sql.catalyst.parser.ParseException: 
 no viable alternative at input 'select ''(line 1, pos 7)

== SQL ==
 select '\
 ---^^^ (state=,code=0)
 0: jdbc:hive2://172.25.48.173:50031/tools> !exit
 Closing: 0: jdbc:hive2://172.25.48.173:50031/tools
 [root@actuatorx-dispatcher-172-25-48-173 spark]#
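For what it's worth, the statement splitting on ';' happens in the beeline client before anything reaches the server, so a plain JDBC client should get a literal semicolon through even on 2.4.7. A minimal sketch (untested here; host, port, database, and user are taken from the transcript above, and it assumes the Hive JDBC driver jar is on the classpath):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SemicolonTest {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver explicitly; older driver jars do not
        // always auto-register through the ServiceLoader mechanism.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://172.25.48.173:50031/tools";
        // No beeline in between, so nothing splits or escapes the ';' in the literal.
        try (Connection conn = DriverManager.getConnection(url, "tools", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT ';'")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // expected output: ;
            }
        }
    }
}
{code}

Since the 3.0.0 beeline already handles the escaped form, this sketch is only relevant to the 2.4.7 path.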


was (Author: zhangxin0112zx):
[~kabhwan]

Sorry for the mixed-up tests.

Please recheck the new test:
 # It works with Spark 3.0.0. (BTW: the semicolon is also fine in beeline.)
 # It is still a bug with Spark 2.4.7.

[root@actuatorx-dispatcher-172-25-48-173 spark]# env|grep spark
SPARK_HOME=/opt/spark/spark-bin
PATH=/root/perl5/bin:/opt/scala/scala-bin//bin:/opt/spark/spark-bin/bin:172.25.52.34:/opt/hive/hive-bin/bin/:172.31.10.86:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/swosbf/bin:/usr/local/swosbf/bin/system:/usr/java/jdk/bin:/usr/bin:/usr/sbin:/bin:/sbin:/usr/X11R6/bin:/root/bin
PWD=/opt/spark
[root@actuatorx-dispatcher-172-25-48-173 spark]# ll
total 4
-rw-r--r-- 1 root root 646 Feb 1 17:44 derby.log
drwxr-xr-x 5 root root 133 Feb 1 17:44 metastore_db
drwxr-xr-x 14 root root 255 Sep 22 13:57 spark-2.3.0-bin-hadoop2.6
drwxr-xr-x 14 1000 1000 240 Feb 2 13:32 spark-2.4.7-bin-hadoop2.6
drwxr-xr-x 14 root root 240 Feb 2 13:26 spark-3.0.0-bin-hadoop2.7
lrwxrwxrwx 1 root root 25 Feb 1 15:42 spark-bin -> spark-2.4.7-bin-hadoop2.6
[root@actuatorx-dispatcher-172-25-48-173 spark]# jps
3348544 RunJar
3354564 Jps
3354234 RunJar
984853 JarLauncher
[root@actuatorx-dispatcher-172-25-48-173 spark]# sh 
spark-bin/sbin/start-thriftserver.sh 

[jira] [Commented] (SPARK-12497) thriftServer does not support semicolon in sql

2021-02-01 Thread xinzhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276864#comment-17276864
 ] 

xinzhang commented on SPARK-12497:
--

[~kabhwan]

Sorry for the mixed-up tests.

Please recheck the new test:
 # It works with Spark 3.0.0. (BTW: the semicolon is also fine in beeline.)
 # It is still a bug with Spark 2.4.7.

[root@actuatorx-dispatcher-172-25-48-173 spark]# env|grep spark
SPARK_HOME=/opt/spark/spark-bin
PATH=/root/perl5/bin:/opt/scala/scala-bin//bin:/opt/spark/spark-bin/bin:172.25.52.34:/opt/hive/hive-bin/bin/:172.31.10.86:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/swosbf/bin:/usr/local/swosbf/bin/system:/usr/java/jdk/bin:/usr/bin:/usr/sbin:/bin:/sbin:/usr/X11R6/bin:/root/bin
PWD=/opt/spark
[root@actuatorx-dispatcher-172-25-48-173 spark]# ll
total 4
-rw-r--r-- 1 root root 646 Feb 1 17:44 derby.log
drwxr-xr-x 5 root root 133 Feb 1 17:44 metastore_db
drwxr-xr-x 14 root root 255 Sep 22 13:57 spark-2.3.0-bin-hadoop2.6
drwxr-xr-x 14 1000 1000 240 Feb 2 13:32 spark-2.4.7-bin-hadoop2.6
drwxr-xr-x 14 root root 240 Feb 2 13:26 spark-3.0.0-bin-hadoop2.7
lrwxrwxrwx 1 root root 25 Feb 1 15:42 spark-bin -> spark-2.4.7-bin-hadoop2.6
[root@actuatorx-dispatcher-172-25-48-173 spark]# jps
3348544 RunJar
3354564 Jps
3354234 RunJar
984853 JarLauncher
[root@actuatorx-dispatcher-172-25-48-173 spark]# sh 
spark-bin/sbin/start-thriftserver.sh 
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to 
/opt/spark/spark-bin/logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-actuatorx-dispatcher-172-25-48-173.out
[root@actuatorx-dispatcher-172-25-48-173 spark]# netstat -anp|grep 3355197
tcp 0 0 172.25.48.173:21120 0.0.0.0:* LISTEN 3355197/java 
tcp 0 0 0.0.0.0:4040 0.0.0.0:* LISTEN 3355197/java 
tcp 0 0 172.25.48.173:22219 0.0.0.0:* LISTEN 3355197/java 
tcp 0 0 0.0.0.0:50031 0.0.0.0:* LISTEN 3355197/java 
tcp 0 0 172.25.48.173:51797 172.25.48.231:6033 ESTABLISHED 3355197/java 
tcp 0 0 172.25.48.173:51795 172.25.48.231:6033 ESTABLISHED 3355197/java 
tcp 0 0 172.25.48.173:51787 172.25.48.231:6033 ESTABLISHED 3355197/java 
tcp 0 0 172.25.48.173:51789 172.25.48.231:6033 ESTABLISHED 3355197/java 
unix 3 [ ] STREAM CONNECTED 534110569 3355197/java 
unix 3 [ ] STREAM CONNECTED 534110568 3355197/java 
unix 2 [ ] STREAM CONNECTED 534050562 3355197/java 
unix 2 [ ] STREAM CONNECTED 534110572 3355197/java 
[root@actuatorx-dispatcher-172-25-48-173 spark]# 
/opt/spark/spark-bin/bin/beeline -u jdbc:hive2://172.25.48.173:50031/tools -n 
tools 
Connecting to jdbc:hive2://172.25.48.173:50031/tools
21/02/02 13:38:57 INFO jdbc.Utils: Supplied authorities: 172.25.48.173:50031
21/02/02 13:38:57 INFO jdbc.Utils: Resolved authority: 172.25.48.173:50031
21/02/02 13:38:57 INFO jdbc.HiveConnection: Will try to open client transport 
with JDBC Uri: jdbc:hive2://172.25.48.173:50031/tools
Connected to: Spark SQL (version 2.4.7)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://172.25.48.173:50031/tools> select '\;';
Error: org.apache.spark.sql.catalyst.parser.ParseException: 
no viable alternative at input 'select ''(line 1, pos 7)

== SQL ==
select '\
---^^^ (state=,code=0)
0: jdbc:hive2://172.25.48.173:50031/tools> !exit
Closing: 0: jdbc:hive2://172.25.48.173:50031/tools
[root@actuatorx-dispatcher-172-25-48-173 spark]#

> thriftServer does not support semicolon in sql 
> ---
>
> Key: SPARK-12497
> URL: https://issues.apache.org/jira/browse/SPARK-12497
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: nilonealex
>Priority: Major
>
> 0: jdbc:hive2://192.168.128.130:14005> SELECT ';' from tx_1 limit 1 ;
> Error: org.apache.spark.sql.AnalysisException: cannot recognize input near 
> '' '' '' in select clause; line 1 pos 8 (state=,code=0)
> 0: jdbc:hive2://192.168.128.130:14005> 
> 0: jdbc:hive2://192.168.128.130:14005> select '\;' from tx_1 limit 1 ; 
> Error: org.apache.spark.sql.AnalysisException: cannot recognize input near 
> '' '' '' in select clause; line 1 pos 9 (state=,code=0)






[jira] [Commented] (SPARK-12497) thriftServer does not support semicolon in sql

2021-02-01 Thread xinzhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276207#comment-17276207
 ] 

xinzhang commented on SPARK-12497:
--

It's still a bug with Spark 3.0.0.
Start the Spark Thrift Server on the default port 10000 and connect with Spark's beeline.


Step 1:

[root@172-25-48-173 spark]# sh 
spark-3.0.0-bin-hadoop2.7/sbin/start-thriftserver.sh 
...

Step 2:

[root@172-25-48-173 spark]# sh spark-3.0.0-bin-hadoop2.7/bin/beeline 
Beeline version 1.2.1.spark2 by Apache Hive
beeline> !connect jdbc:hive2://172.25.48.173:10000
Connecting to jdbc:hive2://172.25.48.173:10000
Enter username for jdbc:hive2://172.25.48.173:10000: 
Enter password for jdbc:hive2://172.25.48.173:10000: 
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
info.
Connected to: Spark SQL (version 2.4.7)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://172.25.48.173:10000> select '\;';
Error: org.apache.spark.sql.catalyst.parser.ParseException: 
no viable alternative at input 'select ''(line 1, pos 7)

== SQL ==
select '\
---^^^ (state=,code=0)
0: jdbc:hive2://172.25.48.173:10000>

Am I missing something?

> thriftServer does not support semicolon in sql 
> ---
>
> Key: SPARK-12497
> URL: https://issues.apache.org/jira/browse/SPARK-12497
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: nilonealex
>Priority: Major
>
> 0: jdbc:hive2://192.168.128.130:14005> SELECT ';' from tx_1 limit 1 ;
> Error: org.apache.spark.sql.AnalysisException: cannot recognize input near 
> '' '' '' in select clause; line 1 pos 8 (state=,code=0)
> 0: jdbc:hive2://192.168.128.130:14005> 
> 0: jdbc:hive2://192.168.128.130:14005> select '\;' from tx_1 limit 1 ; 
> Error: org.apache.spark.sql.AnalysisException: cannot recognize input near 
> '' '' '' in select clause; line 1 pos 9 (state=,code=0)






[jira] [Updated] (SPARK-23022) Spark Thrift Server always cache resource issues

2018-01-09 Thread xinzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinzhang updated SPARK-23022:
-
Description: 
Hi. I use the Thrift Server for Spark SQL and ran multiple queries, with Spark 
deployed on YARN.
When my queries finish, the Thrift Server still holds the YARN resources. Any 
suggestions would be helpful.


Here are the images.

!https://user-images.githubusercontent.com/8244097/34752652-8d224416-f5ee-11e7-89d3-5868c128378d.png!
!https://user-images.githubusercontent.com/8244097/34752397-215bea08-f5ed-11e7-89f4-13ef9ab78904.png!
!https://user-images.githubusercontent.com/8244097/34752403-2756d224-f5ed-11e7-97d3-9c7d21c48f3a.png!
!https://user-images.githubusercontent.com/8244097/34752409-2ba1d3ce-f5ed-11e7-85d6-9e46ae8a3e2b.png!

Here is the Spark configuration.


{code:java}
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 500
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.executor.instances 2
spark.executor.memory   6g
#serializer
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 512m
#spark.kryo.unsafe true
spark.kryo.referenceTracking false
spark.rdd.compress true
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 1g

spark.yarn.archive hdfs://ns/data1/hadooptmp/spark2.2.1/jars
spark.yarn.am.memory 2g
spark.driver.memory 4g
spark.driver.maxResultSize 2g

#SPARK SQL 
spark.sql.shuffle.partitions 500
spark.sql.statistics.fallBackToHdfs true
spark.sql.orc.filterPushdown true
spark.sql.autoBroadcastJoinThreshold 104857600
spark.sql.adaptive.enabled true

spark.history.fs.logDirectory  hdfs://ns/data4/hadooptmp/spark-history
spark.eventLog.enabled  true
spark.eventLog.compress  true
spark.eventLog.dir  hdfs://ns/data4/hadooptmp/spark-history
spark.yarn.historyServer.address  172.31.10.119:18080
spark.io.compression.codec snappy
spark.executor.logs.rolling.enableCompression true
spark.dynamicAllocation.executorIdleTimeout 10s
spark.network.timeout 600s
spark.sql.parquet.writeLegacyFormat true
{code}
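One knob that may matter here (an assumption on my part, not verified against this cluster): with dynamic allocation, executors that hold cached blocks are exempt from spark.dynamicAllocation.executorIdleTimeout and are instead governed by spark.dynamicAllocation.cachedExecutorIdleTimeout, which defaults to infinity, so executors with cached data are never released. That would look exactly like the Thrift Server holding YARN resources forever. A sketch of the extra setting, with an arbitrary timeout:

{code:java}
# Release executors that only hold cached data after 2 minutes idle
# (the default is infinity, i.e. such executors are never removed).
spark.dynamicAllocation.cachedExecutorIdleTimeout 120s
{code}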



  was:
Hi. I use the Thrift Server for Spark SQL and ran multiple queries, with Spark 
deployed on YARN.
When my queries finish, the Thrift Server still holds the YARN resources. Any 
suggestions would be helpful.


Here are the images.

!https://user-images.githubusercontent.com/8244097/34752652-8d224416-f5ee-11e7-89d3-5868c128378d.png!
!https://user-images.githubusercontent.com/8244097/34752397-215bea08-f5ed-11e7-89f4-13ef9ab78904.png!
!https://user-images.githubusercontent.com/8244097/34752403-2756d224-f5ed-11e7-97d3-9c7d21c48f3a.png!
!https://user-images.githubusercontent.com/8244097/34752409-2ba1d3ce-f5ed-11e7-85d6-9e46ae8a3e2b.png!

Here is the Spark configuration.


{code:java}
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 500
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.executor.instances 2
spark.executor.memory   6g
#serializer
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 512m
#spark.kryo.unsafe true
spark.kryo.referenceTracking false
spark.rdd.compress true
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 1g

spark.yarn.archive hdfs://ns/data1/hadooptmp/spark2.2.1/jars
spark.yarn.am.memory 2g
spark.driver.memory 4g
spark.driver.maxResultSize 2g

#SPARK SQL 
spark.sql.shuffle.partitions 500
spark.sql.statistics.fallBackToHdfs true
spark.sql.orc.filterPushdown true
spark.sql.autoBroadcastJoinThreshold 104857600
spark.sql.adaptive.enabled true

spark.history.fs.logDirectory  hdfs://ns/data4/hadooptmp/spark-history
spark.eventLog.enabled  true
spark.eventLog.compress  true
spark.eventLog.dir  hdfs://ns/data4/hadooptmp/spark-history
spark.yarn.historyServer.address  172.31.10.119:18080
spark.io.compression.codec snappy
spark.executor.logs.rolling.enableCompression true
spark.dynamicAllocation.executorIdleTimeout 10s
spark.network.timeout 600s
spark.sql.parquet.writeLegacyFormat true
{code}




> Spark Thrift Server always cache resource issues
> 
>
> Key: SPARK-23022
> URL: https://issues.apache.org/jira/browse/SPARK-23022
> Project: Spark
>  Issue 

[jira] [Updated] (SPARK-23022) Spark Thrift Server always cache resource issues

2018-01-09 Thread xinzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinzhang updated SPARK-23022:
-
Description: 
Hi. I use the Thrift Server for Spark SQL and ran multiple queries, with Spark 
deployed on YARN.
When my queries finish, the Thrift Server still holds the YARN resources. Any 
suggestions would be helpful.


Here are the images.

!https://user-images.githubusercontent.com/8244097/34752652-8d224416-f5ee-11e7-89d3-5868c128378d.png!
!https://user-images.githubusercontent.com/8244097/34752397-215bea08-f5ed-11e7-89f4-13ef9ab78904.png!
!https://user-images.githubusercontent.com/8244097/34752403-2756d224-f5ed-11e7-97d3-9c7d21c48f3a.png!
!https://user-images.githubusercontent.com/8244097/34752409-2ba1d3ce-f5ed-11e7-85d6-9e46ae8a3e2b.png!

Here is the Spark configuration.


{code:java}
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 500
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.executor.instances 2
spark.executor.memory   6g
#serializer
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 512m
#spark.kryo.unsafe true
spark.kryo.referenceTracking false
spark.rdd.compress true
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 1g

spark.yarn.archive hdfs://ns/data1/hadooptmp/spark2.2.1/jars
spark.yarn.am.memory 2g
spark.driver.memory 4g
spark.driver.maxResultSize 2g

#SPARK SQL 
spark.sql.shuffle.partitions 500
spark.sql.statistics.fallBackToHdfs true
spark.sql.orc.filterPushdown true
spark.sql.autoBroadcastJoinThreshold 104857600
spark.sql.adaptive.enabled true

spark.history.fs.logDirectory  hdfs://ns/data4/hadooptmp/spark-history
spark.eventLog.enabled  true
spark.eventLog.compress  true
spark.eventLog.dir  hdfs://ns/data4/hadooptmp/spark-history
spark.yarn.historyServer.address  172.31.10.119:18080
spark.io.compression.codec snappy
spark.executor.logs.rolling.enableCompression true
spark.dynamicAllocation.executorIdleTimeout 10s
spark.network.timeout 600s
spark.sql.parquet.writeLegacyFormat true
{code}



  was:

Hi. I use the Thrift Server for Spark SQL and ran multiple queries, with Spark 
deployed on YARN.
When my queries finish, the Thrift Server still holds the YARN resources. Any 
suggestions would be helpful.


Here are the images.

!https://user-images.githubusercontent.com/8244097/34752652-8d224416-f5ee-11e7-89d3-5868c128378d.png!
!https://user-images.githubusercontent.com/8244097/34752397-215bea08-f5ed-11e7-89f4-13ef9ab78904.png!
!https://user-images.githubusercontent.com/8244097/34752403-2756d224-f5ed-11e7-97d3-9c7d21c48f3a.png!
!https://user-images.githubusercontent.com/8244097/34752409-2ba1d3ce-f5ed-11e7-85d6-9e46ae8a3e2b.png!

Here is the Spark configuration.


{code:java}
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 500
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.executor.instances 2
spark.executor.memory   6g
#serializer
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 512m
#spark.kryo.unsafe true
spark.kryo.referenceTracking false
spark.rdd.compress true
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 1g

spark.yarn.archive hdfs://ns/data1/hadooptmp/spark2.2.1/jars
spark.yarn.am.memory 2g
spark.driver.memory 4g
spark.driver.maxResultSize 2g

#SPARK SQL 
spark.sql.shuffle.partitions 500
spark.sql.statistics.fallBackToHdfs true
spark.sql.orc.filterPushdown true
spark.sql.autoBroadcastJoinThreshold 104857600
spark.sql.adaptive.enabled true

spark.history.fs.logDirectory  hdfs://ns/data4/hadooptmp/spark-history
spark.eventLog.enabled  true
spark.eventLog.compress  true
spark.eventLog.dir  hdfs://ns/data4/hadooptmp/spark-history
spark.yarn.historyServer.address  172.31.10.119:18080
spark.io.compression.codec snappy
spark.executor.logs.rolling.enableCompression true
spark.dynamicAllocation.executorIdleTimeout 10s
spark.network.timeout 600s
spark.sql.parquet.writeLegacyFormat true
{code}




> Spark Thrift Server always cache resource issues
> 
>
> Key: SPARK-23022
> URL: https://issues.apache.org/jira/browse/SPARK-23022
> Project: Spark
>  Issue Type: 

[jira] [Created] (SPARK-23022) Spark Thrift Server always cache resource issues

2018-01-09 Thread xinzhang (JIRA)
xinzhang created SPARK-23022:


 Summary: Spark Thrift Server always cache resource issues
 Key: SPARK-23022
 URL: https://issues.apache.org/jira/browse/SPARK-23022
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Affects Versions: 2.2.1
 Environment: CentOS6.x 
Spark2.x
JDK1.8
Reporter: xinzhang



Hi. I use the Thrift Server for Spark SQL and ran multiple queries, with Spark 
deployed on YARN.
When my queries finish, the Thrift Server still holds the YARN resources. Any 
suggestions would be helpful.


Here are the images.

!https://user-images.githubusercontent.com/8244097/34752652-8d224416-f5ee-11e7-89d3-5868c128378d.png!
!https://user-images.githubusercontent.com/8244097/34752397-215bea08-f5ed-11e7-89f4-13ef9ab78904.png!
!https://user-images.githubusercontent.com/8244097/34752403-2756d224-f5ed-11e7-97d3-9c7d21c48f3a.png!
!https://user-images.githubusercontent.com/8244097/34752409-2ba1d3ce-f5ed-11e7-85d6-9e46ae8a3e2b.png!

Here is the Spark configuration.


{code:java}
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 500
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.executor.instances 2
spark.executor.memory   6g
#serializer
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 512m
#spark.kryo.unsafe true
spark.kryo.referenceTracking false
spark.rdd.compress true
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 1g

spark.yarn.archive hdfs://ns/data1/hadooptmp/spark2.2.1/jars
spark.yarn.am.memory 2g
spark.driver.memory 4g
spark.driver.maxResultSize 2g

#SPARK SQL 
spark.sql.shuffle.partitions 500
spark.sql.statistics.fallBackToHdfs true
spark.sql.orc.filterPushdown true
spark.sql.autoBroadcastJoinThreshold 104857600
spark.sql.adaptive.enabled true

spark.history.fs.logDirectory  hdfs://ns/data4/hadooptmp/spark-history
spark.eventLog.enabled  true
spark.eventLog.compress  true
spark.eventLog.dir  hdfs://ns/data4/hadooptmp/spark-history
spark.yarn.historyServer.address  172.31.10.119:18080
spark.io.compression.codec snappy
spark.executor.logs.rolling.enableCompression true
spark.dynamicAllocation.executorIdleTimeout 10s
spark.network.timeout 600s
spark.sql.parquet.writeLegacyFormat true
{code}








[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-11-02 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235119#comment-16235119
 ] 

xinzhang edited comment on SPARK-21725 at 11/2/17 7:26 AM:
---

[~mgaido]
Finally, I found where the problem is.
Add this configuration to hdfs-site.xml:

{code:xml}
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
{code}



Reason: Spark and HDFS use the same API (underneath they share the same 
FileSystem instance).
When beeline closes a FileSystem instance, it closes the Thrift Server's 
FileSystem instance too.
When the next beeline session tries to get the instance, it will always report 
"Caused by: java.io.IOException: Filesystem closed".



was (Author: zhangxin0112zx):
[~mgaido]
Finally, I found where the problem is.
Add this configuration to hdfs-site.xml:

{code:xml}
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
{code}



Reason: Spark and HDFS use the same API (underneath they share the same 
FileSystem instance).
When beeline closes a FileSystem instance, it closes the Thrift Server's 
FileSystem instance too.
When the next beeline session tries to get the instance, it will always report 
"Caused by: java.io.IOException: Filesystem closed".


> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> Use the Thrift Server to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The doc describing Parquet tables is here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables but it is OK with 
> unpartitioned tables. Does that mean Spark does not use its own Parquet support?
> Could someone suggest how I can avoid this issue?






[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-11-02 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235119#comment-16235119
 ] 

xinzhang edited comment on SPARK-21725 at 11/2/17 7:25 AM:
---

[~mgaido]
Finally, I found where the problem is.
Add this configuration to hdfs-site.xml:

{code:xml}
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
{code}



Reason: Spark and HDFS use the same API (underneath they share the same 
FileSystem instance).
When beeline closes a FileSystem instance, it closes the Thrift Server's 
FileSystem instance too.
When the next beeline session tries to get the instance, it will always report 
"Caused by: java.io.IOException: Filesystem closed".



was (Author: zhangxin0112zx):
[~mgaido]
Finally, I found where the problem is.
Add this configuration to hdfs-site.xml:

{code:xml}
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
{code}



Reason: Spark and HDFS use the same API (underneath they share the same 
FileSystem instance).
When beeline closes a FileSystem instance, it closes the Thrift Server's 
FileSystem instance too.
When the next beeline session tries to get the instance, it will always report 
"Caused by: java.io.IOException: Filesystem closed".


> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> Use the Thrift Server to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The doc describing Parquet tables is here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables but it is OK with 
> unpartitioned tables. Does that mean Spark does not use its own Parquet support?
> Could someone suggest how I can avoid this issue?






[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-11-02 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235119#comment-16235119
 ] 

xinzhang edited comment on SPARK-21725 at 11/2/17 7:24 AM:
---

[~mgaido]
Finally, I found where the problem is.
Add this configuration to hdfs-site.xml:

{code:xml}
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
{code}



Reason: Spark and HDFS use the same API (underneath they share the same 
FileSystem instance).
When beeline closes a FileSystem instance, it closes the Thrift Server's 
FileSystem instance too.
When the next beeline session tries to get the instance, it will always report 
"Caused by: java.io.IOException: Filesystem closed".



was (Author: zhangxin0112zx):
[~mgaido]
Finally, I found where the problem is.
Add this configuration to hdfs-site.xml:

<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>


Reason: Spark and HDFS use the same API (underneath they share the same 
FileSystem instance).
When beeline closes a FileSystem instance, it closes the Thrift Server's 
FileSystem instance too.
When the next beeline session tries to get the instance, it will always report 
"Caused by: java.io.IOException: Filesystem closed".


> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> Use the Thrift Server to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The doc describing Parquet tables is here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables but it is OK with 
> unpartitioned tables. Does that mean Spark does not use its own Parquet support?
> Could someone suggest how I can avoid this issue?






[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-11-02 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235119#comment-16235119
 ] 

xinzhang edited comment on SPARK-21725 at 11/2/17 7:24 AM:
---

[~mgaido]
Finally, I found where the problem is.
Add this configuration to hdfs-site.xml:

{code:xml}
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
{code}



Reason: Spark and HDFS use the same API (underneath they share the same 
FileSystem instance).
When beeline closes a FileSystem instance, it closes the Thrift Server's 
FileSystem instance too.
When the next beeline session tries to get the instance, it will always report 
"Caused by: java.io.IOException: Filesystem closed".



was (Author: zhangxin0112zx):
[~mgaido]
Finally, I found where the problem is.
Add this configuration to hdfs-site.xml:

{code:xml}
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
{code}



Reason: Spark and HDFS use the same API (underneath they share the same 
FileSystem instance).
When beeline closes a FileSystem instance, it closes the Thrift Server's 
FileSystem instance too.
When the next beeline session tries to get the instance, it will always report 
"Caused by: java.io.IOException: Filesystem closed".


> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> Use the Thrift Server to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The doc describing Parquet tables is here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables but it is OK with 
> unpartitioned tables. Does that mean Spark does not use its own Parquet support?
> Could someone suggest how I can avoid this issue?






[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-11-02 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235127#comment-16235127
 ] 

xinzhang edited comment on SPARK-21067 at 11/2/17 7:23 AM:
---

[~dricard]
Please check the issue linked here and give it a try:
[https://issues.apache.org/jira/browse/SPARK-21725]


was (Author: zhangxin0112zx):
[~dricard]
Please see the issue linked here and give it a try:
[https://issues.apache.org/jira/browse/SPARK-21725]

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>Priority: Major
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which states that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> 

[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-11-01 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235127#comment-16235127
 ] 

xinzhang commented on SPARK-21067:
--

[~dricard]
Please see the issue linked here and give it a try:
[https://issues.apache.org/jira/browse/SPARK-21725]

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>Priority: Major
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which states that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
> at 

[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-11-01 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235119#comment-16235119
 ] 

xinzhang commented on SPARK-21725:
--

[~mgaido]
Finally, I found where the problem is.
Add this configuration to hdfs-site.xml:

<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>


Reason: Spark and HDFS use the same API (underneath they share the same 
FileSystem instance).
When beeline closes a FileSystem instance, it closes the Thrift Server's 
FileSystem instance too.
When the next beeline session tries to get the instance, it will always report 
"Caused by: java.io.IOException: Filesystem closed".


> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> Use the Thrift Server to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The doc describing Parquet tables is here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables but it is OK with 
> unpartitioned tables. Does that mean Spark does not use its own Parquet support?
> Could someone suggest how I can avoid this issue?






[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-11-01 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235039#comment-16235039
 ] 

xinzhang edited comment on SPARK-21725 at 11/2/17 1:09 AM:
---

Could you tell me which Hadoop version is in your environment?
CDH? Ambari? MapR? Databricks? Or pure community Hadoop?


was (Author: zhangxin0112zx):
Could you tell me which Hadoop version is in your environment?
CDH? Ambari? MapR? Databricks? Or pure community Hadoop?

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> Use the Thrift Server to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The doc describing Parquet tables is here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables but it is OK with 
> unpartitioned tables. Does that mean Spark does not use its own Parquet support?
> Could someone suggest how I can avoid this issue?






[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-11-01 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235039#comment-16235039
 ] 

xinzhang commented on SPARK-21725:
--

Could you tell me which Hadoop version is in your environment?
CDH? Ambari? MapR? Databricks? Or pure community Hadoop?

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> Use the Thrift Server to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> the doc about the parquet table desc here 
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused the problem appear in the table(partitions)  but it is ok with 
> table(with out partitions) . It means spark do not use its own parquet ?
> Maybe someone give any suggest how could I avoid the issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-11-01 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234149#comment-16234149
 ] 

xinzhang commented on SPARK-21725:
--

I can't believe it. I built Hadoop 2.8 last night, and the problem still appears.
I think these issues are relevant:
[https://issues.apache.org/jira/browse/SPARK-21067]
[https://stackoverflow.com/questions/44233523/spark-sql-2-1-1-thrift-server-unable-to-move-source-hdfs-to-target]
[https://issues.apache.org/jira/browse/SPARK-11083]

My env is CentOS 6.5, JVM 8. And to be honest, I still cannot believe you could not
reproduce it!!
We currently use the 1.6 thriftserver, which is OK; I tried all the 2.x releases.
I am curious what the difference is between your env and mine.
Would you give me some suggestions on what I should check in my env?




[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-11-01 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233875#comment-16233875
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 11:18 AM:


[~mgaido]

Here is my target package log (+mysql: bad):
[https://github.com/zhangxin0112/java/blob/zxis/src/target_package_mysql.out]

Here is my target package log (+derby: bad):
[https://github.com/zhangxin0112/java/blob/zxis/src/target_package_derby.out]

Here is my source code log (+mysql: bad):
[https://github.com/zhangxin0112/java/blob/zxis/src/source_code_mysql.out]
Here is my source code log (+derby: good):
[https://github.com/zhangxin0112/java/blob/zxis/src/source_code_derby.out]


was (Author: zhangxin0112zx):
[~mgaido]

That is my target package log (+mysql)
[https://github.com/zhangxin0112/java/blob/zxis/src/target_package_mysql.out]

That is my target package log (+derby)
[https://github.com/zhangxin0112/java/blob/zxis/src/target_package_derby.out]

That is my source code log (+mysql)
[https://github.com/zhangxin0112/java/blob/zxis/src/source_code_mysql.out]
That is my source code log (+derby)
[https://github.com/zhangxin0112/java/blob/zxis/src/source_code_derby.out]




[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-11-01 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233875#comment-16233875
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 11:17 AM:


[~mgaido]

Here is my target package log (+mysql):
[https://github.com/zhangxin0112/java/blob/zxis/src/target_package_mysql.out]

Here is my target package log (+derby):
[https://github.com/zhangxin0112/java/blob/zxis/src/target_package_derby.out]

Here is my source code log (+mysql):
[https://github.com/zhangxin0112/java/blob/zxis/src/source_code_mysql.out]
Here is my source code log (+derby):
[https://github.com/zhangxin0112/java/blob/zxis/src/source_code_derby.out]


was (Author: zhangxin0112zx):
[~mgaido]

That is my target package log (+mysql)
[https://github.com/zhangxin0112/java/blob/zxis/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-node3.out]

That is my source code log (+mysql)
[https://github.com/zhangxin0112/java/blob/zxis/src/source_code_mysql.out]
That is my source code log (+derby)
[https://github.com/zhangxin0112/java/blob/zxis/src/source_code_derby.out]




[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-11-01 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233875#comment-16233875
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 11:05 AM:


[~mgaido]

Here is my target package log (+mysql):
[https://github.com/zhangxin0112/java/blob/zxis/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-node3.out]

Here is my source code log (+mysql):
[https://github.com/zhangxin0112/java/blob/zxis/src/source_code_mysql.out]
Here is my source code log (+derby):
[https://github.com/zhangxin0112/java/blob/zxis/src/source_code_derby.out]


was (Author: zhangxin0112zx):
[~mgaido]

That is my target package log (+mysql)
[https://github.com/zhangxin0112/java/blob/zxis/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-node3.out]

That is my source code log (+mysql)
[https://github.com/zhangxin0112/java/blob/zxis/src/2.out]




[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-11-01 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233875#comment-16233875
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 10:06 AM:


[~mgaido]

Here is my target package log (+mysql):
[https://github.com/zhangxin0112/java/blob/zxis/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-node3.out]

Here is my source code log (+mysql):
[https://github.com/zhangxin0112/java/blob/zxis/src/2.out]


was (Author: zhangxin0112zx):
That is my target package log (+mysql)
[https://github.com/zhangxin0112/java/blob/zxis/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-node3.out]

That is my source code log (+mysql)
[https://github.com/zhangxin0112/java/blob/zxis/src/2.out]




[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-11-01 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233875#comment-16233875
 ] 

xinzhang commented on SPARK-21725:
--

Here is my target package log (+mysql):
[https://github.com/zhangxin0112/java/blob/zxis/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-node3.out]

Here is my source code log (+mysql):
[https://github.com/zhangxin0112/java/blob/zxis/src/2.out]




[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 5:25 AM:
---

[~mgaido]

1. Hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing the
Hive metastore at MySQL (local 9083 or not; keep the metastore unchanged, that is
not the point).
2. Copy the hive-site.xml into the Spark conf.
3. Start the Spark thriftserver.
4. Connect to the thriftserver with beeline.

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from derby to mysql. My suggestion: could you
try it in a new env, not your current existing env? You could rebuild it.*{color}
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a new env); the problem always appears, no matter
what I did. Hoping for your help. Thanks.

{color:red}Hi. Do not test it in the spark source code directory!!! Test it with
mysql (maybe derby), and test it with the target package
spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz.

target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz + mysql : thrift server bad
target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz + derby : thrift server bad
spark source code directory + derby : thrift server good
spark source code directory + mysql : thrift server bad

Under those combinations the problem always appears. Could you test it?{color}
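
Tying the matrix above back to the spark.sql.hive.convertMetastoreParquet excerpt
quoted in the issue body, the setting can be probed from beeline as sketched below.
Both the flag flip and the cache note are untested suggestions, not fixes confirmed
in this thread:

-- show the current value of the conversion flag (default: true)
SET spark.sql.hive.convertMetastoreParquet;
-- force the Hive SerDe write path instead of Spark's own Parquet support
SET spark.sql.hive.convertMetastoreParquet=false;
-- the "java.io.IOException: Filesystem closed" frame points at a cached HDFS
-- client that an earlier session closed; disabling the Hadoop FileSystem cache
-- (fs.hdfs.impl.disable.cache=true in the Hadoop conf) is another commonly
-- tried mitigation (again an assumption, not verified here).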


was (Author: zhangxin0112zx):
[~mgaido]

1. hive 1.2.1  
   download a new tar only change hive-site.xml 
  about hive metastore with mysql . metastore(local 9083 or not .Keep metastore 
do not change.It is not a point) 
2.spark-sql copy the hive-site.xml  
3.start spark-thriftserver
4.beeline connect the thriftserver 

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from derby to mysql . My suggest is could 
u do it with a new env. Without your current exit env. U could rebuild it 
.*{color}
Like what u say might be related to the metastore. I tested the case in 
cdh5.7(hadoop2.6)   and hadoop2.8(new env) , they will always appear , No 
matter what I did . Hope your help . Thanks .

{color:red}Hi .Do not test it in the spark source code directory !!! Test it 
with mysql(maybe derby) && Test it with the target package 
spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz  . 

target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz  + mysql : thrift 
server bad
target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz  + derby : thrift 
server bad
spark source code directory  + derby : thrift server good

Under the two conditions , it always appear the pro. Could u test it {color}


[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 3:51 AM:
---

[~mgaido]

1. Hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing the
Hive metastore at MySQL (local 9083 or not; keep the metastore unchanged, that is
not the point).
2. Copy the hive-site.xml into the Spark conf.
3. Start the Spark thriftserver.
4. Connect to the thriftserver with beeline.

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from derby to mysql. My suggestion: could you
try it in a new env, not your current existing env? You could rebuild it.*{color}
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a new env); the problem always appears, no matter
what I did. Hoping for your help. Thanks.

{color:red}Hi. Do not test it in the spark source code directory!!! Test it with
mysql (maybe derby), and test it with the target package
spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz.

target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz + mysql : thrift server bad
target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz + derby : thrift server bad
spark source code directory + derby : thrift server good

Under those combinations the problem always appears. Could you test it?{color}


was (Author: zhangxin0112zx):
[~mgaido]

1. hive 1.2.1  
   download a new tar only change hive-site.xml 
  about hive metastore with mysql . metastore(local 9083 or not .Keep metastore 
do not change.It is not a point) 
2.spark-sql copy the hive-site.xml  
3.start spark-thriftserver
4.beeline connect the thriftserver 

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from derby to mysql . My suggest is could 
u do it with a new env. Without your current exit env. U could rebuild it 
.*{color}
Like what u say might be related to the metastore. I tested the case in 
cdh5.7(hadoop2.6)   and hadoop2.8(new env) , they will always appear , No 
matter what I did . Hope your help . Thanks .

{color:red}Hi .Do not test it in the spark source code directory !!! Test it 
with mysql(maybe derby) && Test it with the target package 
spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz  . Under the two conditions , it 
always appear the pro. Could u test it {color}


[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 3:49 AM:
---

[~mgaido]

1. Hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing the
Hive metastore at MySQL (local 9083 or not; keep the metastore unchanged, that is
not the point).
2. Copy the hive-site.xml into the Spark conf.
3. Start the Spark thriftserver.
4. Connect to the thriftserver with beeline.

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from derby to mysql. My suggestion: could you
try it in a new env, not your current existing env? You could rebuild it.*{color}
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a new env); the problem always appears, no matter
what I did. Hoping for your help. Thanks.

{color:red}Hi. Do not test it in the spark source code directory!!! Test it with
mysql (maybe derby), and test it with the target package
spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz. Under those two conditions the problem
always appears. Could you test it?{color}


was (Author: zhangxin0112zx):
[~mgaido]

1. hive 1.2.1  
   download a new tar only change hive-site.xml 
  about hive metastore with mysql . metastore(local 9083 or not .Keep metastore 
do not change.It is not a point) 
2.spark-sql copy the hive-site.xml  
3.start spark-thriftserver
4.beeline connect the thriftserver 

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from derby to mysql . My suggest is could 
u do it with a new env. Without your current exit env. U could rebuild it 
.*{color}
Like what u say might be related to the metastore. I tested the case in 
cdh5.7(hadoop2.6)   and hadoop2.8(new env) , they will always appear , No 
matter what I did . Hope your help . Thanks .

{color:red}Hi .Do not test it in the spark source code directory !!! Test it 
with mysql && Test it with the target package 
spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz  . Under the two conditions , it 
always appear the pro. Could u test it {color}


[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 3:48 AM:
---

[~mgaido]

1. Hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing the
Hive metastore at MySQL (local 9083 or not; keep the metastore unchanged, that is
not the point).
2. Copy the hive-site.xml into the Spark conf.
3. Start the Spark thriftserver.
4. Connect to the thriftserver with beeline.

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from derby to mysql. My suggestion: could you
try it in a new env, not your current existing env? You could rebuild it.*{color}
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a new env); the problem always appears, no matter
what I did. Hoping for your help. Thanks.

{color:red}Hi. Do not test it in the spark source code directory!!! Test it with
mysql, and test it with the target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz.
Under those two conditions the problem always appears. Could you test it?{color}


was (Author: zhangxin0112zx):
[~mgaido]

1. hive 1.2.1  
   download a new tar only change hive-site.xml 
  about hive metastore with mysql . metastore(local 9083 or not .Keep metastore 
do not change.It is not a point) 
2.spark-sql copy the hive-site.xml  
3.start spark-thriftserver
4.beeline connect the thriftserver 

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from derby to mysql . My suggest is could 
u do it with a new env. Without your current exit env. U could rebuild it 
.*{color}
Like what u say might be related to the metastore. I tested the case in 
cdh5.7(hadoop2.6)   and hadoop2.8(new env) , they will always appear , No 
matter what I did . Hope your help . Thanks .

{color:red}Hi .Do not test it in the spark source code directory !!! Test it 
with mysql && Test it with the target spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz 
 . Under the two conditions , it always appear the pro. Could u test it {color}


[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 3:46 AM:
---

[~mgaido]

1. Hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing the
Hive metastore at MySQL (local 9083 or not; keep the metastore unchanged, that is
not the point).
2. Copy the hive-site.xml into the Spark conf.
3. Start the Spark thriftserver.
4. Connect to the thriftserver with beeline.

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from derby to mysql. My suggestion: could you
try it in a new env, not your current existing env? You could rebuild it.*{color}
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a new env); the problem always appears, no matter
what I did. Hoping for your help. Thanks.

{color:red}Hi. Do not test it in the spark source code directory!!! Test it with
mysql, and test it with the target spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz. Under
those two conditions the problem always appears. Could you test it?{color}


was (Author: zhangxin0112zx):
[~mgaido]

1. hive 1.2.1  
   download a new tar only change hive-site.xml 
  about hive metastore with mysql . metastore(local 9083 or not .Keep metastore 
do not change.It is not a point) 
2.spark-sql copy the hive-site.xml  
3.start spark-thriftserver
4.beeline connect the thriftserver 

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from derby to mysql . My suggest is could 
u do it with a new env. Without your current exit env. U could rebuild it 
.*{color}
Like what u say might be related to the metastore. I tested the case in 
cdh5.7(hadoop2.6)   and hadoop2.8(new env) , they will always appear , No 
matter what I did . Hope your help . Thanks .

{color:red}Hi .It is the metastore !!! I test it with derby .Thriftserver is 
OK. I change it to mysql . It always appear the pro. Could u test it {color}


[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 3:16 AM:
---

[~mgaido]

1. Hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing the
Hive metastore at MySQL (local 9083 or not; keep the metastore unchanged, that is
not the point).
2. Copy the hive-site.xml into the Spark conf.
3. Start the Spark thriftserver.
4. Connect to the thriftserver with beeline.

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from derby to mysql. My suggestion: could you
try it in a new env, not your current existing env? You could rebuild it.*{color}
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a new env); the problem always appears, no matter
what I did. Hoping for your help. Thanks.

{color:red}Hi. It is the metastore!!! I tested it with derby and the thriftserver
is OK; I changed it to mysql and the problem always appears. Could you test
it?{color}


was (Author: zhangxin0112zx):
[~mgaido]

1. hive 1.2.1  
   download a new tar only change hive-site.xml 
  about hive metastore with mysql . metastore(local 9083 or not .Keep metastore 
do not change.It is not a point) 
2.spark-sql copy the hive-site.xml  
3.start spark-thriftserver
4.beeline connect the thriftserver 

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from derby to mysql . My suggest is could 
u do it with a new env. Without your current exit env. U could rebuild it 
.*{color}
Like what u say might be related to the metastore. I tested the case in 
cdh5.7(hadoop2.6)   and hadoop2.8(new env) , they will always appear , No 
matter what I did . Hope your help . Thanks .




[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 3:08 AM:
---

[~mgaido]

1. Hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing the
Hive metastore at MySQL (local 9083 or not; keep the metastore unchanged, that is
not the point).
2. Copy the hive-site.xml into the Spark conf.
3. Start the Spark thriftserver.
4. Connect to the thriftserver with beeline.

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from derby to mysql. My suggestion: could you
try it in a new env, not your current existing env? You could rebuild it.*{color}
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a new env); the problem always appears, no matter
what I did. Hoping for your help. Thanks.




was (Author: zhangxin0112zx):
[~mgaido]

1. hive 1.2.1  
   download a new tar only change hive-site.xml 
  about hive metastore with mysql . metastore(local 9083 or not .Keep metastore 
do not change.It is not a point) 
2.spark-sql copy the hive-site.xml  
3.start spark-thriftserver
4.beeline connect the thriftserver 

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from derby to mysql . My suggest is could 
u do it with a new env. Without your current exit env. U could rebuild it 
.*{color}
Like what u say might be related to the metastore. I tested the case in 
cdh5.7(hadoop2.6)   and hadoop2.8(new env) , they will always appear , No 
matter what I did . Hope your help . Thanks .

{color:red}Hi .It is the metastore !!! I test it with derby .Thriftserver is 
OK. I change it to mysql . It always appear the pro. Could u test it {color}

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> use thriftserver create table with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The Parquet table behavior is documented here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables but is fine with
> unpartitioned ones. Does that mean Spark is not using its own Parquet here?
> Could someone suggest how I could 
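
The "Filesystem closed" at the bottom of that trace is consistent with Hadoop's
shared FileSystem cache: FileSystem.get hands every caller with the same scheme,
authority, and user the same cached instance, so a session that closes it on the
way out breaks the staging-to-partition rename of every later session. A minimal
sketch of that mechanism, not taken from the ticket: the hdfs://namenode:8020
address is a placeholder, and fs.hdfs.impl.disable.cache is only a commonly
cited workaround, not a confirmed fix here.

{code:scala}
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object FsCacheDemo {
  def main(args: Array[String]): Unit = {
    val uri  = new URI("hdfs://namenode:8020") // placeholder cluster address
    val conf = new Configuration()

    // Hadoop caches FileSystem instances per (scheme, authority, user),
    // so two "sessions" asking for the same URI share one object.
    val fsA = FileSystem.get(uri, conf)
    val fsB = FileSystem.get(uri, conf)
    assert(fsA eq fsB)

    fsA.close() // session A exits and closes the shared instance

    // Session B's next HDFS call (like the rename behind "insert overwrite")
    // now fails with java.io.IOException: Filesystem closed.
    try fsB.exists(new Path("/tmp"))
    catch { case e: java.io.IOException => println("session B sees: " + e.getMessage) }

    // Commonly cited workaround: disable the cache so each caller gets a
    // private instance whose close() affects nobody else.
    conf.setBoolean("fs.hdfs.impl.disable.cache", true)
    val fsPrivate = FileSystem.get(uri, conf)
    fsPrivate.close()
  }
}
{code}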

[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 2:59 AM:
---

[~mgaido]

1. hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing
the Hive metastore at MySQL (local on 9083 or not; keep the metastore itself
unchanged, that is not the point)
2. spark-sql: copy in the same hive-site.xml
3. start the spark-thriftserver
4. connect to the thriftserver with beeline

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from Derby to MySQL. My suggestion is to
try it in a fresh environment rather than your current existing one; you could
rebuild it there.*{color}
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a fresh environment), and the failures always
appear no matter what I did. Hoping for your help. Thanks.

{color:red}Hi. It is the metastore!!! I tested with Derby and the thriftserver
was OK; I changed it to MySQL and the problem always appears. Could you test
it?{color}
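
One way to make that Derby-versus-MySQL comparison repeatable is to drive the
same failing statement from a short job against each metastore in turn. A
minimal sketch under stated assumptions: thrift://localhost:9083 is a
placeholder for the MySQL-backed metastore service, and the table names follow
the ticket.

{code:scala}
import org.apache.spark.sql.SparkSession

// Run the two variants as separate submissions, not in one JVM: getOrCreate
// would otherwise hand back the first session unchanged.
val spark = SparkSession.builder()
  .appName("metastore-ab-test")
  // Comment this line out to fall back to an embedded Derby metastore.
  .config("hive.metastore.uris", "thrift://localhost:9083") // placeholder
  .enableHiveSupport()
  .getOrCreate()

// The statement from the report, run twice to mimic the two sessions.
spark.sql("insert overwrite table tmp_10 partition(pt='1') " +
  "select count(1) count from tmp_11")
{code}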


was (Author: zhangxin0112zx):
[~mgaido]

1. hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing
the Hive metastore at MySQL (local on 9083 or not; keep the metastore itself
unchanged, that is not the point)
2. spark-sql: copy in the same hive-site.xml
3. start the spark-thriftserver
4. connect to the thriftserver with beeline

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from Derby to MySQL. My suggestion is to
try it in a fresh environment rather than your current existing one; you could
rebuild it there.*{color}
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a fresh environment), and the failures always
appear no matter what I did. Hoping for your help. Thanks.

{color:red}Hi. Maybe the point is the metastore. I tested with Derby and the
thriftserver was OK; I changed it to MySQL and the problem always appears.{color}

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> Use the thriftserver to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The Parquet table behavior is documented here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears in the 

[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 2:52 AM:
---

[~mgaido]

1. hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing
the Hive metastore at MySQL (local on 9083 or not; keep the metastore itself
unchanged, that is not the point)
2. spark-sql: copy in the same hive-site.xml
3. start the spark-thriftserver
4. connect to the thriftserver with beeline

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from Derby to MySQL. My suggestion is to
try it in a fresh environment rather than your current existing one; you could
rebuild it there.*{color}
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a fresh environment), and the failures always
appear no matter what I did. Hoping for your help. Thanks.

{color:red}Hi. Maybe the point is the metastore. I tested with Derby and the
thriftserver was OK; I changed it to MySQL and the problem always appears.{color}


was (Author: zhangxin0112zx):
[~mgaido]

1. hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing
the Hive metastore at MySQL (local on 9083 or not; keep the metastore itself
unchanged, that is not the point)
2. spark-sql: copy in the same hive-site.xml
3. start the spark-thriftserver
4. connect to the thriftserver with beeline

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from Derby to MySQL. My suggestion is to
try it in a fresh environment rather than your current existing one; you could
rebuild it there.*{color}
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a fresh environment), and the failures always
appear no matter what I did. Hoping for your help. Thanks.

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> Use the thriftserver to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The Parquet table behavior is documented here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables but is fine with
> unpartitioned ones. Does that mean Spark is not using its own Parquet here?
> Could someone suggest how I could avoid 

[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 2:38 AM:
---

[~mgaido]

1. hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing
the Hive metastore at MySQL (local on 9083 or not; keep the metastore itself
unchanged, that is not the point)
2. spark-sql: copy in the same hive-site.xml
3. start the spark-thriftserver
4. connect to the thriftserver with beeline

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

{color:red}*The metastore has changed from Derby to MySQL. My suggestion is to
try it in a fresh environment rather than your current existing one; you could
rebuild it there.*{color}
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a fresh environment), and the failures always
appear no matter what I did. Hoping for your help. Thanks.


was (Author: zhangxin0112zx):
[~mgaido]

1. hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing
the Hive metastore at MySQL (local on 9083 or not; keep the metastore itself
unchanged, that is not the point)
2. spark-sql: copy in the same hive-site.xml
3. start the spark-thriftserver
4. connect to the thriftserver with beeline

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

The metastore has changed from Derby to MySQL. My suggestion is to try it in a
fresh environment rather than your current existing one; you could rebuild it
there.
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a fresh environment), and the failures always
appear no matter what I did. Hoping for your help. Thanks.

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> Use the thriftserver to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The Parquet table behavior is documented here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables but is fine with
> unpartitioned ones. Does that mean Spark is not using its own Parquet here?
> Could someone suggest how I could avoid the issue?
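
Given that last question, one cheap probe is to toggle the conversion and rerun
the failing statement: spark.sql.hive.convertMetastoreParquet is an ordinary
runtime conf. A minimal sketch, assuming a Hive-enabled session and the ticket's
table names; whether Spark 2.1 applies the conversion to inserts into
partitioned Hive tables at all is exactly the open question, so treat this as a
probe rather than a fix.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("convert-metastore-parquet-probe")
  .enableHiveSupport()
  .getOrCreate()

// Default: Spark's own Parquet writer for metastore Parquet tables.
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "true")
spark.sql("insert overwrite table tmp_10 partition(pt='1') " +
  "select count(1) count from tmp_11")

// Force the Hive SerDe path instead; if the failure tracks this setting,
// the conversion is implicated.
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")
spark.sql("insert overwrite table tmp_10 partition(pt='1') " +
  "select count(1) count from tmp_11")
{code}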



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 2:37 AM:
---

[~mgaido]

1. hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing
the Hive metastore at MySQL (local on 9083 or not; keep the metastore itself
unchanged, that is not the point)
2. spark-sql: copy in the same hive-site.xml
3. start the spark-thriftserver
4. connect to the thriftserver with beeline

!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!

The metastore has changed from Derby to MySQL. My suggestion is to try it in a
fresh environment rather than your current existing one; you could rebuild it
there.
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a fresh environment), and the failures always
appear no matter what I did. Hoping for your help. Thanks.


was (Author: zhangxin0112zx):
[~mgaido]

1. hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing
the Hive metastore at MySQL (local metastore on 9083)
2. spark-sql: copy in the same hive-site.xml
3. start the spark-thriftserver
4. connect to the thriftserver with beeline

The metastore has changed from Derby to MySQL. My suggestion is to try it in a
fresh environment rather than your current existing one; you could rebuild it
there.
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a fresh environment), and the failures always
appear no matter what I did. Hoping for your help. Thanks.

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> Use the thriftserver to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The Parquet table behavior is documented here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables but is fine with
> unpartitioned ones. Does that mean Spark is not using its own Parquet here?
> Could someone suggest how I could avoid the issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 2:32 AM:
---

[~mgaido]

1. hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing
the Hive metastore at MySQL (local metastore on 9083)
2. spark-sql: copy in the same hive-site.xml
3. start the spark-thriftserver
4. connect to the thriftserver with beeline

The metastore has changed from Derby to MySQL. My suggestion is to try it in a
fresh environment rather than your current existing one; you could rebuild it
there.
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a fresh environment), and the failures always
appear no matter what I did. Hoping for your help. Thanks.


was (Author: zhangxin0112zx):
[~mgaido]

1. hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing
the Hive metastore at MySQL (local metastore on 9083)
2. spark-sql: copy in the same hive-site.xml
3. start the spark-thriftserver
4. connect to the thriftserver with beeline

The metastore has changed from Derby to MySQL. My suggestion is to try it in a
fresh environment rather than your current existing one.
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a fresh environment), and the failures always
appear no matter what I did. Hoping for your help. Thanks.

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> Use the thriftserver to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The Parquet table behavior is documented here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables but is fine with
> unpartitioned ones. Does that mean Spark is not using its own Parquet here?
> Could someone suggest how I could avoid the issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 2:31 AM:
---

[~mgaido]

1. hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing
the Hive metastore at MySQL (local metastore on 9083)
2. spark-sql: copy in the same hive-site.xml
3. start the spark-thriftserver
4. connect to the thriftserver with beeline

The metastore has changed from Derby to MySQL. My suggestion is to try it in a
fresh environment rather than your current existing one.
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a fresh environment), and the failures always
appear no matter what I did. Hoping for your help. Thanks.


was (Author: zhangxin0112zx):
[~mgaido]

1. hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing
the Hive metastore at MySQL (local metastore on 9083)
2. spark-sql: copy in the same hive-site.xml
3. start the spark-thriftserver
4. connect to the thriftserver with beeline

The metastore has changed from Derby to MySQL. My suggestion is to try it in a
fresh environment, separate from your existing one.
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a fresh environment), and the failures always
appear no matter what I did. Hoping for your help. Thanks.

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> Use the thriftserver to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The Parquet table behavior is documented here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables but is fine with
> unpartitioned ones. Does that mean Spark is not using its own Parquet here?
> Could someone suggest how I could avoid the issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578
 ] 

xinzhang edited comment on SPARK-21725 at 11/1/17 2:30 AM:
---

[~mgaido]

1. hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing
the Hive metastore at MySQL (local metastore on 9083)
2. spark-sql: copy in the same hive-site.xml
3. start the spark-thriftserver
4. connect to the thriftserver with beeline

The metastore has changed from Derby to MySQL. My suggestion is to try it in a
fresh environment, separate from your existing one.
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a fresh environment), and the failures always
appear no matter what I did. Hoping for your help. Thanks.


was (Author: zhangxin0112zx):
1. hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing
the Hive metastore at MySQL (local metastore on 9083)
2. spark-sql: copy in the same hive-site.xml
3. start the spark-thriftserver
4. connect to the thriftserver with beeline

The metastore has changed from Derby to MySQL. My suggestion is to try it in a
fresh environment, separate from your existing one.
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a fresh environment), and the failures always
appear no matter what I did. Hoping for your help. Thanks.

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> Use the thriftserver to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The Parquet table behavior is documented here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables but is fine with
> unpartitioned ones. Does that mean Spark is not using its own Parquet here?
> Could someone suggest how I could avoid the issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578
 ] 

xinzhang commented on SPARK-21725:
--

1. hive 1.2.1: download a fresh tarball and change only hive-site.xml, pointing
the Hive metastore at MySQL (local metastore on 9083)
2. spark-sql: copy in the same hive-site.xml
3. start the spark-thriftserver
4. connect to the thriftserver with beeline

The metastore has changed from Derby to MySQL. My suggestion is to try it in a
fresh environment, separate from your existing one.
As you say, it might be related to the metastore. I tested the case on cdh5.7
(hadoop2.6) and on hadoop2.8 (a fresh environment), and the failures always
appear no matter what I did. Hoping for your help. Thanks.

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>Priority: Major
>  Labels: spark-sql
>
> Use the thriftserver to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The Parquet table behavior is documented here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables but is fine with
> unpartitioned ones. Does that mean Spark is not using its own Parquet here?
> Could someone suggest how I could avoid the issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226337#comment-16226337
 ] 

xinzhang edited comment on SPARK-21725 at 10/31/17 7:09 AM:


[~mgaido]
[~srowen]
Now I have tried with the master branch.
The problem is still there. (Important: Text file is the default value of the
hive.default.fileformat parameter. When I set hive.default.fileformat=Parquet;
the problem went away! {color:red}But it still appears with partitioned
tables. Do not miss the last picture, which is the core of the problem!!{color})
Steps:
1. download, install, and run Hive SQL (hive-1.2.1; this proves my Hive is OK)
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. download, install, and run spark-sql (spark-master, which I built from the
latest master commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564)
First run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. use the spark-sql thriftserver
First run, result: *{color:red}GOOD{color}*
Second run, result: *{color:red}BAD{color}*
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!

{color:red}----{color}
1. set hive.default.fileformat=Parquet;
2. create a partitioned table: the problem appears again

!https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png!
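
Because the failure seems to depend on two switches at once, the file format
default and whether the table is partitioned, a small driver that walks that
matrix is less error-prone than hand-run sessions. A minimal sketch under
stated assumptions: tmp_flat and tmp_part are hypothetical table names, and
since the report only reproduces the error across two thriftserver
connections, the second insert here may well succeed in-process.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("fileformat-partition-matrix")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("SET hive.default.fileformat=Parquet")

// Unpartitioned table: reported as fine on every run.
spark.sql("create table if not exists tmp_flat(count bigint) stored as parquet")
spark.sql("insert overwrite table tmp_flat select 1")
spark.sql("insert overwrite table tmp_flat select 1") // second run

// Partitioned table: reported as failing on the SECOND run, but only when
// the two runs come from two separate thriftserver connections.
spark.sql("create table if not exists tmp_part(count bigint) " +
  "partitioned by (pt string) stored as parquet")
spark.sql("insert overwrite table tmp_part partition(pt='1') select 1")
spark.sql("insert overwrite table tmp_part partition(pt='1') select 1")
{code}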


was (Author: zhangxin0112zx):
[~mgaido]
[~srowen]
Now I have tried with the master branch.
The problem is still there. (Important: Text file is the default value of the
hive.default.fileformat parameter. When I set hive.default.fileformat=Parquet;
the problem went away!! {color:red}Do not miss the last picture, which is the
core of the problem!!{color})
Steps:
1. download, install, and run Hive SQL (hive-1.2.1; this proves my Hive is OK)
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. download, install, and run spark-sql (spark-master, which I built from the
latest master commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564)
First run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. use the spark-sql thriftserver
First run, result: *{color:red}GOOD{color}*
Second run, result: *{color:red}BAD{color}*
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!

{color:red}----{color}
1. set hive.default.fileformat=Parquet;
2. create a partitioned table: the problem appears again

!https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png!

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>  Labels: spark-sql
>
> Use the thriftserver to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: 

[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226337#comment-16226337
 ] 

xinzhang edited comment on SPARK-21725 at 10/31/17 7:07 AM:


[~mgaido]
[~srowen]
Now I have tried with the master branch.
The problem is still there. (Important: Text file is the default value of the
hive.default.fileformat parameter. When I set hive.default.fileformat=Parquet;
the problem went away!! {color:red}Do not miss the last picture, which is the
core of the problem!!{color})
Steps:
1. download, install, and run Hive SQL (hive-1.2.1; this proves my Hive is OK)
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. download, install, and run spark-sql (spark-master, which I built from the
latest master commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564)
First run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. use the spark-sql thriftserver
First run, result: *{color:red}GOOD{color}*
Second run, result: *{color:red}BAD{color}*
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!

{color:red}----{color}
1. set hive.default.fileformat=Parquet;
2. create a partitioned table: the problem appears again

!https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png!


was (Author: zhangxin0112zx):
[~mgaido]
[~srowen]
Now I have tried with the master branch.
The problem is still there. (Important: Text file is the default value of the
hive.default.fileformat parameter. When I set hive.default.fileformat=Parquet;
the problem went away!! {color:red}Do not miss the last picture, which is the
core of the problem!!{color})
Steps:
1. download, install, and run Hive SQL (hive-1.2.1; this proves my Hive is OK)
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. download, install, and run spark-sql (spark-master, which I built from the
latest master commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564)
First run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. use the spark-sql thriftserver
First run, result: GOOD
Second run, result: BAD
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!

---
1. set hive.default.fileformat=Parquet;
2. create a partitioned table: the problem appears again

!https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png!

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>  Labels: spark-sql
>
> Use the thriftserver to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to 

[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226337#comment-16226337
 ] 

xinzhang edited comment on SPARK-21725 at 10/31/17 7:07 AM:


[~mgaido]
[~srowen]
Now I have tried with the master branch.
The problem is still there. (Important: Text file is the default value of the
hive.default.fileformat parameter. When I set hive.default.fileformat=Parquet;
the problem went away!! {color:red}Do not miss the last picture, which is the
core of the problem!!{color})
Steps:
1. download, install, and run Hive SQL (hive-1.2.1; this proves my Hive is OK)
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. download, install, and run spark-sql (spark-master, which I built from the
latest master commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564)
First run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. use the spark-sql thriftserver
First run, result: *{color:red}GOOD{color}*
Second run, result: *{color:red}BAD{color}*
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!

{color:red}----{color}
1. set hive.default.fileformat=Parquet;
2. create a partitioned table: the problem appears again

!https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png!


was (Author: zhangxin0112zx):
[~mgaido]
[~srowen]
Now I have tried with the master branch.
The problem is still there. (Important: Text file is the default value of the
hive.default.fileformat parameter. When I set hive.default.fileformat=Parquet;
the problem went away!! {color:red}Do not miss the last picture, which is the
core of the problem!!{color})
Steps:
1. download, install, and run Hive SQL (hive-1.2.1; this proves my Hive is OK)
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. download, install, and run spark-sql (spark-master, which I built from the
latest master commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564)
First run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. use the spark-sql thriftserver
First run, result: *{color:red}GOOD{color}*
Second run, result: *{color:red}BAD{color}*
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!

{color:red}----{color}
1. set hive.default.fileformat=Parquet;
2. create a partitioned table: the problem appears again

!https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png!

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>  Labels: spark-sql
>
> Use the thriftserver to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: 

[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226337#comment-16226337
 ] 

xinzhang edited comment on SPARK-21725 at 10/31/17 7:03 AM:


[~mgaido]
[~srowen]
Now I have tried with the master branch.
The problem is still there. (Important: Text file is the default value of the
hive.default.fileformat parameter. When I set hive.default.fileformat=Parquet;
the problem went away!! {color:red}Do not miss the last picture, which is the
core of the problem!!{color})
Steps:
1. download, install, and run Hive SQL (hive-1.2.1; this proves my Hive is OK)
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. download, install, and run spark-sql (spark-master, which I built from the
latest master commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564)
First run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. use the spark-sql thriftserver
First run, result: GOOD
Second run, result: BAD
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!

---
1. set hive.default.fileformat=Parquet;
2. create a partitioned table: the problem appears again

!https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png!


was (Author: zhangxin0112zx):
[~mgaido]
[~srowen]
Now I have tried with the master branch.
The problem is still there. (Important: Text file is the default value of the
hive.default.fileformat parameter. When I set hive.default.fileformat=Parquet;
the problem went away!!)
Steps:
1. download, install, and run Hive SQL (hive-1.2.1; this proves my Hive is OK)
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. download, install, and run spark-sql (spark-master, which I built from the
latest master commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564)
First run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. use the spark-sql thriftserver
First run, result: GOOD
Second run, result: BAD
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!

---
1. set hive.default.fileformat=Parquet;
2. create a partitioned table: the problem appears again

!https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png!

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>  Labels: spark-sql
>
> Use the thriftserver to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at 

[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226337#comment-16226337
 ] 

xinzhang edited comment on SPARK-21725 at 10/31/17 7:01 AM:


[~mgaido]
[~srowen]
Now I have tried with the master branch.
The problem is still there. (Important: Text file is the default value of the
hive.default.fileformat parameter. When I set hive.default.fileformat=Parquet;
the problem went away!!)
Steps:
1. download, install, and run Hive SQL (hive-1.2.1; this proves my Hive is OK)
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. download, install, and run spark-sql (spark-master, which I built from the
latest master commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564)
First run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. use the spark-sql thriftserver
First run, result: GOOD
Second run, result: BAD
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!

---
1. set hive.default.fileformat=Parquet;
2. create a partitioned table: the problem appears again

!https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png!


was (Author: zhangxin0112zx):
[~mgaido]
[~srowen]
Now I have tried with the master branch.
The problem is still there. (Important: Text file is the default value of the
hive.default.fileformat parameter. When I set hive.default.fileformat=Parquet;
the problem went away!!)
Steps:
1. download, install, and run Hive SQL (hive-1.2.1; this proves my Hive is OK)
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. download, install, and run spark-sql (spark-master, which I built from the
latest master commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564)
First run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run, spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. use the spark-sql thriftserver
First run, result: GOOD
Second run, result: BAD
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!



> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>  Labels: spark-sql
>
> Use the Thrift Server to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> 

[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226337#comment-16226337
 ] 

xinzhang edited comment on SPARK-21725 at 10/31/17 6:55 AM:


[~mgaido]
[~srowen]
I have now tried with the master branch.
The problem is still there. (Important: TextFile is the default value of 
hive.default.fileformat. When I tried set hive.default.fileformat=Parquet; the 
problem went away!)
Steps:
1. Download, install, and run Hive SQL (hive-1.2.1; this proves my Hive setup is OK)
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. Download, install, and run spark-sql (spark-master, built from the latest 
commit, 44c4003155c1d243ffe0f73d5537b4c8b3f3b564)
First run: spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run: spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. Use the spark-sql Thrift Server
First run: result: GOOD
Second run: result: BAD
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!




was (Author: zhangxin0112zx):
[~mgaido]
[~srowen]
I have now tried with the master branch.
The problem is still there.
Steps:
1. Download, install, and run Hive SQL (hive-1.2.1; this proves my Hive setup is OK)
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. Download, install, and run spark-sql (spark-master, built from the latest 
commit, 44c4003155c1d243ffe0f73d5537b4c8b3f3b564)
First run: spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run: spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. Use the spark-sql Thrift Server
First run: result: GOOD
Second run: result: BAD
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!



> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>  Labels: spark-sql
>
> Use the Thrift Server to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> the doc about the parquet table desc here 
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet 

[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226337#comment-16226337
 ] 

xinzhang edited comment on SPARK-21725 at 10/31/17 6:43 AM:


[~mgaido]
[~srowen]
I have now tried with the master branch.
The problem is still there.
Steps:
1. Download, install, and run Hive SQL (hive-1.2.1; this proves my Hive setup is OK)
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. Download, install, and run spark-sql (spark-master, built from the latest 
commit, 44c4003155c1d243ffe0f73d5537b4c8b3f3b564)
First run: spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run: spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. Use the spark-sql Thrift Server
First run: result: GOOD
Second run: result: BAD
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!




was (Author: zhangxin0112zx):
I have now tried with the master branch.
The problem is still there.
Steps:
1. Download, install, and run Hive SQL (hive-1.2.1; this proves my Hive setup is OK)
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. Download, install, and run spark-sql (spark-master, built from the latest 
commit, 44c4003155c1d243ffe0f73d5537b4c8b3f3b564)
First run: spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run: spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. Use the spark-sql Thrift Server
First run: result: GOOD
Second run: result: BAD
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!



> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>  Labels: spark-sql
>
> Use the Thrift Server to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> the doc about the parquet table desc here 
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused the problem appear in the table(partitions)  but it is ok with 
> table(with out partitions) . It means spark do 

[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226337#comment-16226337
 ] 

xinzhang commented on SPARK-21725:
--

I have now tried with the master branch.
The problem is still there.
Steps:
1. Download, install, and run Hive SQL (hive-1.2.1; this proves my Hive setup is OK)
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. Download, install, and run spark-sql (spark-master, built from the latest 
commit, 44c4003155c1d243ffe0f73d5537b4c8b3f3b564)
First run: spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run: spark-sql result: GOOD
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. Use the spark-sql Thrift Server
First run: result: GOOD
Second run: result: BAD
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!
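A scripted version of the two-session Thrift Server repro could look like the 
sketch below. This is hypothetical: it assumes the PyHive client and the 
default Thrift Server port 10000, neither of which appears in this report 
(which used beeline interactively).

{noformat}
# Hypothetical scripted two-session repro using PyHive (pip install pyhive).
# Host, port, and username are placeholders, not values from this report.
from pyhive import hive

def run_session(statements):
    # Each call opens a fresh Thrift Server session, mirroring one beeline run.
    conn = hive.connect(host="127.0.0.1", port=10000, username="tools")
    cur = conn.cursor()
    for stmt in statements:
        cur.execute(stmt)
    conn.close()

insert = ("INSERT OVERWRITE TABLE tmp_10 PARTITION (pt='1') "
          "SELECT count(1) count FROM tmp_11")

run_session([insert])  # first session: GOOD in the steps above
run_session([insert])  # second session: BAD ('Unable to move source')
{noformat}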



> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>  Labels: spark-sql
>
> Use the Thrift Server to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> the doc about the parquet table desc here 
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables, but it is OK with 
> non-partitioned tables. Does that mean Spark does not use its own Parquet support?
> Could someone suggest how I could avoid the issue?
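Regarding the spark.sql.hive.convertMetastoreParquet paragraph quoted above: 
one thing that could be tried, though nothing in this thread confirms it as a 
fix, is disabling the conversion so that the write goes through the Hive SerDe 
path instead of Spark's built-in Parquet support.

{noformat}
# Hedged mitigation sketch, not confirmed in this thread: turn off Spark's
# built-in Parquet conversion for Hive metastore Parquet tables.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")
spark.sql("INSERT OVERWRITE TABLE tmp_10 PARTITION (pt='1') "
          "SELECT count(1) count FROM tmp_11")
{noformat}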



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-26 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220239#comment-16220239
 ] 

xinzhang commented on SPARK-21725:
--

I tried Spark (built from master) on 21 Aug 2017, and the problem still 
appeared. I will try it again now and reply with the result I get.
Thanks for your reply. [~mgaido]
[~srowen]

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>  Labels: spark-sql
>
> Use the Thrift Server to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> the doc about the parquet table desc here 
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables, but it is OK with 
> non-partitioned tables. Does that mean Spark does not use its own Parquet support?
> Could someone suggest how I could avoid the issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-10-26 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220216#comment-16220216
 ] 

xinzhang commented on SPARK-21725:
--

I downloaded Spark 2.1.2 and the problem still appears. Could you give me any 
suggestions for avoiding it? [~mgaido]


> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>  Labels: spark-sql
>
> Use the Thrift Server to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> the doc about the parquet table desc here 
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables, but it is OK with 
> non-partitioned tables. Does that mean Spark does not use its own Parquet support?
> Could someone suggest how I could avoid the issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-22244) sparksql successed on yarn but only successed some pieces of all jobs

2017-10-11 Thread xinzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinzhang resolved SPARK-22244.
--
Resolution: Not A Problem

It was caused by the client session being closed.
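For reference, since the root cause was the submitting client's session going 
away, one way to detach the job from the login session is sketched below. The 
paths are illustrative, not the ones from this report; the original invocation 
used a plain shell redirect.

{noformat}
# Hypothetical launcher: start spark-sql in its own session so that a closed
# client session (SIGHUP) cannot kill the driver. Python 3 standard library.
import subprocess

with open("/data1/tools/logs/etl_log/job.log", "ab") as log:
    subprocess.Popen(
        ["/opt/spark/spark-bin/bin/spark-sql", "--master", "yarn",
         "-f", "/opt/app/scheduler-tomcat/temp/job.sql"],
        stdout=log, stderr=subprocess.STDOUT,
        start_new_session=True)  # detaches into a new process group/session
{noformat}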

> sparksql successed on yarn but only successed some pieces of all jobs
> -
>
> Key: SPARK-22244
> URL: https://issues.apache.org/jira/browse/SPARK-22244
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.1.0
>Reporter: xinzhang
>
> Shell : /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` 
> --jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f 
> /opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> 
> /data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1
> Description:
> It's very weird. Some pictures show the strange phenomenon.
> On YARN, the application's status shows SUCCEEDED:
> !https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!
> *{color:red}On the Spark History web UI, the application has been moved into 
> the history list. But in fact it did not complete all the jobs. The active 
> jobs should have completed. The details show:
> {color}*
> !https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
> !https://user-images.githubusercontent.com/8244097/31428411-33c80056-ae9d-11e7-9a7e-35169d472a86.png!
> The log stopped:
> !https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!
> *{color:red}What is the bug? How should I track this problem down? Any 
> suggestions would be helpful.{color}*



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22244) sparksql successed on yarn but only successed some pieces of all jobs

2017-10-11 Thread xinzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinzhang updated SPARK-22244:
-
Description: 
Shell : /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` 
--jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f 
/opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> 
/data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1

Description:
It's very weird. Some pictures show the strange phenomenon.
On YARN, the application's status shows SUCCEEDED:

!https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!

*{color:red}On the Spark History web UI, the application has been moved into 
the history list. But in fact it did not complete all the jobs. The active 
jobs should have completed. The details show:
{color}*
!https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
!https://user-images.githubusercontent.com/8244097/31428411-33c80056-ae9d-11e7-9a7e-35169d472a86.png!

The log stopped:
!https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!

*{color:red}What is the bug? How should I track this problem down? Any 
suggestions would be helpful.{color}*

  was:
Shell : /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` 
--jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f 
/opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> 
/data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1

Description:
It's very weird. Some pictures show the strange phenomenon.
On YARN, the application's status shows SUCCEEDED:

!https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!

*{color:red}On the Spark History web UI, the application has been moved into 
the history list. But in fact it did not complete all the jobs. The active 
jobs should have completed. The details show:
{color}*
!https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
!https://user-images.githubusercontent.com/8244097/31427805-2a03f978-ae9b-11e7-8d87-d5a19d9a2eb0.jpg!

The log stopped:
!https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!

*{color:red}What is the bug? How should I track this problem down? Any 
suggestions would be helpful.{color}*


> sparksql successed on yarn but only successed some pieces of all jobs
> -
>
> Key: SPARK-22244
> URL: https://issues.apache.org/jira/browse/SPARK-22244
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.1.0
>Reporter: xinzhang
>
> Shell : /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` 
> --jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f 
> /opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> 
> /data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1
> Description:
> It's very weird. Some pictures show the strange phenomenon.
> On YARN, the application's status shows SUCCEEDED:
> !https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!
> *{color:red}On the Spark History web UI, the application has been moved into 
> the history list. But in fact it did not complete all the jobs. The active 
> jobs should have completed. The details show:
> {color}*
> !https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
> !https://user-images.githubusercontent.com/8244097/31428411-33c80056-ae9d-11e7-9a7e-35169d472a86.png!
> The log stopped:
> !https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!
> *{color:red}What is the bug? How should I track this problem down? Any 
> suggestions would be helpful.{color}*



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22244) sparksql successed on yarn but only successed some pieces of all jobs

2017-10-11 Thread xinzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinzhang updated SPARK-22244:
-
Description: 
Shell : /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` 
--jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f 
/opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> 
/data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1

Description:
It's very weird. Some pictures show the strange phenomenon.
On YARN, the application's status shows SUCCEEDED:

!https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!

*{color:red}On the Spark History web UI, the application has been moved into 
the history list. But in fact it did not complete all the jobs. The active 
jobs should have completed. The details show:
{color}*

!https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!

!https://user-images.githubusercontent.com/8244097/31428411-33c80056-ae9d-11e7-9a7e-35169d472a86.png!

The log stopped:

!https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!

*{color:red}What is the bug? How should I track this problem down? Any 
suggestions would be helpful.{color}*

  was:
Shell : /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` 
--jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f 
/opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> 
/data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1

Description:
It's very weird. Some pictures show the strange phenomenon.
On YARN, the application's status shows SUCCEEDED:

!https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!

*{color:red}On the Spark History web UI, the application has been moved into 
the history list. But in fact it did not complete all the jobs. The active 
jobs should have completed. The details show:
{color}*
!https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
!https://user-images.githubusercontent.com/8244097/31428411-33c80056-ae9d-11e7-9a7e-35169d472a86.png!

The log stopped:
!https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!

*{color:red}What is the bug? How should I track this problem down? Any 
suggestions would be helpful.{color}*


> sparksql successed on yarn but only successed some pieces of all jobs
> -
>
> Key: SPARK-22244
> URL: https://issues.apache.org/jira/browse/SPARK-22244
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.1.0
>Reporter: xinzhang
>
> Shell : /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` 
> --jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f 
> /opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> 
> /data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1
> Description:
> It's very weird. Some pictures show the strange phenomenon.
> On YARN, the application's status shows SUCCEEDED:
> !https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!
> *{color:red}On the Spark History web UI, the application has been moved into 
> the history list. But in fact it did not complete all the jobs. The active 
> jobs should have completed. The details show:
> {color}*
> !https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
> !https://user-images.githubusercontent.com/8244097/31428411-33c80056-ae9d-11e7-9a7e-35169d472a86.png!
> The log stopped:
> !https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!
> *{color:red}What is the bug? How should I track this problem down? Any 
> suggestions would be helpful.{color}*



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22244) sparksql successed on yarn but only successed some pieces of all jobs

2017-10-11 Thread xinzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinzhang updated SPARK-22244:
-
Description: 
Shell : /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` 
--jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f 
/opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> 
/data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1

Description:
It's very weird. Some pictures show the strange phenomenon.
On YARN, the application's status shows SUCCEEDED:

!https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!

*{color:red}On the Spark History web UI, the application has been moved into 
the history list. But in fact it did not complete all the jobs. The active 
jobs should have completed. The details show:
{color}*
!https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
!https://user-images.githubusercontent.com/8244097/31427805-2a03f978-ae9b-11e7-8d87-d5a19d9a2eb0.jpg!

The log stopped:
!https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!

*{color:red}What is the bug? How should I track this problem down? Any 
suggestions would be helpful.{color}*

  was:
Shell : /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` 
--jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f 
/opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> 
/data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1

Description:
It's very weird. Some pictures show the strange phenomenon.
On YARN, the application's status shows SUCCEEDED:
!https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!

On the Spark History web UI, the application has been moved into the history 
list. The details show:
!https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
!https://user-images.githubusercontent.com/8244097/31427805-2a03f978-ae9b-11e7-8d87-d5a19d9a2eb0.jpg!

The log stopped:
!https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!

What is the bug? How should I track this problem down? Any suggestions would 
be helpful.


> sparksql successed on yarn but only successed some pieces of all jobs
> -
>
> Key: SPARK-22244
> URL: https://issues.apache.org/jira/browse/SPARK-22244
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.1.0
>Reporter: xinzhang
>
> Shell : /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` 
> --jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f 
> /opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> 
> /data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1
> Description:
> It's very weird. Some pictures show the strange phenomenon.
> On YARN, the application's status shows SUCCEEDED:
> !https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!
> *{color:red}On the Spark History web UI, the application has been moved into 
> the history list. But in fact it did not complete all the jobs. The active 
> jobs should have completed. The details show:
> {color}*
> !https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
> !https://user-images.githubusercontent.com/8244097/31427805-2a03f978-ae9b-11e7-8d87-d5a19d9a2eb0.jpg!
> The log stopped:
> !https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!
> *{color:red}What is the bug? How should I track this problem down? Any 
> suggestions would be helpful.{color}*



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-22244) sparksql successed on yarn but only successed some pieces of all jobs

2017-10-11 Thread xinzhang (JIRA)
xinzhang created SPARK-22244:


 Summary: sparksql successed on yarn but only successed some pieces 
of all jobs
 Key: SPARK-22244
 URL: https://issues.apache.org/jira/browse/SPARK-22244
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Spark Shell
Affects Versions: 2.1.0
Reporter: xinzhang


Shell : /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` 
--jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f 
/opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> 
/data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1

Description:
It's very weird. Some pictures show the strange phenomenon.
On YARN, the application's status shows SUCCEEDED:
!https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!

On the Spark History web UI, the application has been moved into the history 
list. The details show:
!https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
!https://user-images.githubusercontent.com/8244097/31427805-2a03f978-ae9b-11e7-8d87-d5a19d9a2eb0.jpg!

The log stopped:
!https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!

What is the bug? How should I track this problem down? Any suggestions would 
be helpful.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-09-18 Thread xinzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinzhang updated SPARK-21067:
-
Comment: was deleted

(was: I tried to solve it by modifying the source code myself. It is too complex 
for me. I hope the community or someone else can lend a hand and fix it. [~rxin]
)

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which states that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at org.apache.spark.sql.Dataset.(Dataset.scala:185)
> at 

[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-09-18 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169797#comment-16169797
 ] 

xinzhang commented on SPARK-21067:
--

I tried to solve it by modifying the source code myself. It is too complex for 
me. I hope the community or someone else can lend a hand and fix it. [~rxin]
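For context, the {{hive.exec.stagingdir}} setting discussed in the quoted 
issue below can also be supplied when a session is created. A minimal sketch 
that only shows where the knob lives; it is not a confirmed fix for this bug, 
and the path simply mirrors the reporter's convention:

{noformat}
# Sketch: passing hive.exec.stagingdir at session creation. The staging path
# mirrors the "/tmp/hive-staging/{user.name}" convention quoted in the issue.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("ctas_staging_dir")
         .config("hive.exec.stagingdir", "/tmp/hive-staging/{user.name}")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE TABLE dricard.test AS SELECT 1 AS `col1`")  # the failing CTAS
{noformat}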


> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which states that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at org.apache.spark.sql.Dataset.(Dataset.scala:185)
> at 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-09-18 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169797#comment-16169797
 ] 

xinzhang edited comment on SPARK-21067 at 9/18/17 9:35 AM:
---

I tried to solve it by modifying the source code myself. It is too complex for 
me. I hope the community or someone else can lend a hand and fix it. [~rxin]



was (Author: zhangxin0112zx):
I tried to solve it by modifying the source code myself. It is too complex for 
me. I hope the community or someone else can lend a hand and fix it. [~rxin]


> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which states that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> 

[jira] [Resolved] (SPARK-22007) spark-submit on yarn or local , got different result

2017-09-14 Thread xinzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinzhang resolved SPARK-22007.
--
Resolution: Won't Fix

> spark-submit on yarn or local , got different result
> 
>
> Key: SPARK-22007
> URL: https://issues.apache.org/jira/browse/SPARK-22007
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell, Spark Submit
>Affects Versions: 2.1.0
>Reporter: xinzhang
>
> Submit the py script locally:
> /opt/spark/spark-bin/bin/spark-submit --master local cluster test_hive.py
> result:
> +------------+
> |databaseName|
> +------------+
> |     default|
> |            |
> |           x|
> +------------+
> Submit the py script on YARN:
> /opt/spark/spark-bin/bin/spark-submit --master yarn --deploy-mode cluster 
> test_hive.py
> result:
> +------------+
> |databaseName|
> +------------+
> |     default|
> +------------+
> the py script :
> [yangtt@dc-gateway119 test]$ cat test_hive.py 
> #!/usr/bin/env python
> #coding=utf-8
> from os.path import expanduser, join, abspath
> from pyspark.sql import SparkSession
> from pyspark.sql import Row
> from pyspark.conf import SparkConf
> def squared(s):
>   return s * s
> warehouse_location = abspath('/group/user/yangtt/meta/hive-temp-table')
> spark = SparkSession \
> .builder \
> .appName("Python_Spark_SQL_Hive") \
> .config("spark.sql.warehouse.dir", warehouse_location) \
> .config(conf=SparkConf()) \
> .enableHiveSupport() \
> .getOrCreate()
> spark.udf.register("squared",squared)
> spark.sql("show databases").show()
> Q: Why does Spark load a different Hive metastore? Why does the YARN run 
> always use Derby?
> 17/09/14 16:10:55 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is 
> DERBY
> My current metastore is in MySQL.
> Any suggestion would be helpful.
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22007) spark-submit on yarn or local , got different result

2017-09-14 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165972#comment-16165972
 ] 

xinzhang commented on SPARK-22007:
--

Yes, I figured it out.
Add this when instantiating the SparkSession (full sketch below):
.config("hive.metastore.uris", "thrift://11.11.11.11:9083") \

Maybe the docs here should describe this in more detail:
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder
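For reference, the full builder would then look like the sketch below; the 
thrift URI is the placeholder from this comment, and the warehouse path comes 
from the script in the issue description.

{noformat}
# SparkSession builder with the metastore URI set explicitly, so a yarn-cluster
# run talks to the MySQL-backed metastore instead of falling back to Derby.
from os.path import abspath
from pyspark.sql import SparkSession

warehouse_location = abspath('/group/user/yangtt/meta/hive-temp-table')

spark = SparkSession \
    .builder \
    .appName("Python_Spark_SQL_Hive") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .config("hive.metastore.uris", "thrift://11.11.11.11:9083") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("show databases").show()
{noformat}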


> spark-submit on yarn or local , got different result
> 
>
> Key: SPARK-22007
> URL: https://issues.apache.org/jira/browse/SPARK-22007
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell, Spark Submit
>Affects Versions: 2.1.0
>Reporter: xinzhang
>
> Submit the py script locally:
> /opt/spark/spark-bin/bin/spark-submit --master local cluster test_hive.py
> result:
> +------------+
> |databaseName|
> +------------+
> |     default|
> |            |
> |           x|
> +------------+
> Submit the py script on YARN:
> /opt/spark/spark-bin/bin/spark-submit --master yarn --deploy-mode cluster 
> test_hive.py
> result:
> +------------+
> |databaseName|
> +------------+
> |     default|
> +------------+
> the py script :
> [yangtt@dc-gateway119 test]$ cat test_hive.py 
> #!/usr/bin/env python
> #coding=utf-8
> from os.path import expanduser, join, abspath
> from pyspark.sql import SparkSession
> from pyspark.sql import Row
> from pyspark.conf import SparkConf
> def squared(s):
>   return s * s
> warehouse_location = abspath('/group/user/yangtt/meta/hive-temp-table')
> spark = SparkSession \
> .builder \
> .appName("Python_Spark_SQL_Hive") \
> .config("spark.sql.warehouse.dir", warehouse_location) \
> .config(conf=SparkConf()) \
> .enableHiveSupport() \
> .getOrCreate()
> spark.udf.register("squared",squared)
> spark.sql("show databases").show()
> Q: Why does Spark load a different Hive metastore? Why does the YARN run 
> always use Derby?
> 17/09/14 16:10:55 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is 
> DERBY
> My current metastore is in MySQL.
> Any suggestion would be helpful.
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22007) spark-submit on yarn or local , got different result

2017-09-14 Thread xinzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinzhang updated SPARK-22007:
-
Description: 
Submit the py script locally:
/opt/spark/spark-bin/bin/spark-submit --master local cluster test_hive.py
result:
+------------+
|databaseName|
+------------+
|     default|
|            |
|           x|
+------------+

Submit the py script on YARN:
/opt/spark/spark-bin/bin/spark-submit --master yarn --deploy-mode cluster 
test_hive.py
result:
+------------+
|databaseName|
+------------+
|     default|
+------------+

the py script :

[yangtt@dc-gateway119 test]$ cat test_hive.py 
#!/usr/bin/env python
#coding=utf-8

from os.path import expanduser, join, abspath

from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.conf import SparkConf

def squared(s):
  return s * s

warehouse_location = abspath('/group/user/yangtt/meta/hive-temp-table')

spark = SparkSession \
.builder \
.appName("Python_Spark_SQL_Hive") \
.config("spark.sql.warehouse.dir", warehouse_location) \
.config(conf=SparkConf()) \
.enableHiveSupport() \
.getOrCreate()

spark.udf.register("squared",squared)

spark.sql("show databases").show()



Q: Why does Spark load a different Hive metastore? Why does YARN always use Derby?
17/09/14 16:10:55 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is 
DERBY
My current metastore is in MySQL.
Any suggestions would be helpful.
Thanks.

  was:
Submit the py script locally.
/opt/spark/spark-bin/bin/spark-submit --master local cluster test_hive.py
result:
++
|databaseName|
++
| default|
| |
|   x|
++

Submit the py script on YARN.
/opt/spark/spark-bin/bin/spark-submit --master yarn --deploy-mode cluster 
test_hive.py
result:
++
|databaseName|
++
| default|
++

The py script:

[yangtt@dc-gateway119 test]$ cat test_hive.py 
#!/usr/bin/env python
#coding=utf-8

from os.path import expanduser, join, abspath

from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.conf import SparkConf

def squared(s):
  return s * s

# warehouse_location points to the default location for managed databases and 
tables
warehouse_location = abspath('/group/user/yangtt/meta/hive-temp-table')

spark = SparkSession \
.builder \
.appName("Python_Spark_SQL_Hive") \
.config("spark.sql.warehouse.dir", warehouse_location) \
.config(conf=SparkConf()) \
.enableHiveSupport() \
.getOrCreate()

spark.udf.register("squared",squared)

spark.sql("show databases").show()



Q: Why does Spark load a different Hive metastore? Why does YARN always use Derby?
17/09/14 16:10:55 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is 
DERBY
My current metastore is in MySQL.
Any suggestions would be helpful.
Thanks.


> spark-submit on yarn or local , got different result
> 
>
> Key: SPARK-22007
> URL: https://issues.apache.org/jira/browse/SPARK-22007
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell, Spark Submit
>Affects Versions: 2.1.0
>Reporter: xinzhang
>
> Submit the py script locally.
> /opt/spark/spark-bin/bin/spark-submit --master local cluster test_hive.py
> result:
> ++
> |databaseName|
> ++
> | default|
> | |
> |   x|
> ++
> Submit the py script on YARN.
> /opt/spark/spark-bin/bin/spark-submit --master yarn --deploy-mode cluster 
> test_hive.py
> result:
> ++
> |databaseName|
> ++
> | default|
> ++
> The py script:
> [yangtt@dc-gateway119 test]$ cat test_hive.py 
> #!/usr/bin/env python
> #coding=utf-8
> from os.path import expanduser, join, abspath
> from pyspark.sql import SparkSession
> from pyspark.sql import Row
> from pyspark.conf import SparkConf
> def squared(s):
>   return s * s
> warehouse_location = abspath('/group/user/yangtt/meta/hive-temp-table')
> spark = SparkSession \
> .builder \
> .appName("Python_Spark_SQL_Hive") \
> .config("spark.sql.warehouse.dir", warehouse_location) \
> .config(conf=SparkConf()) \
> .enableHiveSupport() \
> .getOrCreate()
> spark.udf.register("squared",squared)
> spark.sql("show databases").show()
> Q: Why does Spark load a different Hive metastore? Why does YARN always use Derby?
> 17/09/14 16:10:55 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is 
> DERBY
> My current metastore is in MySQL.
> Any suggestions would be helpful.
> Thanks.






[jira] [Created] (SPARK-22007) spark-submit on yarn or local , got different result

2017-09-14 Thread xinzhang (JIRA)
xinzhang created SPARK-22007:


 Summary: spark-submit on yarn or local , got different result
 Key: SPARK-22007
 URL: https://issues.apache.org/jira/browse/SPARK-22007
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Spark Shell, Spark Submit
Affects Versions: 2.1.0
Reporter: xinzhang


Submit the py script locally.
/opt/spark/spark-bin/bin/spark-submit --master local cluster test_hive.py
result:
++
|databaseName|
++
| default|
| |
|   x|
++

Submit the py script on YARN.
/opt/spark/spark-bin/bin/spark-submit --master yarn --deploy-mode cluster 
test_hive.py
result:
++
|databaseName|
++
| default|
++

The py script:

[yangtt@dc-gateway119 test]$ cat test_hive.py 
#!/usr/bin/env python
#coding=utf-8

from os.path import expanduser, join, abspath

from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.conf import SparkConf

def squared(s):
  return s * s

# warehouse_location points to the default location for managed databases and 
tables
warehouse_location = abspath('/group/user/yangtt/meta/hive-temp-table')

spark = SparkSession \
.builder \
.appName("Python_Spark_SQL_Hive") \
.config("spark.sql.warehouse.dir", warehouse_location) \
.config(conf=SparkConf()) \
.enableHiveSupport() \
.getOrCreate()

spark.udf.register("squared",squared)

spark.sql("show databases").show()



Q: Why does Spark load a different Hive metastore? Why does YARN always use Derby?
17/09/14 16:10:55 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is 
DERBY
My current metastore is in MySQL.
Any suggestions would be helpful.
Thanks.
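
A quick diagnostic sketch (assuming only a working PySpark install): run the
same snippet in both deploy modes and compare what the session actually sees.

{code}
# Diagnostic sketch: run this with --master local and with --master yarn.
# If the YARN run prints a different warehouse dir or only the 'default'
# database, the session is not using the MySQL-backed metastore configured
# in hive-site.xml.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

print(spark.conf.get("spark.sql.warehouse.dir"))
for db in spark.catalog.listDatabases():
    print(db.name)
{code}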






[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-09-10 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160550#comment-16160550
 ] 

xinzhang edited comment on SPARK-21067 at 9/11/17 2:02 AM:
---

[~dricard]

Thanks for your reply.
So do we: we use Parquet. But another problem is that SQL like "insert 
overwrite table a partition(pt='2') select" will also cause the thriftserver 
to fail. Do you happen to have the same problem?
It only happens with tables that use partitions; with an unpartitioned 
Parquet table, "insert overwrite table a  select" is all right.
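
For reference, a sketch of the statement shape in question, expressed through
spark.sql() (table name "a" is the hypothetical one from the comment; per the
issue description, spark.sql() alone does not reproduce the failure, which
shows up through the thriftserver):

{code}
# Sketch of the statement shape being discussed (hypothetical table name "a").
# Per the issue description, these statements succeed via spark.sql() in a
# plain Spark application; the failure is observed when the same SQL goes
# through the thriftserver.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("create table if not exists a (count bigint) "
          "partitioned by (pt string) stored as parquet")
spark.sql("insert overwrite table a partition(pt='2') select 1")
{code}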


was (Author: zhangxin0112zx):
[~dricard]

Thanks for your reply.
So do we: we use Parquet. But another problem is that SQL like "insert 
overwrite table a partition(pt='2') select" will also cause the thriftserver 
to fail. Do you happen to have the same problem?

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which states that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> 

[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-09-10 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160550#comment-16160550
 ] 

xinzhang commented on SPARK-21067:
--

[~dricard]

Thanks for your reply.
So do we: we use Parquet. But another problem is that SQL like "insert 
overwrite table a partition(pt='2') select" will also cause the thriftserver 
to fail. Do you happen to have the same problem?

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which states that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> 

[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-09-08 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158192#comment-16158192
 ] 

xinzhang commented on SPARK-21067:
--

Hi [~dricard],
do you have any solutions now? 
Any suggestions would be helpful.
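
One variable worth isolating (a sketch, not a confirmed fix): the description
below mentions a custom hive.exec.stagingdir; resetting it to the Hive default
".hive-staging", which is resolved under the target directory, keeps the final
file move on the same directory tree.

{code}
# Sketch for experimentation, not a confirmed fix: the description notes
# hive.exec.stagingdir set to "/tmp/hive-staging/{user.name}". Resetting it
# to the Hive default ".hive-staging" (resolved under the target directory)
# is one variable to isolate when chasing "Unable to move source".
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("hive.exec.stagingdir", ".hive-staging") \
    .enableHiveSupport() \
    .getOrCreate()
{code}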

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which states that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at org.apache.spark.sql.Dataset.(Dataset.scala:185)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
> at 

[jira] [Comment Edited] (SPARK-21814) build spark current master can not use hive metadatamysql

2017-08-23 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138075#comment-16138075
 ] 

xinzhang edited comment on SPARK-21814 at 8/23/17 3:32 PM:
---

Thanks for your reply.
(I will delete this in an hour, or maybe later.)


was (Author: zhangxin0112zx):
Thanks for your reply.
(I will delete this in an hour.)

> build spark current master can not use hive metadatamysql
> -
>
> Key: SPARK-21814
> URL: https://issues.apache.org/jira/browse/SPARK-21814
> Project: Spark
>  Issue Type: Question
>  Components: Build, SQL
>Affects Versions: 2.2.0
>Reporter: xinzhang
>
> Hi. I built the Spark (master) source code myself and the build succeeded. 
> I used the cmd:
> ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr 
> -Phive -Phive-thriftserver -Pyarn
> But when I used 'spark-sql' to connect to the metadata (I put my Hive conf 
> hive-site.xml into $SPARK_HOME/conf/), it seemed not to work: it always 
> connected using Derby (my hive-site.xml uses MySQL as the metadata DB).
> I could not determine the cause of the problem.
> Is my build cmd right? If not, which cmd should I use to build the project 
> myself?
> Any suggestions will be helpful.
> The Spark source code's last commit is:
> [root@node3 spark]# git log
> commit be72b157ea13ea116c5178a9e41e37ae24090f72
> Author: gatorsmile 
> Date:   Tue Aug 22 17:54:39 2017 +0800
> [SPARK-21803][TEST] Remove the HiveDDLCommandSuite
> 
> ## What changes were proposed in this pull request?
> We do not have any Hive-specific parser. It does not make sense to keep a 
> parser-specific test suite `HiveDDLCommandSuite.scala` in the Hive package. 
> This PR is to
> 
> ## How was this patch tested?
> N/A
> 
> Author: gatorsmile 
> 
> Closes #19015 from gatorsmile/combineDDL.






[jira] [Commented] (SPARK-21814) build spark current master can not use hive metadatamysql

2017-08-23 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138088#comment-16138088
 ] 

xinzhang commented on SPARK-21814:
--

BTW
My problem: why does spark-sql always connect to the Derby metadata DB? I had 
put hive-site.xml into conf/ as usual. When using the tarball downloaded from 
the official site (2.2.0/2.1.0), the hive-site.xml always worked. 
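
A sketch for narrowing this down (assuming spark.sql.catalogImplementation is
readable on your build): check that the self-built distribution actually
wired in Hive support, then look at which databases it sees.

{code}
# Sketch to narrow the problem down. 'spark.sql.catalogImplementation' should
# read 'hive' (not 'in-memory') when the build included -Phive; if it is
# 'hive' but only the 'default' database shows up, hive-site.xml is probably
# not being picked up from $SPARK_HOME/conf/.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

print(spark.conf.get("spark.sql.catalogImplementation"))
spark.sql("show databases").show()
{code}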

> build spark current master can not use hive metadatamysql
> -
>
> Key: SPARK-21814
> URL: https://issues.apache.org/jira/browse/SPARK-21814
> Project: Spark
>  Issue Type: Question
>  Components: Build, SQL
>Affects Versions: 2.2.0
>Reporter: xinzhang
>
> Hi. I built the Spark (master) source code myself and the build succeeded. 
> I used the cmd:
> ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr 
> -Phive -Phive-thriftserver -Pyarn
> But when I used 'spark-sql' to connect to the metadata (I put my Hive conf 
> hive-site.xml into $SPARK_HOME/conf/), it seemed not to work: it always 
> connected using Derby (my hive-site.xml uses MySQL as the metadata DB).
> I could not determine the cause of the problem.
> Is my build cmd right? If not, which cmd should I use to build the project 
> myself?
> Any suggestions will be helpful.
> The Spark source code's last commit is:
> [root@node3 spark]# git log
> commit be72b157ea13ea116c5178a9e41e37ae24090f72
> Author: gatorsmile 
> Date:   Tue Aug 22 17:54:39 2017 +0800
> [SPARK-21803][TEST] Remove the HiveDDLCommandSuite
> 
> ## What changes were proposed in this pull request?
> We do not have any Hive-specific parser. It does not make sense to keep a 
> parser-specific test suite `HiveDDLCommandSuite.scala` in the Hive package. 
> This PR is to
> 
> ## How was this patch tested?
> N/A
> 
> Author: gatorsmile 
> 
> Closes #19015 from gatorsmile/combineDDL.






[jira] [Commented] (SPARK-21814) build spark current master can not use hive metadatamysql

2017-08-23 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138075#comment-16138075
 ] 

xinzhang commented on SPARK-21814:
--

Thanks for your reply.
(I will delete this in an hour.)

> build spark current master can not use hive metadatamysql
> -
>
> Key: SPARK-21814
> URL: https://issues.apache.org/jira/browse/SPARK-21814
> Project: Spark
>  Issue Type: Question
>  Components: Build, SQL
>Affects Versions: 2.2.0
>Reporter: xinzhang
>
> Hi. I built the Spark (master) source code myself and the build succeeded. 
> I used the cmd:
> ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr 
> -Phive -Phive-thriftserver -Pyarn
> But when I used 'spark-sql' to connect to the metadata (I put my Hive conf 
> hive-site.xml into $SPARK_HOME/conf/), it seemed not to work: it always 
> connected using Derby (my hive-site.xml uses MySQL as the metadata DB).
> I could not determine the cause of the problem.
> Is my build cmd right? If not, which cmd should I use to build the project 
> myself?
> Any suggestions will be helpful.
> The Spark source code's last commit is:
> [root@node3 spark]# git log
> commit be72b157ea13ea116c5178a9e41e37ae24090f72
> Author: gatorsmile 
> Date:   Tue Aug 22 17:54:39 2017 +0800
> [SPARK-21803][TEST] Remove the HiveDDLCommandSuite
> 
> ## What changes were proposed in this pull request?
> We do not have any Hive-specific parser. It does not make sense to keep a 
> parser-specific test suite `HiveDDLCommandSuite.scala` in the Hive package. 
> This PR is to
> 
> ## How was this patch tested?
> N/A
> 
> Author: gatorsmile 
> 
> Closes #19015 from gatorsmile/combineDDL.






[jira] [Created] (SPARK-21814) build spark current master can not use hive metadatamysql

2017-08-22 Thread xinzhang (JIRA)
xinzhang created SPARK-21814:


 Summary: build spark current master can not use hive metadatamysql
 Key: SPARK-21814
 URL: https://issues.apache.org/jira/browse/SPARK-21814
 Project: Spark
  Issue Type: Question
  Components: Build, SQL
Affects Versions: 2.2.0
Reporter: xinzhang


Hi. I built the Spark (master) source code myself and the build succeeded. I 
used the cmd:

./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive 
-Phive-thriftserver -Pyarn

But when I used 'spark-sql' to connect to the metadata (I put my Hive conf 
hive-site.xml into $SPARK_HOME/conf/), it seemed not to work: it always 
connected using Derby (my hive-site.xml uses MySQL as the metadata DB).
I could not determine the cause of the problem.
Is my build cmd right? If not, which cmd should I use to build the project 
myself?
Any suggestions will be helpful.


The last commit is:
[root@node3 spark]# git log
commit be72b157ea13ea116c5178a9e41e37ae24090f72
Author: gatorsmile 
Date:   Tue Aug 22 17:54:39 2017 +0800

[SPARK-21803][TEST] Remove the HiveDDLCommandSuite

## What changes were proposed in this pull request?
We do not have any Hive-specific parser. It does not make sense to keep a 
parser-specific test suite `HiveDDLCommandSuite.scala` in the Hive package. 
This PR is to

## How was this patch tested?
N/A

Author: gatorsmile 

Closes #19015 from gatorsmile/combineDDL.








[jira] [Updated] (SPARK-21814) build spark current master can not use hive metadatamysql

2017-08-22 Thread xinzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinzhang updated SPARK-21814:
-
Description: 
Hi. I built the Spark (master) source code myself and the build succeeded. I 
used the cmd:

./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive 
-Phive-thriftserver -Pyarn

But when I used 'spark-sql' to connect to the metadata (I put my Hive conf 
hive-site.xml into $SPARK_HOME/conf/), it seemed not to work: it always 
connected using Derby (my hive-site.xml uses MySQL as the metadata DB).
I could not determine the cause of the problem.
Is my build cmd right? If not, which cmd should I use to build the project 
myself?
Any suggestions will be helpful.


The Spark source code's last commit is:
[root@node3 spark]# git log
commit be72b157ea13ea116c5178a9e41e37ae24090f72
Author: gatorsmile 
Date:   Tue Aug 22 17:54:39 2017 +0800

[SPARK-21803][TEST] Remove the HiveDDLCommandSuite

## What changes were proposed in this pull request?
We do not have any Hive-specific parser. It does not make sense to keep a 
parser-specific test suite `HiveDDLCommandSuite.scala` in the Hive package. 
This PR is to

## How was this patch tested?
N/A

Author: gatorsmile 

Closes #19015 from gatorsmile/combineDDL.



  was:
Hi. I built the Spark (master) source code myself and the build succeeded. I 
used the cmd:

./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive 
-Phive-thriftserver -Pyarn

But when I used 'spark-sql' to connect to the metadata (I put my Hive conf 
hive-site.xml into $SPARK_HOME/conf/), it seemed not to work: it always 
connected using Derby (my hive-site.xml uses MySQL as the metadata DB).
I could not determine the cause of the problem.
Is my build cmd right? If not, which cmd should I use to build the project 
myself?
Any suggestions will be helpful.


The last commit is:
[root@node3 spark]# git log
commit be72b157ea13ea116c5178a9e41e37ae24090f72
Author: gatorsmile 
Date:   Tue Aug 22 17:54:39 2017 +0800

[SPARK-21803][TEST] Remove the HiveDDLCommandSuite

## What changes were proposed in this pull request?
We do not have any Hive-specific parser. It does not make sense to keep a 
parser-specific test suite `HiveDDLCommandSuite.scala` in the Hive package. 
This PR is to

## How was this patch tested?
N/A

Author: gatorsmile 

Closes #19015 from gatorsmile/combineDDL.




> build spark current master can not use hive metadatamysql
> -
>
> Key: SPARK-21814
> URL: https://issues.apache.org/jira/browse/SPARK-21814
> Project: Spark
>  Issue Type: Question
>  Components: Build, SQL
>Affects Versions: 2.2.0
>Reporter: xinzhang
>
> Hi. I built the Spark (master) source code myself and the build succeeded. 
> I used the cmd:
> ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr 
> -Phive -Phive-thriftserver -Pyarn
> But when I used 'spark-sql' to connect to the metadata (I put my Hive conf 
> hive-site.xml into $SPARK_HOME/conf/), it seemed not to work: it always 
> connected using Derby (my hive-site.xml uses MySQL as the metadata DB).
> I could not determine the cause of the problem.
> Is my build cmd right? If not, which cmd should I use to build the project 
> myself?
> Any suggestions will be helpful.
> The Spark source code's last commit is:
> [root@node3 spark]# git log
> commit be72b157ea13ea116c5178a9e41e37ae24090f72
> Author: gatorsmile 
> Date:   Tue Aug 22 17:54:39 2017 +0800
> [SPARK-21803][TEST] Remove the HiveDDLCommandSuite
> 
> ## What changes were proposed in this pull request?
> We do not have any Hive-specific parser. It does not make sense to keep a 
> parser-specific test suite `HiveDDLCommandSuite.scala` in the Hive package. 
> This PR is to
> 
> ## How was this patch tested?
> N/A
> 
> Author: gatorsmile 
> 
> Closes #19015 from gatorsmile/combineDDL.






[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-08-20 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134642#comment-16134642
 ] 

xinzhang commented on SPARK-21725:
--

OK. I will retry with the current master version.

> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>  Labels: spark-sql
>
> Use the thriftserver to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
> partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
> partitioned by (pt string) stored as parquet; 
> --ok
>  !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> session 4(do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
> partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination 
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> 
> -
> The doc describing Parquet tables is here: 
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL 
> will try to use its own Parquet support instead of Hive SerDe for better 
> performance. This behavior is controlled by the 
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
> default.
> I am confused: the problem appears with partitioned tables but is OK with 
> unpartitioned tables. Does that mean Spark does not use its own Parquet 
> support? Could someone suggest how I could avoid the issue?
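
Since the description itself points at spark.sql.hive.convertMetastoreParquet,
one knob worth experimenting with (a sketch, not a confirmed fix) is disabling
that conversion so the partitioned write goes through the Hive SerDe path;
table names tmp_10/tmp_11 come from the repro above:

{code}
# Sketch to experiment with, not a confirmed fix: turn off Spark's built-in
# Parquet conversion for Hive metastore tables (the config quoted in the
# description above), forcing the Hive SerDe path for the write.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.sql.hive.convertMetastoreParquet", "false") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("insert overwrite table tmp_10 partition(pt='1') "
          "select count(1) count from tmp_11")
{code}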






[jira] [Commented] (SPARK-4131) Support "Writing data into the filesystem from queries"

2017-08-14 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126723#comment-16126723
 ] 

xinzhang commented on SPARK-4131:
-

Any progress here?

> Support "Writing data into the filesystem from queries"
> ---
>
> Key: SPARK-4131
> URL: https://issues.apache.org/jira/browse/SPARK-4131
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: XiaoJing wang
>Assignee: Fei Wang
>Priority: Critical
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> Writing data into the filesystem from queries is not supported by SparkSQL.
> e.g.:
> {code}insert overwrite LOCAL DIRECTORY '/data1/wangxj/sql_spark' select * 
> from page_views;
> {code}
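
Until such support lands, a possible workaround sketch via the DataFrame
writer (the path is taken from the example above; a file:// URI only makes
sense where the writing processes can reach that local filesystem):

{code}
# Workaround sketch: run the query and write the result out as files via the
# DataFrame writer instead of INSERT OVERWRITE LOCAL DIRECTORY. The path comes
# from the example above; with a file:// URI the processes doing the write
# must be able to see that local filesystem.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("select * from page_views") \
    .write.mode("overwrite") \
    .csv("file:///data1/wangxj/sql_spark")
{code}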






[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-08-14 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108303#comment-16108303
 ] 

xinzhang edited comment on SPARK-21067 at 8/15/17 1:09 AM:
---

Hi.
I use Parquet to avoid the issue with CREATE TABLE AS.
It appears with INSERT OVERWRITE TABLE (partition). I could not find any way 
to avoid this issue. Any suggestions would be greatly helpful.
https://issues.apache.org/jira/browse/SPARK-21725


was (Author: zhangxin0112zx):
Hi.
I use Parquet to avoid the issue with CREATE TABLE AS.
It appears with INSERT OVERWRITE TABLE (partition). I could not find any way 
to avoid this issue. Any suggestions would be greatly helpful.
https://issues.apache.org/jira/browse/SPARK-21725

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which states that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-08-14 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108303#comment-16108303
 ] 

xinzhang edited comment on SPARK-21067 at 8/15/17 1:07 AM:
---

Hi.
I use Parquet to avoid the issue with CREATE TABLE AS.
It appears with INSERT OVERWRITE TABLE (partition). I could not find any way 
to avoid this issue. Any suggestions would be greatly helpful.
https://issues.apache.org/jira/browse/SPARK-21725


was (Author: zhangxin0112zx):
Hi srowen, could you consider this and give some suggestions? [~srowen]


> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which states that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> 

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-08-14 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119310#comment-16119310
 ] 

xinzhang edited comment on SPARK-21067 at 8/15/17 1:04 AM:
---

Hi [~smilegator],
can you push this bug fix forward? 
In my view, this is a very big obstacle for us as we roll out and use 
SparkSQL (thriftserver).


was (Author: zhangxin0112zx):
Hi [~cloud_fan],
can you push this bug fix forward? 
In my view, this is a very big obstacle for us as we roll out and use 
SparkSQL (thriftserver).

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away, sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which states that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> 

[jira] [Updated] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-08-14 Thread xinzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinzhang updated SPARK-21725:
-
Description: 
Use the thriftserver to create tables with partitions.

session 1:
 SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
partitioned by (pt string) stored as parquet;
--ok
 !exit

session 2:
 SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
partitioned by (pt string) stored as parquet; 
--ok
 !exit

session 3:
--connect the thriftserver
SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
partition(pt='1') select count(1) count from tmp_11;
--ok
 !exit

session 4(do it again):
--connect the thriftserver
SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
partition(pt='1') select count(1) count from tmp_11;
--error
 !exit

-
17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, 
currentState RUNNING, 
java.lang.reflect.InvocationTargetException
..
..
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
source 
hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
512282-2/-ext-1/part-0 to destination 
hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
... 45 more
Caused by: java.io.IOException: Filesystem closed

-


The doc describing Parquet tables is here: 
http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files

Hive metastore Parquet table conversion
When reading from and writing to Hive metastore Parquet tables, Spark SQL will 
try to use its own Parquet support instead of Hive SerDe for better 
performance. This behavior is controlled by the 
spark.sql.hive.convertMetastoreParquet configuration, and is turned on by 
default.

I am confused: the problem appears with partitioned tables but is OK with 
unpartitioned tables. Does that mean Spark does not use its own Parquet 
support? Could someone suggest how I could avoid the issue?

  was:
Use the thriftserver to create tables with partitions.

session 1:
 SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
partitioned by (pt string) stored as parquet;
--ok
 !exit

session 2:
 SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
partitioned by (pt string) stored as parquet; 
--ok
 !exit

session 3:
--connect the thriftserver
SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
partition(pt='1') select count(1) count from tmp_11;
--ok
 !exit

session 4(do it again):
--connect the thriftserver
SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
partition(pt='1') select count(1) count from tmp_11;
--error
 !exit

-
17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, 
currentState RUNNING, 
java.lang.reflect.InvocationTargetException
..
..
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
source 
hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
512282-2/-ext-1/part-0 to destination 
hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
... 45 more
Caused by: java.io.IOException: Filesystem closed

-




> spark thriftserver insert overwrite table partition select 
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1  jdk8
>Reporter: xinzhang
>  Labels: spark-sql
>
> Use the thriftserver to create tables with partitions.
> session 1:
>  SET 

[jira] [Updated] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-08-14 Thread xinzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinzhang updated SPARK-21725:
-
Description: 
Use the thriftserver to create tables with partitions.

session 1:
 SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
partitioned by (pt string) stored as parquet;
--ok
 !exit

session 2:
 SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
partitioned by (pt string) stored as parquet; 
--ok
 !exit

session 3:
--connect the thriftserver
SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
partition(pt='1') select count(1) count from tmp_11;
--ok
 !exit

session 4(do it again):
--connect the thriftserver
SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
partition(pt='1') select count(1) count from tmp_11;
--error
 !exit

-
17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, 
currentState RUNNING, 
java.lang.reflect.InvocationTargetException
..
..
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
source 
hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
512282-2/-ext-1/part-0 to destination 
hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
... 45 more
Caused by: java.io.IOException: Filesystem closed

-



  was:
Use the thriftserver to create tables with partitions.
session 1:
 SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
partitioned by (pt string) stored as parquet;
--ok
 !exit
session 2:
 SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
partitioned by (pt string) stored as parquet; 
--ok
 !exit
session 3:
--connect the thriftserver
SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
partition(pt='1') select count(1) count from tmp_11;
--ok
 !exit
session 4(do it again):
--connect the thriftserver
SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
partition(pt='1') select count(1) count from tmp_11;
--error
 !exit

-
17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, 
currentState RUNNING, 
java.lang.reflect.InvocationTargetException
..
..
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053512282-2/-ext-1/part-0 
to destination 
hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
... 45 more
Caused by: java.io.IOException: Filesystem closed

-





[jira] [Created] (SPARK-21725) spark thriftserver insert overwrite table partition select

2017-08-14 Thread xinzhang (JIRA)
xinzhang created SPARK-21725:


 Summary: spark thriftserver insert overwrite table partition 
select 
 Key: SPARK-21725
 URL: https://issues.apache.org/jira/browse/SPARK-21725
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.1.0
 Environment: centos 6.7 spark 2.1  jdk8
Reporter: xinzhang


Using the Thrift Server, create tables with partitions.
session 1:
 SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) 
partitioned by (pt string) stored as parquet;
--ok
 !exit
session 2:
 SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) 
partitioned by (pt string) stored as parquet; 
--ok
 !exit
session 3:
--connect the thriftserver
SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
partition(pt='1') select count(1) count from tmp_11;
--ok
 !exit
session 4(do it again):
--connect the thriftserver
SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 
partition(pt='1') select count(1) count from tmp_11;
--error
 !exit

-
17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, 
currentState RUNNING, 
java.lang.reflect.InvocationTargetException
..
..
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053512282-2/-ext-1/part-0 
to destination 
hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
... 45 more
Caused by: java.io.IOException: Filesystem closed

-








[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-08-08 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119310#comment-16119310
 ] 

xinzhang edited comment on SPARK-21067 at 8/9/17 2:04 AM:
--

Hi [~cloud_fan],
Could you help push a fix for this bug?
In our view, this is a major obstacle as we move to roll out and use 
SparkSQL (Thrift Server).


was (Author: zhangxin0112zx):
Hi [~cloud_fan],
Could you help push a fix for this bug?
In our view, this is a major obstacle as we move to use 
SparkSQL (Thrift Server).

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away; sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021), which states that the 
> {{hive.exec.stagingdir}} setting had to be added in order for Spark to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}".
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> 
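Two editorial sketches on the quoted report above (illustrations only, not part of the original ticket).

First, on the {{hive.exec.stagingdir}} point (per SPARK-11021): a minimal sketch, assuming the spark.hadoop. prefix for forwarding Hive/Hadoop properties; the property name comes from Hive, and the value simply mirrors the report.

{noformat}
// Sketch only: supply the staging directory when the session is created, so
// staging files land under a per-user directory as described in the report.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ctas-staging-dir")  // illustrative name
  .config("spark.hadoop.hive.exec.stagingdir", "/tmp/hive-staging/{user.name}")
  .enableHiveSupport()
  .getOrCreate()
{noformat}

Second, on "tried the same procedure in a spark context using spark.sql()": a minimal sketch of that check, assuming an existing Hive-enabled SparkSession named spark; the SQL is the reproduction script from the report.

{noformat}
// Run the reproduction SQL through spark.sql() directly, bypassing the
// Thrift Server; per the report, this path did not fail.
val statements = Seq(
  "DROP SCHEMA IF EXISTS dricard CASCADE",
  "CREATE SCHEMA dricard",
  "CREATE TABLE dricard.test (col1 int)",
  "INSERT INTO TABLE dricard.test SELECT 1",
  "SELECT * from dricard.test",
  "DROP TABLE dricard.test",
  "CREATE TABLE dricard.test AS select 1 as `col1`"
)
statements.foreach(spark.sql)
spark.sql("SELECT * from dricard.test").show()
{noformat}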

[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-08-08 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119310#comment-16119310
 ] 

xinzhang edited comment on SPARK-21067 at 8/9/17 2:03 AM:
--

Hi [~cloud_fan],
Could you help push a fix for this bug?
In our view, this is a major obstacle as we move to use 
SparkSQL (Thrift Server).


was (Author: zhangxin0112zx):
Hi [~cloud_fan],
Could you help push a fix for this bug?
In our view, this is a major obstacle as we move to use 
Spark (Thrift Server).


[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-08-08 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119310#comment-16119310
 ] 

xinzhang commented on SPARK-21067:
--

Hi [~cloud_fan],
Could you help push a fix for this bug?
In our view, this is a major obstacle as we move to use 
Spark (Thrift Server).


[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-08-04 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108303#comment-16108303
 ] 

xinzhang edited comment on SPARK-21067 at 8/5/17 1:13 AM:
--

Hi [~srowen], could you take a look at this and share some suggestions?



was (Author: zhangxin0112zx):
Hi [~guoxiaolongzte], could you take a look at this and share some 
suggestions?



[jira] [Issue Comment Deleted] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-08-04 Thread xinzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinzhang updated SPARK-21067:
-
Comment: was deleted

(was: hi.Reynold Xin  i am looking forwad to your reply [~rxin])


[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-31 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108303#comment-16108303
 ] 

xinzhang commented on SPARK-21067:
--

Hi [~guoxiaolongzte], could you take a look at this and share some 
suggestions?



[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-29 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106057#comment-16106057
 ] 

xinzhang edited comment on SPARK-21067 at 7/29/17 7:09 AM:
---

Hi Reynold Xin, I am looking forward to your reply. [~rxin]


was (Author: zhangxin0112zx):
Hi Reynold Xin, I am looking forward to your reply.


[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-29 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106057#comment-16106057
 ] 

xinzhang commented on SPARK-21067:
--

Hi Reynold Xin, I am looking forward to your reply.


[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-27 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103202#comment-16103202
 ] 

xinzhang edited comment on SPARK-21067 at 7/28/17 1:06 AM:
---

Same here.
The problem reappeared in the Spark 2.1.0 Thrift Server:

Open Beeline Session 1
Create Table 1 (Success)
Open Beeline Session 2
Create Table 2 (Success)
Close Beeline Session 1
Create Table 3 in Beeline Session 2 (FAIL)

With Parquet, the issue is not present.

[~cloud_fan]


was (Author: zhangxin0112zx):
Same here.
The problem reappeared in the Spark 2.1.0 Thrift Server:

Open Beeline Session 1
Create Table 1 (Success)
Open Beeline Session 2
Create Table 2 (Success)
Close Beeline Session 1
Create Table 3 in Beeline Session 2 (FAIL)

With Parquet, the issue is not present.

@Wenchen Fan
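
A hypothetical sketch of the beeline steps above over plain JDBC (host/port, user, and table names are placeholders; the Hive JDBC driver is assumed on the classpath):

{noformat}
// Hypothetical reproduction of the beeline steps above using two concurrent
// JDBC sessions against the Thrift Server.
import java.sql.DriverManager

Class.forName("org.apache.hive.jdbc.HiveDriver")
val url = "jdbc:hive2://localhost:10000/default"  // placeholder URL

val session1 = DriverManager.getConnection(url, "user", "")
val session2 = DriverManager.getConnection(url, "user", "")

session1.createStatement().execute("CREATE TABLE t1 AS SELECT 1 AS col1") // ok
session2.createStatement().execute("CREATE TABLE t2 AS SELECT 1 AS col1") // ok
session1.close()
// Per the comment above, a third CTAS in the surviving session now fails
// with "Unable to move source" (unless the table is stored as Parquet).
session2.createStatement().execute("CREATE TABLE t3 AS SELECT 1 AS col1")
{noformat}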


[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-27 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103202#comment-16103202
 ] 

xinzhang commented on SPARK-21067:
--

Same here.
The problem reappeared in the Spark 2.1.0 Thrift Server:

Open Beeline Session 1
Create Table 1 (Success)
Open Beeline Session 2
Create Table 2 (Success)
Close Beeline Session 1
Create Table 3 in Beeline Session 2 (FAIL)

With Parquet, the issue is not present.

Wenchen Fan


[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2017-07-27 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103202#comment-16103202
 ] 

xinzhang edited comment on SPARK-21067 at 7/27/17 1:27 PM:
---

Same here.
The problem reappeared in the Spark 2.1.0 Thrift Server:

Open Beeline Session 1
Create Table 1 (Success)
Open Beeline Session 2
Create Table 2 (Success)
Close Beeline Session 1
Create Table 3 in Beeline Session 2 (FAIL)

With Parquet, the issue is not present.

@Wenchen Fan


was (Author: zhangxin0112zx):
Same here.
The problem reappeared in the Spark 2.1.0 Thrift Server:

Open Beeline Session 1
Create Table 1 (Success)
Open Beeline Session 2
Create Table 2 (Success)
Close Beeline Session 1
Create Table 3 in Beeline Session 2 (FAIL)

With Parquet, the issue is not present.

Wenchen Fan


[jira] [Commented] (SPARK-19511) insert into table does not work on second session of beeline

2017-07-27 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102804#comment-16102804
 ] 

xinzhang commented on SPARK-19511:
--

[~chenerlu]
Hi, it always appears. In which scenario does it not appear?

> insert into table does not work on second session of beeline
> 
>
> Key: SPARK-19511
> URL: https://issues.apache.org/jira/browse/SPARK-19511
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Centos 7.2, java 1.7.0_91
>Reporter: sanjiv marathe
>
> Same issue as SPARK-11083 ... reopen?
> INSERT INTO table works for the first session of beeline and fails in the 
> second session of beeline.
> Every time, I had to restart the Thrift server and reconnect to get it 
> working.
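> A minimal sketch of the two sessions (host, port, and table name are 
> illustrative):
> {noformat}
> # Session 1 -- the insert works
> beeline -u jdbc:hive2://host:10000/default -e "insert into table t select 1"
> # Exit beeline and reconnect -- Session 2: the same insert fails
> beeline -u jdbc:hive2://host:10000/default -e "insert into table t select 1"
> {noformat}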






[jira] [Commented] (SPARK-11083) insert overwrite table failed when beeline reconnect

2017-07-27 Thread xinzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102775#comment-16102775
 ] 

xinzhang commented on SPARK-11083:
--

Reappeared in Spark 2.1.0.
Is anyone working on this issue?

> insert overwrite table failed when beeline reconnect
> 
>
> Key: SPARK-11083
> URL: https://issues.apache.org/jira/browse/SPARK-11083
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
> Environment: Spark: master branch
> Hadoop: 2.7.1
> JDK: 1.8.0_60
>Reporter: Weizhong
>Assignee: Davies Liu
>
> 1. Start the Thrift server
> 2. Use beeline to connect to the Thrift server, then execute an "insert 
> overwrite table_name ..." clause -- success
> 3. Exit beeline
> 4. Reconnect to the Thrift server, then execute the same "insert overwrite 
> table_name ..." clause -- failed
> {noformat}
> 15/10/13 18:44:35 ERROR SparkExecuteStatementOperation: Error executing 
> query, currentState RUNNING, 
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.sql.hive.client.Shim_v1_2.loadDynamicPartitions(HiveShim.scala:520)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(ClientWrapper.scala:506)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:506)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:506)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:256)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:211)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:248)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.loadDynamicPartitions(ClientWrapper.scala:505)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:225)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:58)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:58)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:129)
>   at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
>   at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:739)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.runInternal(SparkExecuteStatementOperation.scala:224)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:182)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move 
> source 
> hdfs://9.91.8.214:9000/user/hive/warehouse/tpcds_bin_partitioned_orc_2.db/catalog_returns/.hive-staging_hive_2015-10-13_18-44-17_606_2400736035447406540-2/-ext-1/cr_returned_date=2003-08-27/part-00048
>  to destination 
>