[jira] [Comment Edited] (SPARK-12497) thriftServer does not support semicolon in sql
[ https://issues.apache.org/jira/browse/SPARK-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276864#comment-17276864 ]

xinzhang edited comment on SPARK-12497 at 2/2/21, 5:44 AM:
---

[~kabhwan] Sorry for the mixed-up tests. Please recheck the new test.
# It's good with Spark 3.0.0. (BTW: semicolon is good in beeline.)
# It's still a bug with Spark 2.4.7.

{code}
[root@actuatorx-dispatcher-172-25-48-173 spark]# env|grep spark
SPARK_HOME=/opt/spark/spark-bin
PATH=/root/perl5/bin:/opt/scala/scala-bin//bin:/opt/spark/spark-bin/bin:172.25.52.34:/opt/hive/hive-bin/bin/:172.31.10.86:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/swosbf/bin:/usr/local/swosbf/bin/system:/usr/java/jdk/bin:/usr/bin:/usr/sbin:/bin:/sbin:/usr/X11R6/bin:/root/bin
PWD=/opt/spark
[root@actuatorx-dispatcher-172-25-48-173 spark]# ll
total 4
-rw-r--r--  1 root root 646 Feb  1 17:44 derby.log
drwxr-xr-x  5 root root 133 Feb  1 17:44 metastore_db
drwxr-xr-x 14 root root 255 Sep 22 13:57 spark-2.3.0-bin-hadoop2.6
drwxr-xr-x 14 1000 1000 240 Feb  2 13:32 spark-2.4.7-bin-hadoop2.6
drwxr-xr-x 14 root root 240 Feb  2 13:26 spark-3.0.0-bin-hadoop2.7
lrwxrwxrwx  1 root root  25 Feb  1 15:42 spark-bin -> spark-2.4.7-bin-hadoop2.6
[root@actuatorx-dispatcher-172-25-48-173 spark]# jps
3348544 RunJar
3354564 Jps
3354234 RunJar
984853 JarLauncher
[root@actuatorx-dispatcher-172-25-48-173 spark]# sh spark-bin/sbin/start-thriftserver.sh
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /opt/spark/spark-bin/logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-actuatorx-dispatcher-172-25-48-173.out
[root@actuatorx-dispatcher-172-25-48-173 spark]# jps
3362650 Jps
984853 JarLauncher
3355197 SparkSubmit
3362444 RunJar
[root@actuatorx-dispatcher-172-25-48-173 spark]# netstat -anp|grep 3355197
tcp 0 0 172.25.48.173:21120 0.0.0.0:* LISTEN 3355197/java
tcp 0 0 0.0.0.0:4040 0.0.0.0:* LISTEN 3355197/java
tcp 0 0 172.25.48.173:22219 0.0.0.0:* LISTEN 3355197/java
tcp 0 0 0.0.0.0:50031 0.0.0.0:* LISTEN 3355197/java
tcp 0 0 172.25.48.173:51797 172.25.48.231:6033 ESTABLISHED 3355197/java
tcp 0 0 172.25.48.173:51795 172.25.48.231:6033 ESTABLISHED 3355197/java
tcp 0 0 172.25.48.173:51787 172.25.48.231:6033 ESTABLISHED 3355197/java
tcp 0 0 172.25.48.173:51789 172.25.48.231:6033 ESTABLISHED 3355197/java
unix 3 [ ] STREAM CONNECTED 534110569 3355197/java
unix 3 [ ] STREAM CONNECTED 534110568 3355197/java
unix 2 [ ] STREAM CONNECTED 534050562 3355197/java
unix 2 [ ] STREAM CONNECTED 534110572 3355197/java
[root@actuatorx-dispatcher-172-25-48-173 spark]# /opt/spark/spark-bin/bin/beeline -u jdbc:hive2://172.25.48.173:50031/tools -n tools
Connecting to jdbc:hive2://172.25.48.173:50031/tools
21/02/02 13:38:57 INFO jdbc.Utils: Supplied authorities: 172.25.48.173:50031
21/02/02 13:38:57 INFO jdbc.Utils: Resolved authority: 172.25.48.173:50031
21/02/02 13:38:57 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://172.25.48.173:50031/tools
Connected to: Spark SQL (version 2.4.7)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://172.25.48.173:50031/tools> select '\;';
Error: org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'select ''(line 1, pos 7)

== SQL ==
select '\
---^^^

(state=,code=0)
0: jdbc:hive2://172.25.48.173:50031/tools> !exit
Closing: 0: jdbc:hive2://172.25.48.173:50031/tools
[root@actuatorx-dispatcher-172-25-48-173 spark]#
{code}

was (Author: zhangxin0112zx):
[~kabhwan] Sorry for the mixed up Tests. Please recheck the new test.
# It's good with Spark 3.0.0 . (BTW: semicolon is good in beeline
# It's still a bug with Spark 2.4.7 .
[jira] [Commented] (SPARK-12497) thriftServer does not support semicolon in sql
[ https://issues.apache.org/jira/browse/SPARK-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276864#comment-17276864 ]

xinzhang commented on SPARK-12497:
--

[~kabhwan] Sorry for the mixed-up tests. Please recheck the new test.
# It's good with Spark 3.0.0. (BTW: semicolon is good in beeline.)
# It's still a bug with Spark 2.4.7.

{code}
[root@actuatorx-dispatcher-172-25-48-173 spark]# env|grep spark
SPARK_HOME=/opt/spark/spark-bin
PATH=/root/perl5/bin:/opt/scala/scala-bin//bin:/opt/spark/spark-bin/bin:172.25.52.34:/opt/hive/hive-bin/bin/:172.31.10.86:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/swosbf/bin:/usr/local/swosbf/bin/system:/usr/java/jdk/bin:/usr/bin:/usr/sbin:/bin:/sbin:/usr/X11R6/bin:/root/bin
PWD=/opt/spark
[root@actuatorx-dispatcher-172-25-48-173 spark]# ll
total 4
-rw-r--r--  1 root root 646 Feb  1 17:44 derby.log
drwxr-xr-x  5 root root 133 Feb  1 17:44 metastore_db
drwxr-xr-x 14 root root 255 Sep 22 13:57 spark-2.3.0-bin-hadoop2.6
drwxr-xr-x 14 1000 1000 240 Feb  2 13:32 spark-2.4.7-bin-hadoop2.6
drwxr-xr-x 14 root root 240 Feb  2 13:26 spark-3.0.0-bin-hadoop2.7
lrwxrwxrwx  1 root root  25 Feb  1 15:42 spark-bin -> spark-2.4.7-bin-hadoop2.6
[root@actuatorx-dispatcher-172-25-48-173 spark]# jps
3348544 RunJar
3354564 Jps
3354234 RunJar
984853 JarLauncher
[root@actuatorx-dispatcher-172-25-48-173 spark]# sh spark-bin/sbin/start-thriftserver.sh
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /opt/spark/spark-bin/logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-actuatorx-dispatcher-172-25-48-173.out
[root@actuatorx-dispatcher-172-25-48-173 spark]# netstat -anp|grep 3355197
tcp 0 0 172.25.48.173:21120 0.0.0.0:* LISTEN 3355197/java
tcp 0 0 0.0.0.0:4040 0.0.0.0:* LISTEN 3355197/java
tcp 0 0 172.25.48.173:22219 0.0.0.0:* LISTEN 3355197/java
tcp 0 0 0.0.0.0:50031 0.0.0.0:* LISTEN 3355197/java
tcp 0 0 172.25.48.173:51797 172.25.48.231:6033 ESTABLISHED 3355197/java
tcp 0 0 172.25.48.173:51795 172.25.48.231:6033 ESTABLISHED 3355197/java
tcp 0 0 172.25.48.173:51787 172.25.48.231:6033 ESTABLISHED 3355197/java
tcp 0 0 172.25.48.173:51789 172.25.48.231:6033 ESTABLISHED 3355197/java
unix 3 [ ] STREAM CONNECTED 534110569 3355197/java
unix 3 [ ] STREAM CONNECTED 534110568 3355197/java
unix 2 [ ] STREAM CONNECTED 534050562 3355197/java
unix 2 [ ] STREAM CONNECTED 534110572 3355197/java
[root@actuatorx-dispatcher-172-25-48-173 spark]# /opt/spark/spark-bin/bin/beeline -u jdbc:hive2://172.25.48.173:50031/tools -n tools
Connecting to jdbc:hive2://172.25.48.173:50031/tools
21/02/02 13:38:57 INFO jdbc.Utils: Supplied authorities: 172.25.48.173:50031
21/02/02 13:38:57 INFO jdbc.Utils: Resolved authority: 172.25.48.173:50031
21/02/02 13:38:57 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://172.25.48.173:50031/tools
Connected to: Spark SQL (version 2.4.7)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://172.25.48.173:50031/tools> select '\;';
Error: org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'select ''(line 1, pos 7)

== SQL ==
select '\
---^^^

(state=,code=0)
0: jdbc:hive2://172.25.48.173:50031/tools> !exit
Closing: 0: jdbc:hive2://172.25.48.173:50031/tools
[root@actuatorx-dispatcher-172-25-48-173 spark]#
{code}

> thriftServer does not support semicolon in sql
> ---
>
> Key: SPARK-12497
> URL: https://issues.apache.org/jira/browse/SPARK-12497
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.2
> Reporter: nilonealex
> Priority: Major
>
> 0: jdbc:hive2://192.168.128.130:14005> SELECT ';' from tx_1 limit 1 ;
> Error: org.apache.spark.sql.AnalysisException: cannot recognize input near '' '' '' in select clause; line 1 pos 8 (state=,code=0)
> 0: jdbc:hive2://192.168.128.130:14005>
> 0: jdbc:hive2://192.168.128.130:14005> select '\;' from tx_1 limit 1 ;
> Error: org.apache.spark.sql.AnalysisException: cannot recognize input near '' '' '' in select clause; line 1 pos 9 (state=,code=0)

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
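The parse errors above are what you get when a client or server splits the input on every semicolon, even one inside a string literal: `select ';'` is cut in half before it ever reaches the parser. A minimal, hypothetical sketch of the difference between a naive splitter and a quote-aware one (illustrative only; this is not Spark's or Hive's actual code):

```python
def naive_split(sql):
    # Splits on every ';', even inside string literals: this is the bug.
    return [s for s in sql.split(";") if s.strip()]

def quote_aware_split(sql):
    """Split on ';' only when outside single-quoted literals.

    Toy sketch: handles '...' literals and backslash escapes, nothing more.
    """
    stmts, buf, in_quote, escaped = [], [], False, False
    for ch in sql:
        if escaped:
            buf.append(ch)
            escaped = False
        elif ch == "\\":
            buf.append(ch)
            escaped = True
        elif ch == "'":
            in_quote = not in_quote
            buf.append(ch)
        elif ch == ";" and not in_quote:
            stmts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    if buf:
        stmts.append("".join(buf))
    return [s for s in stmts if s.strip()]

query = "select ';' from tx_1 limit 1;"
print(naive_split(query))        # the literal is cut in half
print(quote_aware_split(query))  # one intact statement
```

Under this model, `select '\;'` and `select ';'` both survive splitting only when the splitter tracks quote state, which matches the observed fix between 2.4.7 and 3.0.0.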
[jira] [Commented] (SPARK-12497) thriftServer does not support semicolon in sql
[ https://issues.apache.org/jira/browse/SPARK-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276207#comment-17276207 ]

xinzhang commented on SPARK-12497:
--

It's still a bug with Spark 3.0.0. Start the Spark Thrift Server with the default port 1, then connect with Spark's beeline.

Step 1:
{code}
[root@172-25-48-173 spark]# sh spark-3.0.0-bin-hadoop2.7/sbin/start-thriftserver.sh
...
{code}
Step 2:
{code}
[root@172-25-48-173 spark]# sh spark-3.0.0-bin-hadoop2.7/bin/beeline
Beeline version 1.2.1.spark2 by Apache Hive
beeline> !connect jdbc:hive2://172.25.48.173:1
Connecting to jdbc:hive2://172.25.48.173:1
Enter username for jdbc:hive2://172.25.48.173:1:
Enter password for jdbc:hive2://172.25.48.173:1:
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 2.4.7)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://172.25.48.173:1> select '\;';
Error: org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'select ''(line 1, pos 7)

== SQL ==
select '\
---^^^

(state=,code=0)
0: jdbc:hive2://172.25.48.173:1>
{code}
Am I missing something?
[jira] [Updated] (SPARK-23022) Spark Thrift Server always cache resource issues
[ https://issues.apache.org/jira/browse/SPARK-23022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xinzhang updated SPARK-23022:
-
Description:
Hi. I use the Thrift Server for Spark SQL and ran multiple queries. I deploy Spark on YARN. When my queries finish, the Thrift Server still holds the YARN resources. Any suggestions would be helpful. Here are the images.

!https://user-images.githubusercontent.com/8244097/34752652-8d224416-f5ee-11e7-89d3-5868c128378d.png!
!https://user-images.githubusercontent.com/8244097/34752397-215bea08-f5ed-11e7-89f4-13ef9ab78904.png!
!https://user-images.githubusercontent.com/8244097/34752403-2756d224-f5ed-11e7-97d3-9c7d21c48f3a.png!
!https://user-images.githubusercontent.com/8244097/34752409-2ba1d3ce-f5ed-11e7-85d6-9e46ae8a3e2b.png!

Here is the Spark conf.

{code:java}
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 500
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.executor.instances 2
spark.executor.memory 6g
#serializer
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 512m
#spark.kryo.unsafe true
spark.kryo.referenceTracking false
spark.rdd.compress true
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 1g
spark.yarn.archive hdfs://ns/data1/hadooptmp/spark2.2.1/jars
spark.yarn.am.memory 2g
spark.driver.memory 4g
spark.driver.maxResultSize 2g
#SPARK SQL
spark.sql.shuffle.partitions 500
spark.sql.statistics.fallBackToHdfs true
spark.sql.orc.filterPushdown true
spark.sql.autoBroadcastJoinThreshold 104857600
spark.sql.adaptive.enabled true
spark.history.fs.logDirectory hdfs://ns/data4/hadooptmp/spark-history
spark.eventLog.enabled true
spark.eventLog.compress true
spark.eventLog.dir hdfs://ns/data4/hadooptmp/spark-history
spark.yarn.historyServer.address 172.31.10.119:18080
spark.io.compression.codec snappy
spark.executor.logs.rolling.enableCompression true
spark.dynamicAllocation.executorIdleTimeout 10s
spark.network.timeout 600s
spark.sql.parquet.writeLegacyFormat true
{code}

was:
Hi. I use the Thrift Server for SparkSQL . I queried muiltle queries.I deply the Spark on Yarn. When I finish my query.Thrift Server always cache the Yarn Resources. Any suggests will be helpful.

> Spark Thrift Server always cache resource issues
> ---
>
> Key: SPARK-23022
> URL: https://issues.apache.org/jira/browse/SPARK-23022
> Project: Spark
> Issue
[jira] [Updated] (SPARK-23022) Spark Thrift Server always cache resource issues
[ https://issues.apache.org/jira/browse/SPARK-23022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xinzhang updated SPARK-23022:
-
Description:
Hi. I use the Thrift Server for Spark SQL and ran multiple queries. I deploy Spark on YARN. When my queries finish, the Thrift Server still holds the YARN resources. Any suggestions would be helpful. Here are the images.

!https://user-images.githubusercontent.com/8244097/34752652-8d224416-f5ee-11e7-89d3-5868c128378d.png!
!https://user-images.githubusercontent.com/8244097/34752397-215bea08-f5ed-11e7-89f4-13ef9ab78904.png!
!https://user-images.githubusercontent.com/8244097/34752403-2756d224-f5ed-11e7-97d3-9c7d21c48f3a.png!
!https://user-images.githubusercontent.com/8244097/34752409-2ba1d3ce-f5ed-11e7-85d6-9e46ae8a3e2b.png!

Here is the Spark conf.

{code:java}
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 500
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.executor.instances 2
spark.executor.memory 6g
#serializer
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 512m
#spark.kryo.unsafe true
spark.kryo.referenceTracking false
spark.rdd.compress true
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 1g
spark.yarn.archive hdfs://ns/data1/hadooptmp/spark2.2.1/jars
spark.yarn.am.memory 2g
spark.driver.memory 4g
spark.driver.maxResultSize 2g
#SPARK SQL
spark.sql.shuffle.partitions 500
spark.sql.statistics.fallBackToHdfs true
spark.sql.orc.filterPushdown true
spark.sql.autoBroadcastJoinThreshold 104857600
spark.sql.adaptive.enabled true
spark.history.fs.logDirectory hdfs://ns/data4/hadooptmp/spark-history
spark.eventLog.enabled true
spark.eventLog.compress true
spark.eventLog.dir hdfs://ns/data4/hadooptmp/spark-history
spark.yarn.historyServer.address 172.31.10.119:18080
spark.io.compression.codec snappy
spark.executor.logs.rolling.enableCompression true
spark.dynamicAllocation.executorIdleTimeout 10s
spark.network.timeout 600s
spark.sql.parquet.writeLegacyFormat true
{code}

was:
Hi. I use the Thrift Server for SparkSQL . I query muiltle query.I deply the Spark on Yarn. When I finish my query.Thrift Server always cache the Yarn Resources. Any suggests will be helpful.

> Spark Thrift Server always cache resource issues
> ---
>
> Key: SPARK-23022
> URL: https://issues.apache.org/jira/browse/SPARK-23022
> Project: Spark
> Issue Type:
[jira] [Created] (SPARK-23022) Spark Thrift Server always cache resource issues
xinzhang created SPARK-23022:
Summary: Spark Thrift Server always cache resource issues
Key: SPARK-23022
URL: https://issues.apache.org/jira/browse/SPARK-23022
Project: Spark
Issue Type: Bug
Components: Deploy
Affects Versions: 2.2.1
Environment: CentOS6.x Spark2.x JDK1.8
Reporter: xinzhang

Hi. I use the Thrift Server for Spark SQL and ran multiple queries. I deploy Spark on YARN. When my queries finish, the Thrift Server still holds the YARN resources. Any suggestions would be helpful. Here are the images.

!https://user-images.githubusercontent.com/8244097/34752652-8d224416-f5ee-11e7-89d3-5868c128378d.png!
!https://user-images.githubusercontent.com/8244097/34752397-215bea08-f5ed-11e7-89f4-13ef9ab78904.png!
!https://user-images.githubusercontent.com/8244097/34752403-2756d224-f5ed-11e7-97d3-9c7d21c48f3a.png!
!https://user-images.githubusercontent.com/8244097/34752409-2ba1d3ce-f5ed-11e7-85d6-9e46ae8a3e2b.png!

Here is the Spark conf.

{code:java}
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 500
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.executor.instances 2
spark.executor.memory 6g
#serializer
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 512m
#spark.kryo.unsafe true
spark.kryo.referenceTracking false
spark.rdd.compress true
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 1g
spark.yarn.archive hdfs://ns/data1/hadooptmp/spark2.2.1/jars
spark.yarn.am.memory 2g
spark.driver.memory 4g
spark.driver.maxResultSize 2g
#SPARK SQL
spark.sql.shuffle.partitions 500
spark.sql.statistics.fallBackToHdfs true
spark.sql.orc.filterPushdown true
spark.sql.autoBroadcastJoinThreshold 104857600
spark.sql.adaptive.enabled true
spark.history.fs.logDirectory hdfs://ns/data4/hadooptmp/spark-history
spark.eventLog.enabled true
spark.eventLog.compress true
spark.eventLog.dir hdfs://ns/data4/hadooptmp/spark-history
spark.yarn.historyServer.address 172.31.10.119:18080
spark.io.compression.codec snappy
spark.executor.logs.rolling.enableCompression true
spark.dynamicAllocation.executorIdleTimeout 10s
spark.network.timeout 600s
spark.sql.parquet.writeLegacyFormat true
{code}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
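The conf above enables dynamic allocation with spark.dynamicAllocation.executorIdleTimeout 10s, so an executor whose last task finished more than 10 seconds ago should be handed back to YARN; the report is that this release never happens. A toy model of the idle-timeout bookkeeping involved (purely illustrative and with invented names, not Spark's implementation):

```python
import time

class IdleReaper:
    """Toy model of an executorIdleTimeout policy (not Spark's code).

    An executor whose last task finished more than idle_timeout seconds
    ago is considered idle and released.
    """
    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.last_active = {}          # executor id -> last-busy timestamp

    def task_finished(self, executor_id, now=None):
        self.last_active[executor_id] = now if now is not None else time.time()

    def reap(self, now=None):
        now = now if now is not None else time.time()
        idle = [e for e, t in self.last_active.items()
                if now - t > self.idle_timeout]
        for e in idle:
            del self.last_active[e]    # would be released back to YARN here
        return idle

reaper = IdleReaper(idle_timeout=10)
reaper.task_finished("exec-1", now=100)
reaper.task_finished("exec-2", now=105)
print(reaper.reap(now=112))   # ['exec-1']: idle for 12s > 10s
```

The bug report amounts to the "release back to YARN" step never firing even though the timeout has long passed.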
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235119#comment-16235119 ]

xinzhang edited comment on SPARK-21725 at 11/2/17 7:26 AM:
---

[~mgaido] Finally, I found where the problem is. Add this conf to hdfs-site.xml:
{code:java}
fs.hdfs.impl.disable.cache true
{code}
Reason: Spark and HDFS use the same API (at the bottom they share the same FileSystem instance). When beeline closes a FileSystem instance, it closes the Thrift Server's FileSystem instance too. When the next beeline session tries to get the instance, it always reports "Caused by: java.io.IOException: Filesystem closed".

was (Author: zhangxin0112zx):
[~mgaido] Finally.I found the pro where is . add the conf to hdfs-site.xml
{code:java}
fs.hdfs.impl.disable.cache true
{code}
reason: spark and hdfs use the same api (at the bottom they use the same instance). When beeline close a filesystem instance . It close the thriftserver's filesystem instance too. Second beeline try to get instance , it will always report "Caused by: java.io.IOException: Filesystem closed"

> spark thriftserver insert overwrite table partition select
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1 jdk8
> Reporter: xinzhang
> Priority: Major
> Labels: spark-sql
>
> use thriftserver create table with partitions.
> session 1:
> SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 2:
> SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --ok
> !exit
> session 4 (do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --error
> !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
> java.lang.reflect.InvocationTargetException
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 512282-2/-ext-1/part-0 to destination hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> -
> the doc about the parquet table desc is here: http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL will try to use its own Parquet support instead of Hive SerDe for better performance. This behavior is controlled by the spark.sql.hive.convertMetastoreParquet configuration, and is turned on by default.
> I am confused: the problem appears with a partitioned table but it is OK with a table without partitions. Does that mean Spark does not use its own Parquet? Maybe someone can suggest how I could avoid the issue?
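The mechanism described in the comment above (a cached handle shared by every client, so one client's close() breaks everyone else) can be sketched with a toy cache. All names here are hypothetical; Hadoop's real cache lives inside org.apache.hadoop.fs.FileSystem, and fs.hdfs.impl.disable.cache plays the role of the disable_cache flag:

```python
class Handle:
    """Toy stand-in for a FileSystem handle."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def read(self):
        if self.closed:
            raise IOError("Filesystem closed")  # mirrors the java.io.IOException
        return "data"

_cache = {}   # key -> shared Handle, like Hadoop's FileSystem cache

def get(key, disable_cache=False):
    # With caching enabled, every caller for the same key gets the SAME object.
    if disable_cache:                 # analogous to fs.hdfs.impl.disable.cache=true
        return Handle()
    return _cache.setdefault(key, Handle())

beeline = get("hdfs://ns")
thrift_server = get("hdfs://ns")      # same cached object as beeline's
beeline.close()                       # session ends and closes the shared handle
try:
    thrift_server.read()
except IOError as e:
    print(e)                          # Filesystem closed

# With the cache disabled, each client owns an independent handle.
a = get("hdfs://ns2", disable_cache=True)
b = get("hdfs://ns2", disable_cache=True)
a.close()
print(b.read())                       # data
```

Disabling the cache trades away handle reuse for isolation, which is exactly the trade the fs.hdfs.impl.disable.cache workaround makes.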
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235119#comment-16235119 ] xinzhang edited comment on SPARK-21725 at 11/2/17 7:24 AM: --- [~mgaido] Finally.I found the pro where is . add the conf to hdfs-site.xml {code:java} // fs.hdfs.impl.disable.cache true public String getFoo() { return foo; } {code} reason: spark and hdfs use the same api (at the bottom they use the same instance). When beeline close a filesystem instance . It close the thriftserver's filesystem instance too. Second beeline try to get instance , it will always report "Caused by: java.io.IOException: Filesystem closed" was (Author: zhangxin0112zx): [~mgaido] Finally.I found the pro where is . add the conf to hdfs-site.xml fs.hdfs.impl.disable.cache true reason: spark and hdfs use the same api (at the bottom they use the same instance). When beeline close a filesystem instance . It close the thriftserver's filesystem instance too. Second beeline try to get instance , it will always report "Caused by: java.io.IOException: Filesystem closed" > spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang >Priority: Major > Labels: spark-sql > > use thriftserver create table with partitions. 
> session 1: > SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 2: > SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 3: > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --ok > !exit > session 4(do it again): > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --error > !exit > - > 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing > query, currentState RUNNING, > java.lang.reflect.InvocationTargetException > .. > .. > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move > source > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 > 512282-2/-ext-1/part-0 to destination > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0 > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324) > ... 45 more > Caused by: java.io.IOException: Filesystem closed > > - > the doc about the parquet table desc here > http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files > Hive metastore Parquet table conversion > When reading from and writing to Hive metastore Parquet tables, Spark SQL > will try to use its own Parquet support instead of Hive SerDe for better > performance. 
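The cache behavior described in the comment above can be illustrated with a self-contained sketch. This is a deliberately simplified model, not Hadoop's actual FileSystem.Cache implementation; MockFileSystem and FsCache are invented names for illustration only:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Simplified model of the shared-cache problem: FileSystem.get() returns a
// process-wide cached instance, so a beeline session and the thrift server
// hold the *same* object, and one close() breaks the other.
class MockFileSystem {
    private boolean closed = false;
    void close() { closed = true; }
    String read() throws IOException {
        if (closed) throw new IOException("Filesystem closed");
        return "data";
    }
}

class FsCache {
    private static final Map<String, MockFileSystem> CACHE = new HashMap<>();

    // disableCache mirrors fs.hdfs.impl.disable.cache=true: every caller
    // gets a fresh instance, so closing one cannot affect the others.
    static MockFileSystem get(String uri, boolean disableCache) {
        if (disableCache) return new MockFileSystem();
        return CACHE.computeIfAbsent(uri, k -> new MockFileSystem());
    }
}

public class CacheDemo {
    public static void main(String[] args) throws IOException {
        // Cached: "session1" closing its handle also closes the "server's".
        MockFileSystem session1 = FsCache.get("hdfs://ns", false);
        MockFileSystem server = FsCache.get("hdfs://ns", false);
        session1.close();
        try {
            server.read();
        } catch (IOException e) {
            System.out.println("cached: " + e.getMessage()); // cached: Filesystem closed
        }

        // Cache disabled: the two handles are independent instances.
        MockFileSystem session2 = FsCache.get("hdfs://ns", true);
        MockFileSystem server2 = FsCache.get("hdfs://ns", true);
        session2.close();
        System.out.println("uncached: " + server2.read()); // uncached: data
    }
}
```

This also shows why the reported failures only start with the *second* session: the first session runs fine and then closes the shared instance on exit.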
[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235127#comment-16235127 ] xinzhang edited comment on SPARK-21067 at 11/2/17 7:23 AM: --- [~dricard] Please check the issue linked here and try it: [https://issues.apache.org/jira/browse/SPARK-21725] was (Author: zhangxin0112zx): [~dricard] Please see the issue linked here and try it: [https://issues.apache.org/jira/browse/SPARK-21725] > Thrift Server - CTAS fail with Unable to move source > > > Key: SPARK-21067 > URL: https://issues.apache.org/jira/browse/SPARK-21067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 > Environment: Yarn > Hive MetaStore > HDFS (HA) >Reporter: Dominic Ricard >Priority: Major > > After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS > would fail, sometimes... > Most of the time, the CTAS would work only once, after starting the thrift > server. After that, dropping the table and re-issuing the same CTAS would > fail with the following message (sometimes it fails right away, sometimes it > works for a long period of time): > {noformat} > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > We have already found the following Jira > (https://issues.apache.org/jira/browse/SPARK-11021) which states that > {{hive.exec.stagingdir}} had to be added in order for Spark to be able to > handle CREATE TABLE properly as of 2.0.
As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. 
> Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to desti > nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at >
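The hive.exec.stagingdir workaround from SPARK-11021 that the reporter mentions can be sketched concretely. This is an illustrative hive-site.xml fragment matching the value the reporter quotes ("/tmp/hive-staging/{user.name}"), not a verified recommendation for any particular cluster:

```xml
<!-- hive-site.xml (illustrative): per-user staging directory for the
     temporary output of INSERT/CTAS, as referenced in SPARK-11021.
     ${user.name} is expanded by Hadoop's Configuration from the JVM's
     system properties. -->
<property>
  <name>hive.exec.stagingdir</name>
  <value>/tmp/hive-staging/${user.name}</value>
</property>
```

Note that the staging dir alone did not resolve this report: the stack traces here fail on the final move from the staging path to the table path, which points back at the closed FileSystem instance discussed in SPARK-21725.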
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235039#comment-16235039 ] xinzhang edited comment on SPARK-21725 at 11/2/17 1:09 AM: --- Could you tell me which Hadoop distribution is in your env: CDH, Ambari, MapR, Databricks, or pure community Hadoop? was (Author: zhangxin0112zx): Could you tell me which Hadoop distribution is in your env: CDH, Ambari, MapR, Databricks, or pure community Hadoop? > spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang >Priority: Major > Labels: spark-sql > > Use the thrift server to create a table with partitions.
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234149#comment-16234149 ] xinzhang commented on SPARK-21725: -- I can't believe it. I built Hadoop 2.8 last night and the problem still appears. I think these issues are relevant: [https://issues.apache.org/jira/browse/SPARK-21067] [https://stackoverflow.com/questions/44233523/spark-sql-2-1-1-thrift-server-unable-to-move-source-hdfs-to-target] [https://issues.apache.org/jira/browse/SPARK-11083] My env is CentOS 6.5, JVM 8. And to be honest, I still cannot believe you could not reproduce it!! We use the 1.6 thrift server now and it is OK; I tried all of the 2.x releases. I am curious what the difference is between your env and mine. Would you give me some suggestions about what I should check in my env? > spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang >Priority: Major > Labels: spark-sql > > Use the thrift server to create a table with partitions.
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233875#comment-16233875 ] xinzhang edited comment on SPARK-21725 at 11/1/17 11:18 AM: [~mgaido] Here is my target package log (+mysql, bad): [https://github.com/zhangxin0112/java/blob/zxis/src/target_package_mysql.out] Here is my target package log (+derby, bad): [https://github.com/zhangxin0112/java/blob/zxis/src/target_package_derby.out] Here is my source code log (+mysql, bad): [https://github.com/zhangxin0112/java/blob/zxis/src/source_code_mysql.out] Here is my source code log (+derby, good): [https://github.com/zhangxin0112/java/blob/zxis/src/source_code_derby.out] was (Author: zhangxin0112zx): [~mgaido] Here is my target package log (+mysql): [https://github.com/zhangxin0112/java/blob/zxis/src/target_package_mysql.out] Here is my target package log (+derby): [https://github.com/zhangxin0112/java/blob/zxis/src/target_package_derby.out] Here is my source code log (+mysql): [https://github.com/zhangxin0112/java/blob/zxis/src/source_code_mysql.out] Here is my source code log (+derby): [https://github.com/zhangxin0112/java/blob/zxis/src/source_code_derby.out] > spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang >Priority: Major > Labels: spark-sql > > Use the thrift server to create a table with partitions.
> session 1: > SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 2: > SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 3: > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --ok > !exit > session 4(do it again): > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --error > !exit > - > 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing > query, currentState RUNNING, > java.lang.reflect.InvocationTargetException > .. > .. > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move > source > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 > 512282-2/-ext-1/part-0 to destination > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0 > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324) > ... 45 more > Caused by: java.io.IOException: Filesystem closed > > - > the doc about the parquet table desc here > http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files > Hive metastore Parquet table conversion > When reading from and writing to Hive metastore Parquet tables, Spark SQL > will try to use its own Parquet support instead of Hive SerDe for better > performance. 
This behavior is controlled by the > spark.sql.hive.convertMetastoreParquet configuration, and is turned on by > default. > I am confused the problem appear in the table(partitions) but it is ok with > table(with out partitions) . It means spark do not use its own parquet ? > Maybe someone give any suggest how could I avoid the issue? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
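For what it's worth, the final `Caused by: java.io.IOException: Filesystem closed` line is characteristic of Hadoop's JVM-wide FileSystem cache: `FileSystem.get()` hands every caller in the JVM the same cached client, so if one session closes that client on exit, the next session's staging-to-destination move fails. Whether that is the actual cause here is an assumption, not something this thread confirms. The toy Python sketch below (not Hadoop code; every name in it is invented) only illustrates that failure mode:

```python
class ToyFileSystem:
    """Toy stand-in for Hadoop's cached FileSystem client (illustrative only)."""
    _cache = {}  # like Hadoop's FileSystem cache, keyed here just by URI

    def __init__(self, uri):
        self.uri = uri
        self.closed = False

    @classmethod
    def get(cls, uri):
        # Like Hadoop's FileSystem.get(): return one shared instance per URI.
        if uri not in cls._cache:
            cls._cache[uri] = cls(uri)
        return cls._cache[uri]

    def close(self):
        self.closed = True

    def rename(self, src, dst):
        if self.closed:
            raise IOError("Filesystem closed")
        return True

# "Session 3" succeeds, then closes the (shared) filesystem on exit.
fs_session3 = ToyFileSystem.get("hdfs://dc-hadoop54:50001")
assert fs_session3.rename("/staging/part-0", "/tmp_10/pt=1/part-0")
fs_session3.close()

# "Session 4" gets the SAME cached -- now closed -- instance, so the
# INSERT OVERWRITE's staging-file move fails.
fs_session4 = ToyFileSystem.get("hdfs://dc-hadoop54:50001")
assert fs_session4 is fs_session3
try:
    fs_session4.rename("/staging/part-0", "/tmp_10/pt=1/part-0")
except IOError as e:
    print(e)  # -> Filesystem closed
```

If that theory holds, one commonly suggested mitigation (unconfirmed for this issue) is the Hadoop setting `fs.hdfs.impl.disable.cache=true`, which gives each caller a private FileSystem instance instead of the shared cached one.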
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578 ]

xinzhang edited comment on SPARK-21725 at 11/1/17 5:25 AM:
-----------------------------------------------------------

[~mgaido]
1. hive 1.2.1: download a fresh tar and change only hive-site.xml, pointing the Hive metastore at MySQL. (Local 9083 or not: keep the metastore unchanged, that is not the point.)
2. spark-sql: copy in the hive-site.xml.
3. Start the spark-thriftserver.
4. Connect to the thriftserver with beeline.
!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!
{color:red}*The metastore has changed from derby to mysql. My suggestion: could you try it in a new environment, rather than your current existing one? You could rebuild it.*{color}
As you say, it might be related to the metastore. I tested the case on cdh5.7 (hadoop2.6) and on hadoop2.8 (a new env); the problem always appears, no matter what I do. Hoping for your help. Thanks.
{color:red}Hi. Do not test it in the Spark source code directory!!! Test it with mysql (and maybe derby) && test it with the target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz:
target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz + mysql : thrift server bad
target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz + derby : thrift server bad
spark source code directory + derby : thrift server good
spark source code directory + mysql : thrift server bad
Under these conditions the problem always appears. Could you test it?{color}

was (Author: zhangxin0112zx):
[~mgaido]
1. hive 1.2.1: download a fresh tar and change only hive-site.xml, pointing the Hive metastore at MySQL. (Local 9083 or not: keep the metastore unchanged, that is not the point.)
2. spark-sql: copy in the hive-site.xml.
3. Start the spark-thriftserver.
4. Connect to the thriftserver with beeline.
!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!
{color:red}*The metastore has changed from derby to mysql. My suggestion: could you try it in a new environment, rather than your current existing one? You could rebuild it.*{color}
As you say, it might be related to the metastore. I tested the case on cdh5.7 (hadoop2.6) and on hadoop2.8 (a new env); the problem always appears, no matter what I do. Hoping for your help. Thanks.
{color:red}Hi. Do not test it in the Spark source code directory!!! Test it with mysql (and maybe derby) && test it with the target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz:
target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz + mysql : thrift server bad
target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz + derby : thrift server bad
spark source code directory + derby : thrift server good
Under these conditions the problem always appears. Could you test it?{color}
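Step 1 above (switching the metastore from the embedded Derby database to MySQL) is normally done through the standard JDO connection properties in hive-site.xml. A sketch, with placeholder host, database name, and credentials; the reporter's real values are not shown in this thread:

```xml
<!-- hive-site.xml fragment: point the Hive metastore at MySQL instead of
     the embedded Derby database. Host, database, user, and password are
     placeholders, not values taken from this issue. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://mysql-host:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive_password</value>
</property>
```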
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578 ] xinzhang edited comment on SPARK-21725 at 11/1/17 3:51 AM: --- [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! {color:red}*The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it .*{color} Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . {color:red}Hi .Do not test it in the spark source code directory !!! Test it with mysql(maybe derby) && Test it with the target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz . target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz + mysql : thrift server bad target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz + derby : thrift server bad spark source code directory + derby : thrift server good Under the two conditions , it always appear the pro. Could u test it {color} was (Author: zhangxin0112zx): [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! {color:red}*The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. 
U could rebuild it .*{color} Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . {color:red}Hi .Do not test it in the spark source code directory !!! Test it with mysql(maybe derby) && Test it with the target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz . Under the two conditions , it always appear the pro. Could u test it {color} > spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang >Priority: Major > Labels: spark-sql > > use thriftserver create table with partitions. > session 1: > SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 2: > SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 3: > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --ok > !exit > session 4(do it again): > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --error > !exit > - > 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing > query, currentState RUNNING, > java.lang.reflect.InvocationTargetException > .. > .. 
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move > source > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 > 512282-2/-ext-1/part-0 to destination > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0 > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324) > ... 45 more > Caused by: java.io.IOException: Filesystem closed > > - > the doc about the parquet table desc here >
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578 ] xinzhang edited comment on SPARK-21725 at 11/1/17 3:49 AM: --- [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! {color:red}*The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it .*{color} Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . {color:red}Hi .Do not test it in the spark source code directory !!! Test it with mysql(maybe derby) && Test it with the target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz . Under the two conditions , it always appear the pro. Could u test it {color} was (Author: zhangxin0112zx): [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! {color:red}*The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it .*{color} Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . 
{color:red}Hi .Do not test it in the spark source code directory !!! Test it with mysql && Test it with the target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz . Under the two conditions , it always appear the pro. Could u test it {color} > spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang >Priority: Major > Labels: spark-sql > > use thriftserver create table with partitions. > session 1: > SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 2: > SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 3: > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --ok > !exit > session 4(do it again): > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --error > !exit > - > 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing > query, currentState RUNNING, > java.lang.reflect.InvocationTargetException > .. > .. 
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move > source > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 > 512282-2/-ext-1/part-0 to destination > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0 > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324) > ... 45 more > Caused by: java.io.IOException: Filesystem closed > > - > the doc about the parquet table desc here > http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files > Hive metastore Parquet table conversion > When reading from and writing to Hive metastore Parquet tables, Spark SQL > will try to use its own Parquet support instead of Hive
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select

[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578 ]

xinzhang edited comment on SPARK-21725 at 11/1/17 3:48 AM:
---
[~mgaido]
1. Hive 1.2.1: download a fresh tarball and change only hive-site.xml so the Hive metastore points at MySQL (local 9083 or not; keep the metastore unchanged, it is not the point).
2. spark-sql: copy in the same hive-site.xml.
3. Start the Spark Thrift Server.
4. Connect to the Thrift Server with beeline.
!https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png!
{color:red}*The metastore has been changed from Derby to MySQL. My suggestion is that you try it in a fresh environment rather than your existing one; you could rebuild it.*{color}
As you say, it might be related to the metastore. I tested the case on CDH 5.7 (Hadoop 2.6) and on Hadoop 2.8 (a new environment); the problem always appears, no matter what I did. Hoping for your help. Thanks.
{color:red}Hi. Do not test it in the Spark source-code directory! Test it with MySQL, and test it with the built target package spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz. Under those two conditions the problem always appears. Could you test it?{color}

> spark thriftserver insert overwrite table partition select
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Environment: CentOS 6.7, Spark 2.1, JDK 8
> Reporter: xinzhang
> Priority: Major
> Labels: spark-sql
>
> Use the Thrift Server to create tables with partitions.
> session 1:
> SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 2:
> SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --ok
> !exit
> session 4 (do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --error
> !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053512282-2/-ext-1/part-0 to destination hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
>
> -
> The Parquet table behavior is documented here: http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL will try to use its own Parquet support instead of Hive SerDe for better performance. This behavior is controlled by the spark.sql.hive.convertMetastoreParquet configuration, and is turned on by default.
> I am confused: the problem appears with partitioned tables but is fine with non-partitioned tables. Does that mean Spark does not use its own Parquet support? Could someone suggest how I can avoid the issue?

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
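The quoted documentation notes that spark.sql.hive.convertMetastoreParquet controls whether Spark SQL uses its own Parquet support or the Hive SerDe on the write path. As a diagnostic sketch only (not a confirmed fix for this ticket), one way to test whether the conversion is implicated is to disable it in the same beeline session and re-run the failing statement from the report:

```sql
-- Diagnostic sketch, not a confirmed fix: disable Spark's metastore Parquet
-- conversion so the INSERT goes through the Hive SerDe write path instead.
SET spark.sql.hive.convertMetastoreParquet=false;

-- Re-run the statement that fails on the second session in the report:
INSERT OVERWRITE TABLE tmp_10 PARTITION (pt='1')
SELECT count(1) AS count FROM tmp_11;
```

If the INSERT succeeds with the conversion disabled but fails with it enabled, that would point at Spark's own Parquet write path rather than the metastore backend.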
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578 ] xinzhang edited comment on SPARK-21725 at 11/1/17 3:46 AM: --- [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! {color:red}*The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it .*{color} Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . {color:red}Hi .Do not test it in the spark source code directory !!! Test it with mysql && Test it with the target spark-2.3.0-SNAPSHOT-bin-custom-spark.tgz . Under the two conditions , it always appear the pro. Could u test it {color} was (Author: zhangxin0112zx): [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! {color:red}*The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it .*{color} Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . {color:red}Hi .It is the metastore !!! 
I test it with derby .Thriftserver is OK. I change it to mysql . It always appear the pro. Could u test it {color} > spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang >Priority: Major > Labels: spark-sql > > use thriftserver create table with partitions. > session 1: > SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 2: > SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 3: > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --ok > !exit > session 4(do it again): > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --error > !exit > - > 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing > query, currentState RUNNING, > java.lang.reflect.InvocationTargetException > .. > .. > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move > source > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 > 512282-2/-ext-1/part-0 to destination > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0 > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324) > ... 
45 more > Caused by: java.io.IOException: Filesystem closed > > - > the doc about the parquet table desc here > http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files > Hive metastore Parquet table conversion > When reading from and writing to Hive metastore Parquet tables, Spark SQL > will try to use its own Parquet support instead of Hive SerDe for better > performance. This behavior is controlled by the > spark.sql.hive.convertMetastoreParquet
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578 ] xinzhang edited comment on SPARK-21725 at 11/1/17 3:16 AM: --- [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! {color:red}*The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it .*{color} Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . {color:red}Hi .It is the metastore !!! I test it with derby .Thriftserver is OK. I change it to mysql . It always appear the pro. Could u test it {color} was (Author: zhangxin0112zx): [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! {color:red}*The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it .*{color} Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . 
> spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang >Priority: Major > Labels: spark-sql > > use thriftserver create table with partitions. > session 1: > SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 2: > SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 3: > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --ok > !exit > session 4(do it again): > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --error > !exit > - > 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing > query, currentState RUNNING, > java.lang.reflect.InvocationTargetException > .. > .. > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move > source > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 > 512282-2/-ext-1/part-0 to destination > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0 > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324) > ... 
45 more > Caused by: java.io.IOException: Filesystem closed > > - > the doc about the parquet table desc here > http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files > Hive metastore Parquet table conversion > When reading from and writing to Hive metastore Parquet tables, Spark SQL > will try to use its own Parquet support instead of Hive SerDe for better > performance. This behavior is controlled by the > spark.sql.hive.convertMetastoreParquet configuration, and is turned on by > default. > I am confused the problem appear in the table(partitions) but it is ok with > table(with out partitions) . It means spark do not use its own parquet ? > Maybe someone give any suggest how could I
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578 ] xinzhang edited comment on SPARK-21725 at 11/1/17 3:08 AM: --- [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! {color:red}*The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it .*{color} Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . was (Author: zhangxin0112zx): [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! {color:red}*The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it .*{color} Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . {color:red}Hi .It is the metastore !!! I test it with derby .Thriftserver is OK. I change it to mysql . It always appear the pro. 
Could u test it {color} > spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang >Priority: Major > Labels: spark-sql > > use thriftserver create table with partitions. > session 1: > SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 2: > SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 3: > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --ok > !exit > session 4(do it again): > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --error > !exit > - > 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing > query, currentState RUNNING, > java.lang.reflect.InvocationTargetException > .. > .. > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move > source > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 > 512282-2/-ext-1/part-0 to destination > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0 > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324) > ... 
45 more > Caused by: java.io.IOException: Filesystem closed > > - > the doc about the parquet table desc here > http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files > Hive metastore Parquet table conversion > When reading from and writing to Hive metastore Parquet tables, Spark SQL > will try to use its own Parquet support instead of Hive SerDe for better > performance. This behavior is controlled by the > spark.sql.hive.convertMetastoreParquet configuration, and is turned on by > default. > I am confused the problem appear in the table(partitions) but it is ok with > table(with out partitions) . It means spark do not use its own parquet ? > Maybe someone give any suggest how could I
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578 ] xinzhang edited comment on SPARK-21725 at 11/1/17 2:59 AM: --- [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! {color:red}*The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it .*{color} Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . {color:red}Hi .It is the metastore !!! I test it with derby .Thriftserver is OK. I change it to mysql . It always appear the pro. Could u test it {color} was (Author: zhangxin0112zx): [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! {color:red}*The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it .*{color} Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . {color:red}Hi .May be the point is the metastore .I test it with derby .Thriftserver is OK. I change it to mysql . 
It always appear the pro. {color} > spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang >Priority: Major > Labels: spark-sql > > use thriftserver create table with partitions. > session 1: > SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 2: > SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 3: > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --ok > !exit > session 4(do it again): > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --error > !exit > - > 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing > query, currentState RUNNING, > java.lang.reflect.InvocationTargetException > .. > .. > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move > source > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 > 512282-2/-ext-1/part-0 to destination > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0 > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324) > ... 
45 more > Caused by: java.io.IOException: Filesystem closed > > - > the doc about the parquet table desc here > http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files > Hive metastore Parquet table conversion > When reading from and writing to Hive metastore Parquet tables, Spark SQL > will try to use its own Parquet support instead of Hive SerDe for better > performance. This behavior is controlled by the > spark.sql.hive.convertMetastoreParquet configuration, and is turned on by > default. > I am confused the problem appear in the
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578 ] xinzhang edited comment on SPARK-21725 at 11/1/17 2:52 AM: --- [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! {color:red}*The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it .*{color} Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . {color:red}Hi .May be the point is the metastore .I test it with derby .Thriftserver is OK. I change it to mysql . It always appear the pro. {color} was (Author: zhangxin0112zx): [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! {color:red}*The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it .*{color} Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . 
> spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang >Priority: Major > Labels: spark-sql > > use thriftserver create table with partitions. > session 1: > SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 2: > SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 3: > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --ok > !exit > session 4(do it again): > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --error > !exit > - > 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing > query, currentState RUNNING, > java.lang.reflect.InvocationTargetException > .. > .. > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move > source > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 > 512282-2/-ext-1/part-0 to destination > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0 > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324) > ... 
45 more > Caused by: java.io.IOException: Filesystem closed > > - > the doc about the parquet table desc here > http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files > Hive metastore Parquet table conversion > When reading from and writing to Hive metastore Parquet tables, Spark SQL > will try to use its own Parquet support instead of Hive SerDe for better > performance. This behavior is controlled by the > spark.sql.hive.convertMetastoreParquet configuration, and is turned on by > default. > I am confused the problem appear in the table(partitions) but it is ok with > table(with out partitions) . It means spark do not use its own parquet ? > Maybe someone give any suggest how could I avoid
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578 ] xinzhang edited comment on SPARK-21725 at 11/1/17 2:38 AM: --- [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! {color:red}*The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it .*{color} Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . was (Author: zhangxin0112zx): [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it . Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . 
> spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang >Priority: Major > Labels: spark-sql > > use thriftserver create table with partitions. > session 1: > SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 2: > SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 3: > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --ok > !exit > session 4(do it again): > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --error > !exit > - > 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing > query, currentState RUNNING, > java.lang.reflect.InvocationTargetException > .. > .. > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move > source > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 > 512282-2/-ext-1/part-0 to destination > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0 > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324) > ... 
45 more > Caused by: java.io.IOException: Filesystem closed > > - > the doc about the parquet table desc here > http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files > Hive metastore Parquet table conversion > When reading from and writing to Hive metastore Parquet tables, Spark SQL > will try to use its own Parquet support instead of Hive SerDe for better > performance. This behavior is controlled by the > spark.sql.hive.convertMetastoreParquet configuration, and is turned on by > default. > I am confused the problem appear in the table(partitions) but it is ok with > table(with out partitions) . It means spark do not use its own parquet ? > Maybe someone give any suggest how could I avoid the issue? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail:
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578 ] xinzhang edited comment on SPARK-21725 at 11/1/17 2:37 AM: --- [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083 or not .Keep metastore do not change.It is not a point) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver !https://user-images.githubusercontent.com/8244097/32257548-af789d42-bef0-11e7-8c04-99137c50fbbf.png! The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it . Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . was (Author: zhangxin0112zx): [~mgaido] 1. hive 1.2.1 download a new tar only change hive-site.xml about hive metastore with mysql . metastore(local 9083) 2.spark-sql copy the hive-site.xml 3.start spark-thriftserver 4.beeline connect the thriftserver The metastore has changed from derby to mysql . My suggest is could u do it with a new env. Without your current exit env. U could rebuild it . Like what u say might be related to the metastore. I tested the case in cdh5.7(hadoop2.6) and hadoop2.8(new env) , they will always appear , No matter what I did . Hope your help . Thanks . > spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang >Priority: Major > Labels: spark-sql > > use thriftserver create table with partitions. 
> session 1: > SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 2: > SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 3: > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --ok > !exit > session 4 (do it again): > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --error > !exit > - > 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING, > java.lang.reflect.InvocationTargetException > .. > .. > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053512282-2/-ext-1/part-0 to destination > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0 > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711) > at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403) > at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324) > ... 45 more > Caused by: java.io.IOException: Filesystem closed > > - > The Parquet table behavior is documented here: > http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files > "Hive metastore Parquet table conversion": when reading from and writing to Hive metastore Parquet tables, Spark SQL will try to use its own Parquet support instead of the Hive SerDe for better performance. This behavior is controlled by the spark.sql.hive.convertMetastoreParquet configuration, and it is turned on by default. > I am confused: the problem appears with partitioned tables, but everything is fine with non-partitioned tables. Does this mean Spark is not using its own Parquet support here? > Could someone suggest how I can avoid this issue? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
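Since the report ties the failure to the metastore Parquet conversion path, one hedged workaround sketch (an assumption on my part, not a confirmed fix; table names are taken from the report) is to turn the conversion off for the session before re-running the overwrite:

```sql
-- Sketch: run in a beeline session connected to the Spark Thrift Server.
-- spark.sql.hive.convertMetastoreParquet is on by default; disabling it
-- makes Spark write through the Hive SerDe path instead. Whether this
-- actually avoids the "Filesystem closed" failure is an assumption to verify.
SET spark.sql.hive.convertMetastoreParquet=false;

INSERT OVERWRITE TABLE tmp_10 PARTITION (pt='1')
SELECT count(1) AS count FROM tmp_11;
```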
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226337#comment-16226337 ] xinzhang edited comment on SPARK-21725 at 10/31/17 7:09 AM: [~mgaido] [~srowen] I have now tried with the master branch. The problem is still here. (Important: the parameter's default value is hive.default.fileformat=TextFile. If I set hive.default.fileformat=Parquet; the problem goes away — {color:red}but it still appears with partitioned tables. Do not miss the last picture; that is the core of the problem!!{color}) Steps: 1. Download, install, and run Hive SQL (hive-1.2.1; this shows my Hive is OK) !https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png! 2. Download, build, and run spark-sql (built from master at the latest commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564). First run: GOOD !https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png! Second run: GOOD !https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png! 3. Use the spark-sql Thrift Server. First run: *{color:red}GOOD{color}* Second run: *{color:red}BAD{color}* !https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png! {color:red}--- ---{color} 1. set hive.default.fileformat=Parquet; 2. Create a partitioned table: the problem appears again !https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png!
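The contrast those steps describe can be sketched as a minimal reproduction (hypothetical — table names are illustrative, and the failure mode is as reported in this thread, not verified here):

```sql
-- Sketch: through the Spark Thrift Server, with hive.default.fileformat=Parquet,
-- repeated INSERT OVERWRITE into a non-partitioned table reportedly succeeds,
-- while the second INSERT OVERWRITE into a partitioned table fails with
-- "java.io.IOException: Filesystem closed".
SET hive.default.fileformat=Parquet;

CREATE TABLE t_plain (count BIGINT) STORED AS PARQUET;
CREATE TABLE t_part  (count BIGINT) PARTITIONED BY (pt STRING) STORED AS PARQUET;

-- Run each of these twice, from separate beeline sessions:
INSERT OVERWRITE TABLE t_plain SELECT count(1) FROM t_part;   -- ok both times
INSERT OVERWRITE TABLE t_part PARTITION (pt='1')
SELECT count(1) FROM t_plain;                                 -- reportedly fails the 2nd time
```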
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226337#comment-16226337 ] xinzhang edited comment on SPARK-21725 at 10/31/17 7:07 AM: [~mgaido] [~srowen] Now I try with the master branch. The problem is still here.(Important: hive.default.fileformat Text file is the parameter's default value. If I tried set hive.default.fileformat=Parquet; The problem has gone!! {color:red}Do not Miss the last pic that is the problem core!!{color}) Steps: 1.download . install . exec hivesql (hive-1.2.1 . Here prove my hive is OK) !https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png! 2.download . install . exec spark-sql (spark-master I build it with master the lastest commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564) First time . Spark-sql result: GOOD !https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png! Second time . Spark-sql result: GOOD !https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png! 3.use spark-sql thriftserver First time . Spark-sql result: *{color:red}GOOD{color}* Second time .Spark-sql result: *{color:red}BAD{color}* !https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png! {color:red}--- ---{color} 1.set hive.default.fileformat=Parquet; 2.create partition table the problem again !https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png! was (Author: zhangxin0112zx): [~mgaido] [~srowen] Now I try with the master branch. The problem is still here.(Important: hive.default.fileformat Text file is the parameter's default value. If I tried set hive.default.fileformat=Parquet; The problem has gone!! {color:red}Do not Miss the last pic that is the problem core!!{color}) Steps: 1.download . install . exec hivesql (hive-1.2.1 . 
Here prove my hive is OK) !https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png! 2.download . install . exec spark-sql (spark-master I build it with master the lastest commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564) First time . Spark-sql result: GOOD !https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png! Second time . Spark-sql result: GOOD !https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png! 3.use spark-sql thriftserver First time . Spark-sql result: GOOD Second time .Spark-sql result: BAD !https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png! --- 1.set hive.default.fileformat=Parquet; 2.create partition table the problem again !https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png! > spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang > Labels: spark-sql > > use thriftserver create table with partitions. 
> session 1: > SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 2: > SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 3: > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --ok > !exit > session 4(do it again): > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --error > !exit > - > 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing > query, currentState RUNNING, > java.lang.reflect.InvocationTargetException > .. > .. > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move > source > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 > 512282-2/-ext-1/part-0 to
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226337#comment-16226337 ] xinzhang edited comment on SPARK-21725 at 10/31/17 7:07 AM: [~mgaido] [~srowen] Now I try with the master branch. The problem is still here.(Important: hive.default.fileformat Text file is the parameter's default value. If I tried set hive.default.fileformat=Parquet; The problem has gone!! {color:red}Do not Miss the last pic that is the problem core!!{color}) Steps: 1.download . install . exec hivesql (hive-1.2.1 . Here prove my hive is OK) !https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png! 2.download . install . exec spark-sql (spark-master I build it with master the lastest commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564) First time . Spark-sql result: GOOD !https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png! Second time . Spark-sql result: GOOD !https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png! 3.use spark-sql thriftserver First time . Spark-sql result: *{color:red}GOOD{color}* Second time .Spark-sql result: *{color:red}BAD{color}* !https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png! {color:red}--- ---{color} 1.set hive.default.fileformat=Parquet; 2.create partition table the problem again !https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png! was (Author: zhangxin0112zx): [~mgaido] [~srowen] Now I try with the master branch. The problem is still here.(Important: hive.default.fileformat Text file is the parameter's default value. If I tried set hive.default.fileformat=Parquet; The problem has gone!! {color:red}Do not Miss the last pic that is the problem core!!{color}) Steps: 1.download . install . exec hivesql (hive-1.2.1 . 
Here prove my hive is OK) !https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png! 2.download . install . exec spark-sql (spark-master I build it with master the lastest commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564) First time . Spark-sql result: GOOD !https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png! Second time . Spark-sql result: GOOD !https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png! 3.use spark-sql thriftserver First time . Spark-sql result: *{color:red}GOOD{color}* Second time .Spark-sql result: *{color:red}BAD{color}* !https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png! {color:red}--- ---{color} 1.set hive.default.fileformat=Parquet; 2.create partition table the problem again !https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png! > spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang > Labels: spark-sql > > use thriftserver create table with partitions. 
> session 1: > SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 2: > SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 3: > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --ok > !exit > session 4(do it again): > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --error > !exit > - > 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing > query, currentState RUNNING, > java.lang.reflect.InvocationTargetException > .. > .. > Caused by:
[jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226337#comment-16226337 ] xinzhang edited comment on SPARK-21725 at 10/31/17 7:03 AM:
---
[~mgaido] [~srowen] I have now tried the master branch, and the problem is still there. (Important: TextFile is the default value of hive.default.fileformat. If I run set hive.default.fileformat=Parquet; first, the problem goes away. {color:red}Do not miss the last picture, it shows the core of the problem.{color})

Steps:
1. Download, install, and run HiveQL (hive-1.2.1; this shows my Hive itself is OK) !https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!
2. Download, install, and run spark-sql (spark-master, built from the latest commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564). First run, spark-sql result: GOOD !https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png! Second run, spark-sql result: GOOD !https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!
3. Use the spark-sql thriftserver. First run, result: GOOD. Second run, result: BAD !https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!
---
1. set hive.default.fileformat=Parquet;
2. create a partitioned table: the problem appears again !https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png!

> spark thriftserver insert overwrite table partition select
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1 jdk8
> Reporter: xinzhang
> Labels: spark-sql
>
> use thriftserver create table with partitions.
> session 1:
> SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 2:
> SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --ok
> !exit
> session 4 (do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --error
> !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 512282-2/-ext-1/part-0 to destination hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
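The final "java.io.IOException: Filesystem closed" in the stack trace above is consistent with a process-wide cached Hadoop FileSystem handle being closed by one JDBC session while the shared thriftserver still holds it, so the next session's staging-to-partition move fails. A minimal shell analogy of that failure mode; the state file and `move_file` function here are illustrative stand-ins, not Spark or Hadoop APIs:

```shell
# Toy model of a per-URI handle cache shared by all thriftserver sessions.
# A single state file stands in for the cached FileSystem instance.
CACHE=/tmp/fs_handle_$$
echo open > "$CACHE"

move_file() {   # a session operation that fails once the shared handle is closed
  if [ "$(cat "$CACHE")" = closed ]; then
    echo "java.io.IOException: Filesystem closed"
    return 1
  fi
  echo "moved $1 -> $2"
}

# Session 1 uses the handle, then "disconnects", closing the shared instance.
move_file .hive-staging/part-0 pt=1/part-0
echo closed > "$CACHE"

# Session 2 reuses the SAME cached handle and now fails, like session 4 above.
move_file .hive-staging/part-0 pt=1/part-0 || echo "second session failed"
```

The point of the sketch is that the second caller fails not because of its own SQL but because a different session tore down shared state.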
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220239#comment-16220239 ] xinzhang commented on SPARK-21725:
--
I tried Spark (master branch) on 21 Aug 2017 and the problem still appeared. I will try it again now and reply with the result. Thanks for your reply. [~mgaido] [~srowen]

> spark thriftserver insert overwrite table partition select
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1 jdk8
> Reporter: xinzhang
> Labels: spark-sql
>
> use thriftserver create table with partitions.
> session 1:
> SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 2:
> SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --ok
> !exit
> session 4 (do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --error
> !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 512282-2/-ext-1/part-0 to destination hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
>
> -
> The doc about the Parquet table behavior is here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL will try to use its own Parquet support instead of Hive SerDe for better performance. This behavior is controlled by the spark.sql.hive.convertMetastoreParquet configuration, and is turned on by default.
> I am confused: the problem appears with partitioned tables, but a table without partitions is fine. Does that mean Spark does not use its own Parquet support here?
> Maybe someone can suggest how I could avoid the issue?

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220216#comment-16220216 ] xinzhang commented on SPARK-21725:
--
I downloaded Spark 2.1.2 and the problem still appears. Could you give me any suggestions for avoiding it? [~mgaido]

> spark thriftserver insert overwrite table partition select
> ---
>
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Environment: centos 6.7 spark 2.1 jdk8
> Reporter: xinzhang
> Labels: spark-sql
>
> use thriftserver create table with partitions.
> session 1:
> SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 2:
> SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) partitioned by (pt string) stored as parquet;
> --ok
> !exit
> session 3:
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --ok
> !exit
> session 4 (do it again):
> --connect the thriftserver
> SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --error
> !exit
> -
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 512282-2/-ext-1/part-0 to destination hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
> at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
> ... 45 more
> Caused by: java.io.IOException: Filesystem closed
>
> -
> The doc about the Parquet table behavior is here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL will try to use its own Parquet support instead of Hive SerDe for better performance. This behavior is controlled by the spark.sql.hive.convertMetastoreParquet configuration, and is turned on by default.
> I am confused: the problem appears with partitioned tables, but a table without partitions is fine. Does that mean Spark does not use its own Parquet support here?
> Maybe someone can suggest how I could avoid the issue?
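The documentation excerpt quoted above says spark.sql.hive.convertMetastoreParquet controls whether Spark bypasses the Hive SerDe, and the thread reports that setting hive.default.fileformat=Parquet made the symptom disappear. Combining the two is a possible but unverified workaround sketch, to be run in the JDBC session before the failing statement; this is not a fix confirmed anywhere in this thread:

```sql
-- Hypothetical workaround sketch (unverified): force the Hive SerDe path
-- instead of Spark's built-in Parquet writer, and set the default format
-- explicitly, before the INSERT OVERWRITE that fails on the second session.
SET spark.sql.hive.convertMetastoreParquet=false;
SET hive.default.fileformat=Parquet;
INSERT OVERWRITE TABLE tmp_10 PARTITION (pt='1')
SELECT count(1) AS count FROM tmp_11;
```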
[jira] [Resolved] (SPARK-22244) sparksql successed on yarn but only successed some pieces of all jobs
[ https://issues.apache.org/jira/browse/SPARK-22244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xinzhang resolved SPARK-22244.
--
Resolution: Not A Problem

It was caused by the client session being closed.

> sparksql successed on yarn but only successed some pieces of all jobs
> -
>
> Key: SPARK-22244
> URL: https://issues.apache.org/jira/browse/SPARK-22244
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Spark Shell
> Affects Versions: 2.1.0
> Reporter: xinzhang
>
> Shell: /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` --jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f /opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> /data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1
> Description:
> It is very weird; some pictures show the strange phenomenon.
> On YARN, the application's status shows SUCCEEDED:
> !https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!
> *{color:red}On the Spark History web UI the application has been moved in, but in fact it did not complete all of the jobs; the active jobs should have completed. The details show:{color}*
> !https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
> !https://user-images.githubusercontent.com/8244097/31428411-33c80056-ae9d-11e7-9a7e-35169d472a86.png!
> the log stopped:
> !https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!
> *{color:red}What is the bug? How should I track the problem? Any suggestions would be helpful.{color}*

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
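The job above was submitted with a one-line shell command. For readers unfamiliar with that pattern, here is a runnable sketch of the same shell mechanics: the YARN queue name taken from the caller's primary group via `id -g -n`, a SQL file passed with -f, and stdout plus stderr appended to a single log. `spark-sql` itself is replaced by `echo` so the sketch runs anywhere; the file paths are placeholders, not the ones from the report:

```shell
# Mechanics of the reported submission command, with 'echo' standing in for spark-sql.
QUEUE=$(id -g -n)                    # YARN queue named after the caller's primary group
SQL_FILE=/tmp/job_$$.sql
LOG=/tmp/job_$$.log
printf 'select 1;\n' > "$SQL_FILE"   # stand-in for the real ETL script

# Real form: spark-sql --master yarn --queue "$QUEUE" -f "$SQL_FILE" >> "$LOG" 2>&1
# '>> log 2>&1' appends stdout AND stderr, so driver errors land in the same log
# as query output; the 2>&1 must come after the >> redirection to take effect.
echo "spark-sql --master yarn --queue $QUEUE -f $SQL_FILE" >> "$LOG" 2>&1
```

Because stderr is folded into the same append-mode log, the "log stopped" symptom described above means the driver produced no further output of any kind, not that errors went somewhere else.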
[jira] [Updated] (SPARK-22244) sparksql successed on yarn but only successed some pieces of all jobs
[ https://issues.apache.org/jira/browse/SPARK-22244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xinzhang updated SPARK-22244: - Description: Shell : /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` --jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f /opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> /data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1 Describe: It's very weird. Some pics show the strange phenomenon。 On yarn , the application's status show SUCCEEDED : !https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png! *{color:red}On Spark History Web, the application has moved into it. But in fact it did not comple all the jobs . The active jobs should be compled. The detail shows : {color}* !https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png! !https://user-images.githubusercontent.com/8244097/31428411-33c80056-ae9d-11e7-9a7e-35169d472a86.png! the log stopped : !https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png! *{color:red}what's the bug? how should i track the pro? any suggests will helpful.{color}* was: Shell : /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` --jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f /opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> /data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1 Describe: It's very weird. Some pics show the strange phenomenon。 On yarn , the application's status show SUCCEEDED : !https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png! *{color:red}On Spark History Web, the application has moved into it. But in fact it did not comple all the jobs . The active jobs should be compled. The detail shows : {color}* !https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png! 
!https://user-images.githubusercontent.com/8244097/31427805-2a03f978-ae9b-11e7-8d87-d5a19d9a2eb0.jpg! the log stopped : !https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png! *{color:red}what's the bug? how should i track the pro? any suggests will helpful.{color}* > sparksql successed on yarn but only successed some pieces of all jobs > - > > Key: SPARK-22244 > URL: https://issues.apache.org/jira/browse/SPARK-22244 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell >Affects Versions: 2.1.0 >Reporter: xinzhang > > Shell : /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` > --jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f > /opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> > /data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1 > Describe: > It's very weird. Some pics show the strange phenomenon。 > On yarn , the application's status show SUCCEEDED : > !https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png! > *{color:red}On Spark History Web, the application has moved into it. But in > fact it did not comple all the jobs . The active jobs should be compled. The > detail shows : > {color}* > !https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png! > !https://user-images.githubusercontent.com/8244097/31428411-33c80056-ae9d-11e7-9a7e-35169d472a86.png! > the log stopped : > !https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png! > *{color:red}what's the bug? how should i track the pro? any suggests will > helpful.{color}* -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-22244) sparksql successed on yarn but only successed some pieces of all jobs
[ https://issues.apache.org/jira/browse/SPARK-22244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xinzhang updated SPARK-22244:
-
Description:
Shell :
/opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` --jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f /opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> /data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1
Describe:
It's very strange; the screenshots below show the phenomenon.
On YARN, the application's status shows SUCCEEDED:
!https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!
*{color:red}On the Spark History web UI, the application has moved into the completed list, but in fact it did not complete all the jobs. The active jobs should have completed. The detail shows:{color}*
!https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
!https://user-images.githubusercontent.com/8244097/31428411-33c80056-ae9d-11e7-9a7e-35169d472a86.png!
The log stopped:
!https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!
*{color:red}What is the bug? How should I track the problem? Any suggestions would be helpful.{color}*

was:
Shell :
/opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` --jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f /opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> /data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1
Describe:
It's very strange; the screenshots below show the phenomenon.
On YARN, the application's status shows SUCCEEDED:
!https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!
*{color:red}On the Spark History web UI, the application has moved into the completed list, but in fact it did not complete all the jobs. The active jobs should have completed. The detail shows:{color}*
!https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
!https://user-images.githubusercontent.com/8244097/31428411-33c80056-ae9d-11e7-9a7e-35169d472a86.png!
The log stopped:
!https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!
*{color:red}What is the bug? How should I track the problem? Any suggestions would be helpful.{color}*

> Spark SQL succeeded on YARN but only completed some of the jobs
> -
>
> Key: SPARK-22244
> URL: https://issues.apache.org/jira/browse/SPARK-22244
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Spark Shell
> Affects Versions: 2.1.0
> Reporter: xinzhang
>
> Shell :
> /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` --jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f /opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> /data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1
> Describe:
> It's very strange; the screenshots below show the phenomenon.
> On YARN, the application's status shows SUCCEEDED:
> !https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!
> *{color:red}On the Spark History web UI, the application has moved into the completed list, but in fact it did not complete all the jobs. The active jobs should have completed. The detail shows:{color}*
> !https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
> !https://user-images.githubusercontent.com/8244097/31428411-33c80056-ae9d-11e7-9a7e-35169d472a86.png!
> The log stopped:
> !https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!
> *{color:red}What is the bug? How should I track the problem? Any suggestions would be helpful.{color}*

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-22244) Spark SQL succeeded on YARN but only completed some of the jobs
[ https://issues.apache.org/jira/browse/SPARK-22244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xinzhang updated SPARK-22244:
-
Description:
Shell :
/opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` --jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f /opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> /data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1
Describe:
It's very strange; the screenshots below show the phenomenon.
On YARN, the application's status shows SUCCEEDED:
!https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!
*{color:red}On the Spark History web UI, the application has moved into the completed list, but in fact it did not complete all the jobs. The active jobs should have completed. The detail shows:{color}*
!https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
!https://user-images.githubusercontent.com/8244097/31427805-2a03f978-ae9b-11e7-8d87-d5a19d9a2eb0.jpg!
The log stopped:
!https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!
*{color:red}What is the bug? How should I track the problem? Any suggestions would be helpful.{color}*

was:
Shell :
/opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` --jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f /opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> /data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1
Describe:
It's very strange; the screenshots below show the phenomenon.
On YARN, the application's status shows SUCCEEDED:
!https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!
On the Spark History web UI, the application has moved into the completed list. The detail shows:
!https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
!https://user-images.githubusercontent.com/8244097/31427805-2a03f978-ae9b-11e7-8d87-d5a19d9a2eb0.jpg!
The log stopped:
!https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!
What is the bug? How should I track the problem? Any suggestions would be helpful.

> Spark SQL succeeded on YARN but only completed some of the jobs
> -
>
> Key: SPARK-22244
> URL: https://issues.apache.org/jira/browse/SPARK-22244
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Spark Shell
> Affects Versions: 2.1.0
> Reporter: xinzhang
>
> Shell :
> /opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` --jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f /opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> /data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1
> Describe:
> It's very strange; the screenshots below show the phenomenon.
> On YARN, the application's status shows SUCCEEDED:
> !https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!
> *{color:red}On the Spark History web UI, the application has moved into the completed list, but in fact it did not complete all the jobs. The active jobs should have completed. The detail shows:{color}*
> !https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
> !https://user-images.githubusercontent.com/8244097/31427805-2a03f978-ae9b-11e7-8d87-d5a19d9a2eb0.jpg!
> The log stopped:
> !https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!
> *{color:red}What is the bug? How should I track the problem? Any suggestions would be helpful.{color}*
[jira] [Created] (SPARK-22244) Spark SQL succeeded on YARN but only completed some of the jobs
xinzhang created SPARK-22244:
Summary: Spark SQL succeeded on YARN but only completed some of the jobs
Key: SPARK-22244
URL: https://issues.apache.org/jira/browse/SPARK-22244
Project: Spark
Issue Type: Bug
Components: Spark Core, Spark Shell
Affects Versions: 2.1.0
Reporter: xinzhang

Shell :
/opt/spark/spark-bin/bin/spark-sql --master yarn --queue `id -g -n` --jars /opt/spark/spark-bin/jars/hive-udf-sw.jar -f /opt/app/scheduler-tomcat/temp/10945011_spark_dwd_user_url_detail_d5.sql >> /data1/tools/logs/etl_log/2017-10-11/10945011.log 2>&1
Describe:
It's very strange; the screenshots below show the phenomenon.
On YARN, the application's status shows SUCCEEDED:
!https://user-images.githubusercontent.com/8244097/31427768-0505f3b0-ae9b-11e7-9a52-557b6259e030.png!
On the Spark History web UI, the application has moved into the completed list. The detail shows:
!https://user-images.githubusercontent.com/8244097/31427786-1a2c6f6c-ae9b-11e7-8560-555d81271d8b.png!
!https://user-images.githubusercontent.com/8244097/31427805-2a03f978-ae9b-11e7-8d87-d5a19d9a2eb0.jpg!
The log stopped:
!https://user-images.githubusercontent.com/8244097/31428025-f319b6f4-ae9b-11e7-8780-e88e75bca14f.png!
What is the bug? How should I track the problem? Any suggestions would be helpful.
[jira] [Issue Comment Deleted] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xinzhang updated SPARK-21067: - Comment: was deleted (was: I try to solve it by coding the source code by myself. It is too complex to me. Hope the community or anyone could give a hand and fix it. [~rxin] ) > Thrift Server - CTAS fail with Unable to move source > > > Key: SPARK-21067 > URL: https://issues.apache.org/jira/browse/SPARK-21067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 > Environment: Yarn > Hive MetaStore > HDFS (HA) >Reporter: Dominic Ricard > > After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS > would fail, sometimes... > Most of the time, the CTAS would work only once, after starting the thrift > server. After that, dropping the table and re-issuing the same CTAS would > fail with the following message (Sometime, it fails right away, sometime it > work for a long period of time): > {noformat} > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > We have already found the following Jira > (https://issues.apache.org/jira/browse/SPARK-11021) which state that the > {{hive.exec.stagingdir}} had to be added in order for Spark to be able to > handle CREATE TABLE properly as of 2.0. 
As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. 
> Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to desti > nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.Dataset.(Dataset.scala:185) > at
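SPARK-11021, referenced in the report above, concerns the {{hive.exec.stagingdir}} Hive property. As a minimal sketch only: one way to hand such a Hive property to a Spark session is through a spark.hadoop.-prefixed config key, shown below with the reporter's staging path. This illustrates the mechanism; it is not a verified fix for this ticket, and it needs a real Hive metastore to run.

```python
# Hedged sketch: supply the Hive staging directory discussed above when the
# session is built. Spark forwards "spark.hadoop.*" keys into the Hadoop/Hive
# configuration; the staging path is the reporter's value, used illustratively.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ctas-staging-sketch")
    .config("spark.hadoop.hive.exec.stagingdir", "/tmp/hive-staging/{user.name}")
    .enableHiveSupport()
    .getOrCreate()
)

# The reproduction sequence from the ticket, driven through spark.sql():
spark.sql("CREATE TABLE IF NOT EXISTS dricard.test (col1 int)")
spark.sql("INSERT INTO TABLE dricard.test SELECT 1")
```

The same property can equally be set in hive-site.xml or spark-defaults.conf; the Thrift server picks it up from the configuration it is started with.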
[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169797#comment-16169797 ] xinzhang commented on SPARK-21067:
--
I tried to solve it by modifying the source code myself, but it is too complex for me. I hope the community, or anyone, can lend a hand and fix it. [~rxin]

> Thrift Server - CTAS fail with Unable to move source
>
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
> Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS
> would fail, sometimes...
> Most of the time, the CTAS would work only once after starting the thrift
> server. After that, dropping the table and re-issuing the same CTAS would
> fail with the following message (sometimes it fails right away, sometimes it
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException:
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
> to destination
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> (state=,code=0)
> {noformat}
> We have already found the following Jira
> (https://issues.apache.org/jira/browse/SPARK-11021), which states that
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to
> handle CREATE TABLE properly as of 2.0.
As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. 
> Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to desti > nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.Dataset.(Dataset.scala:185) > at
[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169797#comment-16169797 ] xinzhang edited comment on SPARK-21067 at 9/18/17 9:35 AM: --- I try to solve it by coding the source code by myself. It is too complex to me. Hope the community or anyone could give a hand and fix it. [~rxin] was (Author: zhangxin0112zx): i try to solve it by coding the source code by myself. It is too complex to me. Hope the community or anyone could give a hand and fix it. [~rxin] > Thrift Server - CTAS fail with Unable to move source > > > Key: SPARK-21067 > URL: https://issues.apache.org/jira/browse/SPARK-21067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 > Environment: Yarn > Hive MetaStore > HDFS (HA) >Reporter: Dominic Ricard > > After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS > would fail, sometimes... > Most of the time, the CTAS would work only once, after starting the thrift > server. After that, dropping the table and re-issuing the same CTAS would > fail with the following message (Sometime, it fails right away, sometime it > work for a long period of time): > {noformat} > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > We have already found the following Jira > (https://issues.apache.org/jira/browse/SPARK-11021) which state that the > {{hive.exec.stagingdir}} had to be added in order for Spark to be able to > handle CREATE TABLE properly as of 2.0. 
As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. 
> Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to desti > nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at >
[jira] [Resolved] (SPARK-22007) spark-submit on yarn or local, got different results
[ https://issues.apache.org/jira/browse/SPARK-22007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xinzhang resolved SPARK-22007.
--
Resolution: Won't Fix

> spark-submit on yarn or local, got different results
> -
>
> Key: SPARK-22007
> URL: https://issues.apache.org/jira/browse/SPARK-22007
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Spark Shell, Spark Submit
> Affects Versions: 2.1.0
> Reporter: xinzhang
>
> Submit the py script locally:
> /opt/spark/spark-bin/bin/spark-submit --master local cluster test_hive.py
> result:
> +------------+
> |databaseName|
> +------------+
> | default|
> | |
> | x|
> +------------+
> Submit the py script on YARN:
> /opt/spark/spark-bin/bin/spark-submit --master yarn --deploy-mode cluster test_hive.py
> result:
> +------------+
> |databaseName|
> +------------+
> | default|
> +------------+
> the py script:
> [yangtt@dc-gateway119 test]$ cat test_hive.py
> #!/usr/bin/env python
> #coding=utf-8
> from os.path import expanduser, join, abspath
> from pyspark.sql import SparkSession
> from pyspark.sql import Row
> from pyspark.conf import SparkConf
> def squared(s):
>     return s * s
> warehouse_location = abspath('/group/user/yangtt/meta/hive-temp-table')
> spark = SparkSession \
>     .builder \
>     .appName("Python_Spark_SQL_Hive") \
>     .config("spark.sql.warehouse.dir", warehouse_location) \
>     .config(conf=SparkConf()) \
>     .enableHiveSupport() \
>     .getOrCreate()
> spark.udf.register("squared", squared)
> spark.sql("show databases").show()
> Q: Why does Spark load a different Hive metastore, and why does YARN always use Derby?
> 17/09/14 16:10:55 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
> My current metastore is in MySQL.
> Any suggestions would be helpful. Thanks.
[jira] [Commented] (SPARK-22007) spark-submit on yarn or local, got different results
[ https://issues.apache.org/jira/browse/SPARK-22007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165972#comment-16165972 ] xinzhang commented on SPARK-22007:
--
Yes, I figured it out. Add this when building the SparkSession:
.config("hive.metastore.uris", "thrift://11.11.11.11:9083") \
Perhaps the docs here should describe this in more detail:
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder

> spark-submit on yarn or local, got different results
> -
>
> Key: SPARK-22007
> URL: https://issues.apache.org/jira/browse/SPARK-22007
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Spark Shell, Spark Submit
> Affects Versions: 2.1.0
> Reporter: xinzhang
>
> Submit the py script locally:
> /opt/spark/spark-bin/bin/spark-submit --master local cluster test_hive.py
> result:
> +------------+
> |databaseName|
> +------------+
> | default|
> | |
> | x|
> +------------+
> Submit the py script on YARN:
> /opt/spark/spark-bin/bin/spark-submit --master yarn --deploy-mode cluster test_hive.py
> result:
> +------------+
> |databaseName|
> +------------+
> | default|
> +------------+
> the py script:
> [yangtt@dc-gateway119 test]$ cat test_hive.py
> #!/usr/bin/env python
> #coding=utf-8
> from os.path import expanduser, join, abspath
> from pyspark.sql import SparkSession
> from pyspark.sql import Row
> from pyspark.conf import SparkConf
> def squared(s):
>     return s * s
> warehouse_location = abspath('/group/user/yangtt/meta/hive-temp-table')
> spark = SparkSession \
>     .builder \
>     .appName("Python_Spark_SQL_Hive") \
>     .config("spark.sql.warehouse.dir", warehouse_location) \
>     .config(conf=SparkConf()) \
>     .enableHiveSupport() \
>     .getOrCreate()
> spark.udf.register("squared", squared)
> spark.sql("show databases").show()
> Q: Why does Spark load a different Hive metastore, and why does YARN always use Derby?
> 17/09/14 16:10:55 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
> My current metastore is in MySQL.
> Any suggestions would be helpful. Thanks.
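Spelled out, the fix from the comment above is a one-line addition to the SparkSession builder. The thrift://11.11.11.11:9083 endpoint is the placeholder from the comment, not a real metastore, so this is a sketch that needs a live Hive metastore to actually run.

```python
# Sketch of the fix described above: in yarn/cluster mode the driver runs on a
# cluster node, where it may not see the gateway's hive-site.xml and therefore
# falls back to a local Derby metastore ("underlying DB is DERBY" in the log).
# Pointing the session at the real metastore explicitly avoids that fallback.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("Python_Spark_SQL_Hive")
    # Placeholder URI from the comment; replace with the actual metastore host.
    .config("hive.metastore.uris", "thrift://11.11.11.11:9083")
    .enableHiveSupport()
    .getOrCreate()
)
spark.sql("show databases").show()
```

An equivalent alternative is shipping hive-site.xml with the application (e.g. via --files) so that both local and cluster deploy modes resolve the same metastore.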
[jira] [Updated] (SPARK-22007) spark-submit on yarn or local, got different results
[ https://issues.apache.org/jira/browse/SPARK-22007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xinzhang updated SPARK-22007:
-
Description:
Submit the py script locally:
/opt/spark/spark-bin/bin/spark-submit --master local cluster test_hive.py
result:
+------------+
|databaseName|
+------------+
| default|
| |
| x|
+------------+
Submit the py script on YARN:
/opt/spark/spark-bin/bin/spark-submit --master yarn --deploy-mode cluster test_hive.py
result:
+------------+
|databaseName|
+------------+
| default|
+------------+
the py script:
[yangtt@dc-gateway119 test]$ cat test_hive.py
#!/usr/bin/env python
#coding=utf-8
from os.path import expanduser, join, abspath
from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.conf import SparkConf
def squared(s):
    return s * s
warehouse_location = abspath('/group/user/yangtt/meta/hive-temp-table')
spark = SparkSession \
    .builder \
    .appName("Python_Spark_SQL_Hive") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .config(conf=SparkConf()) \
    .enableHiveSupport() \
    .getOrCreate()
spark.udf.register("squared", squared)
spark.sql("show databases").show()
Q: Why does Spark load a different Hive metastore, and why does YARN always use Derby?
17/09/14 16:10:55 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
My current metastore is in MySQL.
Any suggestions would be helpful. Thanks.

was:
Submit the py script locally:
/opt/spark/spark-bin/bin/spark-submit --master local cluster test_hive.py
result:
+------------+
|databaseName|
+------------+
| default|
| |
| x|
+------------+
Submit the py script on YARN:
/opt/spark/spark-bin/bin/spark-submit --master yarn --deploy-mode cluster test_hive.py
result:
+------------+
|databaseName|
+------------+
| default|
+------------+
the py script:
[yangtt@dc-gateway119 test]$ cat test_hive.py
#!/usr/bin/env python
#coding=utf-8
from os.path import expanduser, join, abspath
from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.conf import SparkConf
def squared(s):
    return s * s
# warehouse_location points to the default location for managed databases and tables
warehouse_location = abspath('/group/user/yangtt/meta/hive-temp-table')
spark = SparkSession \
    .builder \
    .appName("Python_Spark_SQL_Hive") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .config(conf=SparkConf()) \
    .enableHiveSupport() \
    .getOrCreate()
spark.udf.register("squared", squared)
spark.sql("show databases").show()
Q: Why does Spark load a different Hive metastore, and why does YARN always use Derby?
17/09/14 16:10:55 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
My current metastore is in MySQL.
Any suggestions would be helpful. Thanks.

> spark-submit on yarn or local, got different results
> -
>
> Key: SPARK-22007
> URL: https://issues.apache.org/jira/browse/SPARK-22007
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Spark Shell, Spark Submit
> Affects Versions: 2.1.0
> Reporter: xinzhang
>
> Submit the py script locally:
> /opt/spark/spark-bin/bin/spark-submit --master local cluster test_hive.py
> result:
> +------------+
> |databaseName|
> +------------+
> | default|
> | |
> | x|
> +------------+
> Submit the py script on YARN:
> /opt/spark/spark-bin/bin/spark-submit --master yarn --deploy-mode cluster test_hive.py
> result:
> +------------+
> |databaseName|
> +------------+
> | default|
> +------------+
> the py script:
> [yangtt@dc-gateway119 test]$ cat test_hive.py
> #!/usr/bin/env python
> #coding=utf-8
> from os.path import expanduser, join, abspath
> from pyspark.sql import SparkSession
> from pyspark.sql import Row
> from pyspark.conf import SparkConf
> def squared(s):
>     return s * s
> warehouse_location = abspath('/group/user/yangtt/meta/hive-temp-table')
> spark = SparkSession \
>     .builder \
>     .appName("Python_Spark_SQL_Hive") \
>     .config("spark.sql.warehouse.dir", warehouse_location) \
>     .config(conf=SparkConf()) \
>     .enableHiveSupport() \
>     .getOrCreate()
> spark.udf.register("squared", squared)
> spark.sql("show databases").show()
> Q: Why does Spark load a different Hive metastore, and why does YARN always use Derby?
> 17/09/14 16:10:55 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
> My current metastore is in MySQL.
> Any suggestions would be helpful. Thanks.
[jira] [Created] (SPARK-22007) spark-submit on yarn or local, got different results
xinzhang created SPARK-22007:
Summary: spark-submit on yarn or local, got different results
Key: SPARK-22007
URL: https://issues.apache.org/jira/browse/SPARK-22007
Project: Spark
Issue Type: Bug
Components: Spark Core, Spark Shell, Spark Submit
Affects Versions: 2.1.0
Reporter: xinzhang

Submit the py script locally:
/opt/spark/spark-bin/bin/spark-submit --master local cluster test_hive.py
result:
+------------+
|databaseName|
+------------+
| default|
| |
| x|
+------------+
Submit the py script on YARN:
/opt/spark/spark-bin/bin/spark-submit --master yarn --deploy-mode cluster test_hive.py
result:
+------------+
|databaseName|
+------------+
| default|
+------------+
the py script:
[yangtt@dc-gateway119 test]$ cat test_hive.py
#!/usr/bin/env python
#coding=utf-8
from os.path import expanduser, join, abspath
from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.conf import SparkConf
def squared(s):
    return s * s
# warehouse_location points to the default location for managed databases and tables
warehouse_location = abspath('/group/user/yangtt/meta/hive-temp-table')
spark = SparkSession \
    .builder \
    .appName("Python_Spark_SQL_Hive") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .config(conf=SparkConf()) \
    .enableHiveSupport() \
    .getOrCreate()
spark.udf.register("squared", squared)
spark.sql("show databases").show()
Q: Why does Spark load a different Hive metastore, and why does YARN always use Derby?
17/09/14 16:10:55 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
My current metastore is in MySQL.
Any suggestions would be helpful. Thanks.
[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160550#comment-16160550 ] xinzhang edited comment on SPARK-21067 at 9/11/17 2:02 AM:
---
[~dricard] Thanks for your reply. We do the same and use Parquet. But another problem is that SQL like "insert overwrite table a partition(pt='2') select" will also cause the Thrift server to fail. Do you happen to have the same problem? It only happens with tables that use partitions; "insert overwrite table a select" works fine on Parquet tables without partitions.

was (Author: zhangxin0112zx):
[~dricard] Thanks for your reply. We do the same and use Parquet. But another problem is that SQL like "insert overwrite table a partition(pt='2') select" will also cause the Thrift server to fail. Do you happen to have the same problem?

> Thrift Server - CTAS fail with Unable to move source
>
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
> Reporter: Dominic Ricard
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift
> server.
After that, dropping the table and re-issuing the same CTAS would > fail with the following message (Sometime, it fails right away, sometime it > work for a long period of time): > {noformat} > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > We have already found the following Jira > (https://issues.apache.org/jira/browse/SPARK-11021) which state that the > {{hive.exec.stagingdir}} had to be added in order for Spark to be able to > handle CREATE TABLE properly as of 2.0. As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... 
> Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. > Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to desti > nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at >
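The description above already points at {{hive.exec.stagingdir}} (via SPARK-11021) as the relevant knob. One commonly reported mitigation, sketched below as an assumption rather than a confirmed fix for this ticket, is to make the staging directory a relative subdirectory of the table location so the final rename is a same-volume move; the connection URL is hypothetical:

```shell
# Hypothetical beeline session against the Thrift server, overriding the
# staging dir to live under the table path (".hive-staging") so the
# "Unable to move source" rename stays on the same filesystem subtree.
beeline -u jdbc:hive2://thrift-host:10000 \
  --hiveconf hive.exec.stagingdir=.hive-staging \
  -e "CREATE TABLE dricard.test AS SELECT 1 AS col1;"
```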
[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160550#comment-16160550 ] xinzhang commented on SPARK-21067: -- [~dricard] Thanks for your reply. We use Parquet as well. But another problem is that SQL like "insert overwrite table a partition(pt='2') select ..." also causes the Thrift server to fail. Do you happen to have the same problem?
[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158192#comment-16158192 ] xinzhang commented on SPARK-21067: -- Hi [~dricard], do you have any solutions now? Any suggestions would be helpful.
[jira] [Comment Edited] (SPARK-21814) build spark current master can not use hive metadatamysql
[ https://issues.apache.org/jira/browse/SPARK-21814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138075#comment-16138075 ] xinzhang edited comment on SPARK-21814 at 8/23/17 3:32 PM: --- Thanks for your reply. (I may delete this issue in an hour or so.) was (Author: zhangxin0112zx): Thanks your reply. (I will del this one hour later) > build spark current master can not use hive metadatamysql > - > > Key: SPARK-21814 > URL: https://issues.apache.org/jira/browse/SPARK-21814 > Project: Spark > Issue Type: Question > Components: Build, SQL >Affects Versions: 2.2.0 >Reporter: xinzhang > > Hi. I builded spark(master) source code by myself and it was successful. > Useed the cmd : > ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr > -Phive -Phive-thriftserver -Pyarn > But when I used the 'spark-sql' for connnecting the metadata(I put my Hive's > conf hive-site.xml into the $SPARK_HOME/conf/ ) . It seems do not worked.It > always connected use derby(My hive-site.xml use MySQL as metadata db). > I could not judge the problem's reason. > Is my build cmd right? If not.Which cmd should I use for build the project by > myself. > Any suggestes will be helpful. > the spark source code's last commit is : > [root@node3 spark]# git log > commit be72b157ea13ea116c5178a9e41e37ae24090f72 > Author: gatorsmile> Date: Tue Aug 22 17:54:39 2017 +0800 > [SPARK-21803][TEST] Remove the HiveDDLCommandSuite > > ## What changes were proposed in this pull request? > We do not have any Hive-specific parser. It does not make sense to keep a > parser-specific test suite `HiveDDLCommandSuite.scala` in the Hive package. > This PR is to > > ## How was this patch tested? > N/A > > Author: gatorsmile > > Closes #19015 from gatorsmile/combineDDL.
[jira] [Commented] (SPARK-21814) build spark current master can not use hive metadatamysql
[ https://issues.apache.org/jira/browse/SPARK-21814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138088#comment-16138088 ] xinzhang commented on SPARK-21814: -- BTW, my problem: why does spark-sql always connect to the Derby metastore DB? I put hive-site.xml into the conf directory as usual. With the tarballs downloaded from the official site (2.2.0/2.1.0), the hive-site.xml always worked.
[jira] [Commented] (SPARK-21814) build spark current master can not use hive metadatamysql
[ https://issues.apache.org/jira/browse/SPARK-21814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138075#comment-16138075 ] xinzhang commented on SPARK-21814: -- Thanks for your reply. (I may delete this issue in an hour.)
[jira] [Created] (SPARK-21814) build spark current master can not use hive metadatamysql
xinzhang created SPARK-21814: Summary: build spark current master can not use hive metadatamysql Key: SPARK-21814 URL: https://issues.apache.org/jira/browse/SPARK-21814 Project: Spark Issue Type: Question Components: Build, SQL Affects Versions: 2.2.0 Reporter: xinzhang Hi. I built the Spark master source code myself and the build succeeded. I used the command: ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pyarn But when I used 'spark-sql' to connect to the metastore (I put my Hive hive-site.xml into $SPARK_HOME/conf/), it did not seem to work: it always connected using Derby (my hive-site.xml uses MySQL as the metastore DB). I cannot determine the cause. Is my build command right? If not, which command should I use to build the project myself? Any suggestions would be helpful. The last commit is: [root@node3 spark]# git log commit be72b157ea13ea116c5178a9e41e37ae24090f72 Author: gatorsmile Date: Tue Aug 22 17:54:39 2017 +0800 [SPARK-21803][TEST] Remove the HiveDDLCommandSuite ## What changes were proposed in this pull request? We do not have any Hive-specific parser. It does not make sense to keep a parser-specific test suite `HiveDDLCommandSuite.scala` in the Hive package. This PR is to ## How was this patch tested? N/A Author: gatorsmile Closes #19015 from gatorsmile/combineDDL.
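When a self-built distribution silently falls back to Derby, one quick sanity check is to confirm the build really compiled in Hive support and that spark-sql resolves the expected catalog implementation. A sketch using the same build flags as the report (the SET query is a standard Spark SQL conf lookup, offered here as a diagnostic, not a fix):

```shell
# Rebuild with the Hive profiles exactly as in the report, then ask the
# shell which catalog implementation it resolved; it should report
# "hive" when Hive support is compiled in and hive-site.xml is found.
./dev/make-distribution.sh --name custom-spark --pip --r --tgz \
  -Psparkr -Phive -Phive-thriftserver -Pyarn
bin/spark-sql -e "SET spark.sql.catalogImplementation;"
```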
[jira] [Updated] (SPARK-21814) build spark current master can not use hive metadatamysql
[ https://issues.apache.org/jira/browse/SPARK-21814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xinzhang updated SPARK-21814: - Description: Hi. I builded spark(master) source code by myself and it was successful. Useed the cmd : ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pyarn But when I used the 'spark-sql' for connnecting the metadata(I put my Hive's conf hive-site.xml into the $SPARK_HOME/conf/ ) . It seems do not worked.It always connected use derby(My hive-site.xml use MySQL as metadata db). I could not judge the problem's reason. Is my build cmd right? If not.Which cmd should I use for build the project by myself. Any suggestes will be helpful. the spark source code's last commit is : [root@node3 spark]# git log commit be72b157ea13ea116c5178a9e41e37ae24090f72 Author: gatorsmileDate: Tue Aug 22 17:54:39 2017 +0800 [SPARK-21803][TEST] Remove the HiveDDLCommandSuite ## What changes were proposed in this pull request? We do not have any Hive-specific parser. It does not make sense to keep a parser-specific test suite `HiveDDLCommandSuite.scala` in the Hive package. This PR is to ## How was this patch tested? N/A Author: gatorsmile Closes #19015 from gatorsmile/combineDDL. was: Hi. I builded spark(master) source code by myself and it was successful. Useed the cmd : ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pyarn But when I used the 'spark-sql' for connnecting the metadata(I put my Hive's conf hive-site.xml into the $SPARK_HOME/conf/ ) . It seems do not worked.It always connected use derby(My hive-site.xml use MySQL as metadata db). I could not judge the problem's reason. Is my build cmd right? If not.Which cmd should I use for build the project by myself. Any suggestes will be helpful. 
the last commit is : [root@node3 spark]# git log commit be72b157ea13ea116c5178a9e41e37ae24090f72 Author: gatorsmile Date: Tue Aug 22 17:54:39 2017 +0800 [SPARK-21803][TEST] Remove the HiveDDLCommandSuite ## What changes were proposed in this pull request? We do not have any Hive-specific parser. It does not make sense to keep a parser-specific test suite `HiveDDLCommandSuite.scala` in the Hive package. This PR is to ## How was this patch tested? N/A Author: gatorsmile Closes #19015 from gatorsmile/combineDDL.
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134642#comment-16134642 ] xinzhang commented on SPARK-21725: -- Ok. I will retry the version of current master. > spark thriftserver insert overwrite table partition select > --- > > Key: SPARK-21725 > URL: https://issues.apache.org/jira/browse/SPARK-21725 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 > Environment: centos 6.7 spark 2.1 jdk8 >Reporter: xinzhang > Labels: spark-sql > > use thriftserver create table with partitions. > session 1: > SET hive.default.fileformat=Parquet;create table tmp_10(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 2: > SET hive.default.fileformat=Parquet;create table tmp_11(count bigint) > partitioned by (pt string) stored as parquet; > --ok > !exit > session 3: > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --ok > !exit > session 4(do it again): > --connect the thriftserver > SET hive.default.fileformat=Parquet;insert overwrite table tmp_10 > partition(pt='1') select count(1) count from tmp_11; > --error > !exit > - > 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing > query, currentState RUNNING, > java.lang.reflect.InvocationTargetException > .. > .. 
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move > source > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053 > 512282-2/-ext-1/part-0 to destination > hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0 > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644) > at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324) > ... 45 more > Caused by: java.io.IOException: Filesystem closed > > - > the doc about the parquet table desc here > http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files > Hive metastore Parquet table conversion > When reading from and writing to Hive metastore Parquet tables, Spark SQL > will try to use its own Parquet support instead of Hive SerDe for better > performance. This behavior is controlled by the > spark.sql.hive.convertMetastoreParquet configuration, and is turned on by > default. > I am confused the problem appear in the table(partitions) but it is ok with > table(with out partitions) . It means spark do not use its own parquet ? > Maybe someone give any suggest how could I avoid the issue? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
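The "java.io.IOException: Filesystem closed" at the bottom of the trace above is the classic symptom of one Thrift server session closing a FileSystem handle that other sessions share through Hadoop's FileSystem cache. A mitigation sometimes used for this symptom, sketched below as an assumption rather than a confirmed fix for this ticket, is to disable that cache when starting the Thrift server:

```shell
# Hypothetical: start the Thrift server with the shared HDFS FileSystem
# cache disabled, so one session's close() cannot invalidate the handle
# another session is using for the partition move.
sbin/start-thriftserver.sh \
  --hiveconf fs.hdfs.impl.disable.cache=true
```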
[jira] [Commented] (SPARK-4131) Support "Writing data into the filesystem from queries"
[ https://issues.apache.org/jira/browse/SPARK-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126723#comment-16126723 ] xinzhang commented on SPARK-4131: - any progress here? > Support "Writing data into the filesystem from queries" > --- > > Key: SPARK-4131 > URL: https://issues.apache.org/jira/browse/SPARK-4131 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.1.0 >Reporter: XiaoJing wang >Assignee: Fei Wang >Priority: Critical > Original Estimate: 0.05h > Remaining Estimate: 0.05h > > Writing data into the filesystem from queries,SparkSql is not support . > eg: > {code}insert overwrite LOCAL DIRECTORY '/data1/wangxj/sql_spark' select * > from page_views; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
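For readers finding this thread later: Hive-style directory export did land in later Spark releases (in the 2.3 line, if memory serves), so the statement from the original request can be written roughly as below. The output directory is illustrative; check the SQL syntax reference for your Spark version before relying on it:

```shell
# Sketch of the requested feature in a later Spark release; the local
# directory path is an illustrative assumption.
bin/spark-sql -e "
  INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sql_spark_export'
  STORED AS PARQUET
  SELECT * FROM page_views;"
```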
[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108303#comment-16108303 ] xinzhang edited comment on SPARK-21067 at 8/15/17 1:09 AM: --- Hi. I use Parquet to avoid the create-table-as issue. It appears in insert overwrite table (partition). I could not find any way to avoid this issue. Any suggestions would be greatly helpful. https://issues.apache.org/jira/browse/SPARK-21725 was (Author: zhangxin0112zx): hi . I use the parquet to avoid the issuse about create table as. It appear in insert overwrite table (partition). I could not find any ways to avoid this issuse ?Any suggests will great helpful. https://issues.apache.org/jira/browse/SPARK-21725
[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108303#comment-16108303 ] xinzhang edited comment on SPARK-21067 at 8/15/17 1:07 AM:
---
Hi. I use Parquet to avoid the CREATE TABLE AS issue, but the failure still appears with INSERT OVERWRITE TABLE (partitioned). I could not find any way to avoid this issue; any suggestions would be greatly helpful. https://issues.apache.org/jira/browse/SPARK-21725

was (Author: zhangxin0112zx): Hi [~srowen], could you consider this and give some suggestions?

> Thrift Server - CTAS fail with Unable to move source
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn, Hive MetaStore, HDFS (HA)
> Reporter: Dominic Ricard
[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119310#comment-16119310 ] xinzhang edited comment on SPARK-21067 at 8/15/17 1:04 AM:
---
Hi [~smilegator], can you push this bug fix forward? In my view, this is a very big obstacle for us as we try to popularize and use SparkSQL (thriftserver).

was (Author: zhangxin0112zx): Hi [~cloud_fan], can you push this bug fix forward? In my view, this is a very big obstacle for us as we try to popularize and use SparkSQL (thriftserver).

> Thrift Server - CTAS fail with Unable to move source
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn, Hive MetaStore, HDFS (HA)
> Reporter: Dominic Ricard
[jira] [Updated] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xinzhang updated SPARK-21725:
- Description:

Use thriftserver to create tables with partitions.

session 1:
SET hive.default.fileformat=Parquet;
create table tmp_10(count bigint) partitioned by (pt string) stored as parquet;
--ok
!exit

session 2:
SET hive.default.fileformat=Parquet;
create table tmp_11(count bigint) partitioned by (pt string) stored as parquet;
--ok
!exit

session 3 (connect the thriftserver):
SET hive.default.fileformat=Parquet;
insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
--ok
!exit

session 4 (do it again; connect the thriftserver):
SET hive.default.fileformat=Parquet;
insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
--error
!exit

17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING, java.lang.reflect.InvocationTargetException
..
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053512282-2/-ext-1/part-0 to destination hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
... 45 more
Caused by: java.io.IOException: Filesystem closed

The Parquet table behavior is documented here: http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files ("Hive metastore Parquet table conversion"): when reading from and writing to Hive metastore Parquet tables, Spark SQL will try to use its own Parquet support instead of the Hive SerDe for better performance. This behavior is controlled by the spark.sql.hive.convertMetastoreParquet configuration, and is turned on by default. I am confused: the problem appears with partitioned tables, but everything is fine with unpartitioned tables. Does that mean Spark is not using its own Parquet support here? Could someone suggest how I can avoid this issue?

> spark thriftserver insert overwrite table partition select
> Key: SPARK-21725
> URL: https://issues.apache.org/jira/browse/SPARK-21725
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Environment: centos 6.7, spark 2.1, jdk8
> Reporter: xinzhang
> Labels: spark-sql
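The root `Caused by: java.io.IOException: Filesystem closed` is consistent with a shared, cached filesystem handle being closed by one session and then reused by another. A hypothetical pure-Python model of that sharing (this is not Hadoop code; the class and cache are invented to illustrate the mechanism):

```python
class FileSystem:
    """Stand-in for a client filesystem handle."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True
    def rename(self, src, dst):
        if self.closed:
            raise OSError("Filesystem closed")
        return True

# A process-wide cache keyed by (uri, user), so two sessions asking for
# the same filesystem receive the *same* handle.
_cache = {}

def get_filesystem(uri: str, user: str) -> FileSystem:
    key = (uri, user)
    if key not in _cache:
        _cache[key] = FileSystem()
    return _cache[key]

session_a = get_filesystem("hdfs://dc-hadoop54:50001", "user1")
session_b = get_filesystem("hdfs://dc-hadoop54:50001", "user1")
assert session_a is session_b   # both sessions share one handle

session_a.close()               # session A exits and closes the handle

# Session B's INSERT OVERWRITE now fails at the staging-file move:
err = None
try:
    session_b.rename(".hive-staging/-ext-1/part-0", "pt=1/part-0")
except OSError as e:
    err = e
print(err)  # Filesystem closed
```

If this is the mechanism at play, a Hadoop setting like fs.hdfs.impl.disable.cache=true is sometimes suggested for shared-handle problems; whether it applies to this particular bug is an assumption, not something the thread confirms.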
[jira] [Created] (SPARK-21725) spark thriftserver insert overwrite table partition select
xinzhang created SPARK-21725:
Summary: spark thriftserver insert overwrite table partition select
Key: SPARK-21725
URL: https://issues.apache.org/jira/browse/SPARK-21725
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.1.0
Environment: centos 6.7, spark 2.1, jdk8
Reporter: xinzhang

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
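Since the reporter notes the same statements succeed via spark.sql() and the JDBC failures are intermittent, one client-side mitigation is to retry a failed statement on a fresh connection rather than reusing the old session. A sketch with a stand-in connection class (run_with_retry and FakeConn are hypothetical names; substitute a real Hive/JDBC client):

```python
def run_with_retry(connect, sql: str, attempts: int = 2):
    """Execute sql, opening a *fresh* connection for each attempt."""
    last_err = None
    for _ in range(attempts):
        conn = connect()              # new session every attempt
        try:
            return conn.execute(sql)
        except OSError as e:          # e.g. "Filesystem closed"
            last_err = e
        finally:
            conn.close()
    raise last_err

# Demo with a fake connection that fails once, then succeeds.
state = {"calls": 0}

class FakeConn:
    def execute(self, sql):
        state["calls"] += 1
        if state["calls"] == 1:
            raise OSError("Filesystem closed")
        return "OK"
    def close(self):
        pass

result = run_with_retry(
    FakeConn,
    "insert overwrite table tmp_10 partition(pt='1') select count(1) from tmp_11",
)
print(result)  # OK
```

This is a workaround pattern, not a fix: it papers over the intermittent failure but does not address why the Thrift Server's sessions interfere with each other in the first place.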
[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119310#comment-16119310 ] xinzhang edited comment on SPARK-21067 at 8/9/17 2:04 AM:
--
Hi [~cloud_fan], can you push this bug fix forward? In my view, this is a very big obstacle for us as we try to popularize and use SparkSQL (thriftserver).

was (Author: zhangxin0112zx): Hi [~cloud_fan], can you push this bug fix forward? In my view, this is a very big obstacle for us as we try to use SparkSQL (thriftserver).

> Thrift Server - CTAS fail with Unable to move source
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn, Hive MetaStore, HDFS (HA)
> Reporter: Dominic Ricard
[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119310#comment-16119310 ] xinzhang edited comment on SPARK-21067 at 8/9/17 2:03 AM:
--
Hi [~cloud_fan], can you push this bug fix forward? In my view, this is a very big obstacle for us as we try to use SparkSQL (thriftserver).

was (Author: zhangxin0112zx): Hi [~cloud_fan], can you push this bug fix forward? In my view, this is a very big obstacle for us as we try to use Spark (thriftserver).

> Thrift Server - CTAS fail with Unable to move source
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Environment: Yarn, Hive MetaStore, HDFS (HA)
> Reporter: Dominic Ricard
[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119310#comment-16119310 ] xinzhang commented on SPARK-21067: -- hi [~cloud_fan] Can you push this BUG repair? In my consideration, this is a very big obstacle for us when we go to use Spark(thriftserver). > Thrift Server - CTAS fail with Unable to move source > > > Key: SPARK-21067 > URL: https://issues.apache.org/jira/browse/SPARK-21067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 > Environment: Yarn > Hive MetaStore > HDFS (HA) >Reporter: Dominic Ricard > > After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS > would fail, sometimes... > Most of the time, the CTAS would work only once, after starting the thrift > server. After that, dropping the table and re-issuing the same CTAS would > fail with the following message (Sometime, it fails right away, sometime it > work for a long period of time): > {noformat} > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > We have already found the following Jira > (https://issues.apache.org/jira/browse/SPARK-11021) which state that the > {{hive.exec.stagingdir}} had to be added in order for Spark to be able to > handle CREATE TABLE properly as of 2.0. 
As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. 
> Full stack trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException:
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
> to destination hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
> at
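For reference, the {{hive.exec.stagingdir}} property discussed above is typically set in hive-site.xml. A minimal sketch of that configuration (the value shown is the one quoted in this report; whether the \{user.name\} placeholder is expanded depends on the Hive version and configuration):

{noformat}
<property>
  <name>hive.exec.stagingdir</name>
  <value>/tmp/hive-staging/{user.name}</value>
</property>
{noformat}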
[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108303#comment-16108303 ]

xinzhang edited comment on SPARK-21067 at 8/5/17 1:13 AM:
--
Hi [~srowen], could you take a look at this and offer some suggestions?

was (Author: zhangxin0112zx): Hi [~guoxiaolongzte], could you take a look at this and offer some suggestions?
[jira] [Issue Comment Deleted] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xinzhang updated SPARK-21067:
--
Comment: was deleted (was: Hi Reynold Xin, I am looking forward to your reply. [~rxin])
[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108303#comment-16108303 ]

xinzhang commented on SPARK-21067:
--
Hi [~guoxiaolongzte], could you take a look at this and offer some suggestions?
[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106057#comment-16106057 ]

xinzhang edited comment on SPARK-21067 at 7/29/17 7:09 AM:
--
Hi Reynold Xin, I am looking forward to your reply. [~rxin]

was (Author: zhangxin0112zx): Hi Reynold Xin, I am looking forward to your reply.
[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106057#comment-16106057 ]

xinzhang commented on SPARK-21067:
--
Hi Reynold Xin, I am looking forward to your reply.
[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103202#comment-16103202 ]

xinzhang edited comment on SPARK-21067 at 7/28/17 1:06 AM:
--
Same here. The problem reappeared with the Spark 2.1.0 Thrift Server:
1. Open Beeline Session 1
2. Create Table 1 (success)
3. Open Beeline Session 2
4. Create Table 2 (success)
5. Close Beeline Session 1
6. Create Table 3 in Beeline Session 2 (FAIL)
With Parquet tables, the issue is not present. [~cloud_fan]

was (Author: zhangxin0112zx): Same here. The problem reappeared with the Spark 2.1.0 Thrift Server: Open Beeline Session 1; Create Table 1 (success); Open Beeline Session 2; Create Table 2 (success); Close Beeline Session 1; Create Table 3 in Beeline Session 2 (FAIL). With Parquet tables, the issue is not present. @Wenchen Fan
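The session-lifecycle repro above (a CTAS in Session 2 fails after Session 1 closes) points at a shared staging directory being cleaned up by the closing session. A minimal stdlib sketch of that failure mode (pure illustration, not Spark or Hive code; all names are hypothetical):

```python
import os
import shutil
import tempfile

# Illustration only: two "sessions" share one staging root, as when
# hive.exec.stagingdir points at a fixed path for all sessions.
staging_root = tempfile.mkdtemp(prefix="hive-staging-")

def write_part(session_id):
    # Each session writes its CTAS output under the shared staging root.
    out_dir = os.path.join(staging_root, "session-%d" % session_id, "-ext-1")
    os.makedirs(out_dir)
    part = os.path.join(out_dir, "part-0")
    with open(part, "w") as f:
        f.write("1\n")
    return part

def close_session():
    # Over-eager cleanup on session close: removes the whole shared
    # staging root instead of only that session's own subdirectory.
    shutil.rmtree(staging_root, ignore_errors=True)

part = write_part(2)   # Session 2 finishes writing its output
close_session()        # Session 1 closes and wipes the shared root
# Session 2's subsequent "move source to destination" now has nothing to move:
source_still_exists = os.path.exists(part)
print(source_still_exists)  # False: the source file the move needs is gone
```

This matches the observed symptom: the move fails with "Unable to move source" because the staging path no longer exists, while per-session (or per-query) staging directories, or the spark.sql() path, avoid the shared cleanup.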
[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103202#comment-16103202 ]

xinzhang commented on SPARK-21067:
--
Same here. The problem reappeared with the Spark 2.1.0 Thrift Server:
1. Open Beeline Session 1
2. Create Table 1 (success)
3. Open Beeline Session 2
4. Create Table 2 (success)
5. Close Beeline Session 1
6. Create Table 3 in Beeline Session 2 (FAIL)
With Parquet tables, the issue is not present. Wenchen Fan
As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. 
> Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to desti > nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at >
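The failure pattern reported above (CTAS and INSERT work until another Beeline session closes, then the final move out of the staging directory fails) can be sketched with a small toy script. This is only an illustration of the suspected mechanism, not Spark or Hive code; the directory layout and names are invented for the example.

```python
# Toy illustration of the suspected failure mode (not Spark code):
# two Thrift Server sessions effectively share one staging root, so
# when session 1 closes and its cleanup removes that root, session 2's
# final "move staged output to destination" step fails -- the same
# shape as the "Unable to move source ... to destination" error above.
import os
import shutil
import tempfile

warehouse = tempfile.mkdtemp(prefix="warehouse-")        # stands in for /user/hive/warehouse
staging_root = tempfile.mkdtemp(prefix="hive-staging-")  # shared staging dir

# Session 2 stages its CTAS output under the shared staging root.
staged = os.path.join(staging_root, "-ext-1")
os.makedirs(staged)
with open(os.path.join(staged, "part-0"), "w") as f:
    f.write("1\n")

# Session 1 closes; its cleanup removes the shared staging root,
# taking session 2's staged output with it.
shutil.rmtree(staging_root)

# Session 2 now tries to move its staged output into the warehouse.
try:
    shutil.move(staged, os.path.join(warehouse, "test"))
    move_succeeded = True
except FileNotFoundError:
    move_succeeded = False

print(move_succeeded)  # False: the staged files vanished with the staging root
```

The point of the sketch is only that the move step fails not because of permissions or HDFS itself, but because the source path no longer exists by the time the move runs.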
[jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103202#comment-16103202 ]

xinzhang edited comment on SPARK-21067 at 7/27/17 1:27 PM:
-----------------------------------------------------------

Same here. The problem reappeared in the Spark 2.1.0 Thrift Server:

1. Open Beeline Session 1
2. Create Table 1 (success)
3. Open Beeline Session 2
4. Create Table 2 (success)
5. Close Beeline Session 1
6. Create Table 3 in Beeline Session 2 (FAIL)

With Parquet tables, the issue is not present. @Wenchen Fan

was (Author: zhangxin0112zx):

Same here. The problem reappeared in the Spark 2.1.0 Thrift Server:

1. Open Beeline Session 1
2. Create Table 1 (success)
3. Open Beeline Session 2
4. Create Table 2 (success)
5. Close Beeline Session 1
6. Create Table 3 in Beeline Session 2 (FAIL)

With Parquet tables, the issue is not present. Wenchen Fan
[jira] [Commented] (SPARK-19511) insert into table does not work on second session of beeline
[ https://issues.apache.org/jira/browse/SPARK-19511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102804#comment-16102804 ]

xinzhang commented on SPARK-19511:
----------------------------------

[~chenerlu] Hi, for me it always appears. In which scenario does it not appear?

> insert into table does not work on second session of beeline
> ------------------------------------------------------------
>
> Key: SPARK-19511
> URL: https://issues.apache.org/jira/browse/SPARK-19511
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Environment: CentOS 7.2, Java 1.7.0_91
> Reporter: sanjiv marathe
>
> Same issue as SPARK-11083... reopen?
> "insert into table" works in the first Beeline session and fails in the
> second one. Every time, I had to restart the Thrift server and reconnect to
> get it working.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
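These threads reference SPARK-11021 and the {{hive.exec.stagingdir}} setting. One way to reason about why a session-unique staging directory would avoid the cross-session breakage is that no other session's cleanup can delete it. The toy sketch below illustrates that idea only; it is not Spark code, the `session_staging()` helper is hypothetical, and the paths are invented.

```python
# Toy sketch (not Spark code) of why a per-session staging directory
# avoids the cross-session breakage: closing session 1 removes only
# session 1's own staging dir, so session 2's INSERT can still move
# its staged output into the warehouse.
import os
import shutil
import tempfile
import uuid

warehouse = tempfile.mkdtemp(prefix="warehouse-")

def session_staging():
    # Hypothetical helper: one unique staging dir per session, loosely
    # mimicking a session-unique hive.exec.stagingdir setting.
    return tempfile.mkdtemp(prefix="hive-staging-" + uuid.uuid4().hex + "-")

staging1 = session_staging()  # session 1
staging2 = session_staging()  # session 2

# Session 2 stages its INSERT output in its own directory.
staged = os.path.join(staging2, "-ext-1")
os.makedirs(staged)
with open(os.path.join(staged, "part-0"), "w") as f:
    f.write("1\n")

# Session 1 closes; only its own staging dir is cleaned up.
shutil.rmtree(staging1)

# Session 2's move succeeds because its staging dir is untouched.
shutil.move(staged, os.path.join(warehouse, "test"))
print(os.path.exists(os.path.join(warehouse, "test", "part-0")))  # True
```

Contrast this with the shared-staging sketch earlier in the thread: the only difference is whose directory the cleanup removes.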
[jira] [Commented] (SPARK-11083) insert overwrite table failed when beeline reconnect
[ https://issues.apache.org/jira/browse/SPARK-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102775#comment-16102775 ]

xinzhang commented on SPARK-11083:
----------------------------------

Reappeared in Spark 2.1.0. Is anyone working on this issue?

> insert overwrite table failed when beeline reconnect
> ----------------------------------------------------
>
> Key: SPARK-11083
> URL: https://issues.apache.org/jira/browse/SPARK-11083
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Environment: Spark: master branch
> Hadoop: 2.7.1
> JDK: 1.8.0_60
> Reporter: Weizhong
> Assignee: Davies Liu
>
> 1. Start the Thrift server.
> 2. Use Beeline to connect to the Thrift server, then execute an "insert
>    overwrite table_name ..." clause -- success.
> 3. Exit Beeline.
> 4. Reconnect to the Thrift server, and then execute the same "insert
>    overwrite table_name ..." clause -- failed.
> {noformat}
> 15/10/13 18:44:35 ERROR SparkExecuteStatementOperation: Error executing
> query, currentState RUNNING,
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.sql.hive.client.Shim_v1_2.loadDynamicPartitions(HiveShim.scala:520)
> at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(ClientWrapper.scala:506)
> at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:506)
> at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:506)
> at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:256)
> at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:211)
> at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:248)
> at org.apache.spark.sql.hive.client.ClientWrapper.loadDynamicPartitions(ClientWrapper.scala:505)
> at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:225)
> at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
> at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
> at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:58)
> at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:58)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:129)
> at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
> at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:739)
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.runInternal(SparkExecuteStatementOperation.scala:224)
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:182)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move
> source
> hdfs://9.91.8.214:9000/user/hive/warehouse/tpcds_bin_partitioned_orc_2.db/catalog_returns/.hive-staging_hive_2015-10-13_18-44-17_606_2400736035447406540-2/-ext-1/cr_returned_date=2003-08-27/part-00048
> to destination