[jira] [Commented] (SPARK-10765) use new aggregate interface for hive UDAF
[ https://issues.apache.org/jira/browse/SPARK-10765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190042#comment-15190042 ] David Ross commented on SPARK-10765: As noted in the change, this is a performance regression for Hive UDAFs: https://github.com/apache/spark/commit/341b13f8f5eb118f1fb4d4f84418715ac4750a4d#diff-53f31aa4bbd9274f40547cd00cf0826dR526 What is the plan to resolve this? > use new aggregate interface for hive UDAF > - > > Key: SPARK-10765 > URL: https://issues.apache.org/jira/browse/SPARK-10765 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 1.6.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11246) [1.5] Table cache for Parquet broken in 1.5
David Ross created SPARK-11246: -- Summary: [1.5] Table cache for Parquet broken in 1.5 Key: SPARK-11246 URL: https://issues.apache.org/jira/browse/SPARK-11246 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.1 Reporter: David Ross Since upgrading to 1.5.1, {{CACHE TABLE}} works great for all tables except parquet tables, likely related to the parquet native reader. Here are the steps for a parquet table: {code} create table test_parquet stored as parquet as select 1; explain select * from test_parquet; {code} With output: {code} == Physical Plan == Scan ParquetRelation[hdfs://192.168.99.9/user/hive/warehouse/test_parquet][_c0#141] {code} And then caching: {code} cache table test_parquet; explain select * from test_parquet; {code} With output: {code} == Physical Plan == Scan ParquetRelation[hdfs://192.168.99.9/user/hive/warehouse/test_parquet][_c0#174] {code} Note it isn't cached. I have included spark log output for the {{cache table}} and {{explain}} statements below. --- Here's the same for a non-parquet table: {code} cache table test_no_parquet; explain select * from test_no_parquet; {code} With output: {code} == Physical Plan == HiveTableScan [_c0#210], (MetastoreRelation default, test_no_parquet, None) {code} And then caching: {code} cache table test_no_parquet; explain select * from test_no_parquet; {code} With output: {code} == Physical Plan == InMemoryColumnarTableScan [_c0#229], (InMemoryRelation [_c0#229], true, 1, StorageLevel(true, true, false, true, 1), (HiveTableScan [_c0#211], (MetastoreRelation default, test_no_parquet, None)), Some(test_no_parquet)) {code} Note that the table is cached. --- Note that if the flag {{spark.sql.hive.convertMetastoreParquet}} is set to {{false}}, parquet tables work the same as non-parquet tables with caching. This is a reasonable workaround for us, but ideally, we would like to benefit from the native reader. 
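For reference, a minimal sketch of the workaround session (assuming the {{test_parquet}} table from the steps above; the plan in the comment is an expectation, not captured output):

{code}
set spark.sql.hive.convertMetastoreParquet=false;
cache table test_parquet;
explain select * from test_parquet;
-- expectation: the plan should now show an InMemoryColumnarTableScan,
-- as it does for the non-parquet table
{code}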
--- Spark logs for {{cache table}} for {{test_parquet}}: {code} 15/10/21 21:22:05 INFO thriftserver.SparkExecuteStatementOperation: Running query 'cache table test_parquet' with 20ee2ab9-5242-4783-81cf-46115ed72610 15/10/21 21:22:05 INFO metastore.HiveMetaStore: 49: get_table : db=default tbl=test_parquet 15/10/21 21:22:05 INFO HiveMetaStore.audit: ugi=vagrant ip=unknown-ip-addr cmd=get_table : db=default tbl=test_parquet 15/10/21 21:22:05 INFO metastore.HiveMetaStore: 49: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 15/10/21 21:22:05 INFO metastore.ObjectStore: ObjectStore, initialize called 15/10/21 21:22:05 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing 15/10/21 21:22:05 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL 15/10/21 21:22:05 INFO metastore.ObjectStore: Initialized ObjectStore 15/10/21 21:22:05 INFO storage.MemoryStore: ensureFreeSpace(215680) called with curMem=4196713, maxMem=139009720 15/10/21 21:22:05 INFO storage.MemoryStore: Block broadcast_59 stored as values in memory (estimated size 210.6 KB, free 128.4 MB) 15/10/21 21:22:05 INFO storage.MemoryStore: ensureFreeSpace(20265) called with curMem=4412393, maxMem=139009720 15/10/21 21:22:05 INFO storage.MemoryStore: Block broadcast_59_piece0 stored as bytes in memory (estimated size 19.8 KB, free 128.3 MB) 15/10/21 21:22:05 INFO storage.BlockManagerInfo: Added broadcast_59_piece0 in memory on 192.168.99.9:50262 (size: 19.8 KB, free: 132.2 MB) 15/10/21 21:22:05 INFO spark.SparkContext: Created broadcast 59 from run at AccessController.java:-2 15/10/21 21:22:05 INFO metastore.HiveMetaStore: 49: get_table : db=default tbl=test_parquet 15/10/21 21:22:05 INFO HiveMetaStore.audit: ugi=vagrant ip=unknown-ip-addr cmd=get_table : db=default tbl=test_parquet 15/10/21 21:22:05 INFO storage.MemoryStore: ensureFreeSpace(215680) called with 
curMem=4432658, maxMem=139009720 15/10/21 21:22:05 INFO storage.MemoryStore: Block broadcast_60 stored as values in memory (estimated size 210.6 KB, free 128.1 MB) 15/10/21 21:22:05 INFO storage.BlockManagerInfo: Removed broadcast_58_piece0 on 192.168.99.9:50262 in memory (size: 19.8 KB, free: 132.2 MB) 15/10/21 21:22:05 INFO storage.BlockManagerInfo: Removed broadcast_57_piece0 on 192.168.99.9:50262 in memory (size: 21.1 KB, free: 132.2 MB) 15/10/21 21:22:05 INFO storage.BlockManagerInfo: Removed broadcast_57_piece0 on slave2:46912 in memory (size: 21.1 KB, free: 534.5 MB) 15/10/21 21:22:05 INFO storage.BlockManagerInfo: Removed broadcast_57_piece0 on slave0:46599 in memory (size: 21.1 KB, free: 534.3 MB) 15/10/21 21:22:05 INFO spark.ContextCleaner: Cleaned accumulator 86 15/10/21 21:22:05 INFO spark.ContextCleaner: Cleaned accumulator 84 15/10/21 21:22:05 INFO {code}
[jira] [Created] (SPARK-11191) [1.5] Can't create UDF's using hive thrift service
David Ross created SPARK-11191: -- Summary: [1.5] Can't create UDF's using hive thrift service Key: SPARK-11191 URL: https://issues.apache.org/jira/browse/SPARK-11191 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.1, 1.5.0 Reporter: David Ross Since upgrading to Spark 1.5 we've been unable to create and use UDFs when we run in thrift server mode. Our setup: we start the thrift-server running against yarn in client mode (we've also built our own spark from github branch-1.5 with the following args: {{-Pyarn -Phive -Phive-thriftserver}}). If I run the following after connecting via JDBC (in this case via beeline): {{add jar 'hdfs://path/to/jar'}} (this command succeeds with no errors) {{CREATE TEMPORARY FUNCTION testUDF AS 'com.foo.class.UDF';}} (this command succeeds with no errors) {{select testUDF(col1) from table1;}} I get the following error in the logs: {code} org.apache.spark.sql.AnalysisException: undefined function testUDF; line 1 pos 8 at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58) at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:57) at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:53) at scala.util.Try.getOrElse(Try.scala:77) at org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUDFs.scala:53) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506) at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48) at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:505) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:502) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:226) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249) {code} (cutting the bulk for ease of report, more than happy to send the full output) {code} 15/10/12 14:34:37 ERROR SparkExecuteStatementOperation: Error running hive query: org.apache.hive.service.cli.HiveSQLException: org.apache.spark.sql.AnalysisException: undefined function testUDF; line 1 pos 100 at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.runInternal(SparkExecuteStatementOperation.scala:259) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:182) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} When I ran the same against 1.4 it worked. I've also changed {{spark.sql.hive.metastore.version}} to 0.13 (similar to what it was in 1.4) and to 0.14, but I still get the same errors. Also, in 1.5, when you run it against the {{spark-sql}} shell, it works.
[jira] [Commented] (SPARK-11191) [1.5] Can't create UDF's using hive thrift service
[ https://issues.apache.org/jira/browse/SPARK-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964048#comment-14964048 ] David Ross commented on SPARK-11191: I will add that the exact same thing happens when you don't use {{TEMPORARY}}, i.e.: {code} CREATE FUNCTION testUDF AS 'com.foo.class.UDF'; {code} > [1.5] Can't create UDF's using hive thrift service > -- > > Key: SPARK-11191 > URL: https://issues.apache.org/jira/browse/SPARK-11191 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0, 1.5.1 >Reporter: David Ross
[jira] [Commented] (SPARK-5391) SparkSQL fails to create tables with custom JSON SerDe
[ https://issues.apache.org/jira/browse/SPARK-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746320#comment-14746320 ] David Ross commented on SPARK-5391: --- Haven't tried native JSON but it looks promising, so this ticket is probably lower priority. > SparkSQL fails to create tables with custom JSON SerDe > -- > > Key: SPARK-5391 > URL: https://issues.apache.org/jira/browse/SPARK-5391 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: David Ross
[jira] [Commented] (SPARK-2087) Clean Multi-user semantics for thrift JDBC/ODBC server.
[ https://issues.apache.org/jira/browse/SPARK-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494428#comment-14494428 ] David Ross commented on SPARK-2087: --- Makes sense, thanks for the response. Clean Multi-user semantics for thrift JDBC/ODBC server. --- Key: SPARK-2087 URL: https://issues.apache.org/jira/browse/SPARK-2087 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.0 Reporter: Michael Armbrust Assignee: Cheng Hao Priority: Minor Fix For: 1.4.0 Configuration and temporary tables should exist per-user. Cached tables should be shared across users.
[jira] [Commented] (SPARK-2087) Clean Multi-user semantics for thrift JDBC/ODBC server.
[ https://issues.apache.org/jira/browse/SPARK-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494410#comment-14494410 ] David Ross commented on SPARK-2087: --- Any chance this will be back-ported to the 1.3 branch? Clean Multi-user semantics for thrift JDBC/ODBC server. --- Key: SPARK-2087 URL: https://issues.apache.org/jira/browse/SPARK-2087 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.0 Reporter: Michael Armbrust Assignee: Cheng Hao Priority: Minor Fix For: 1.4.0 Configuration and temporary tables should exist per-user. Cached tables should be shared across users.
[jira] [Created] (SPARK-6757) spark.sql.shuffle.partitions is global, not per connection
David Ross created SPARK-6757: - Summary: spark.sql.shuffle.partitions is global, not per connection Key: SPARK-6757 URL: https://issues.apache.org/jira/browse/SPARK-6757 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: David Ross We are trying to use the {{spark.sql.shuffle.partitions}} parameter to handle large queries differently from smaller queries. We expected this parameter to be respected per connection, but it seems to be global. For example, try this in two separate JDBC connections: Connection 1: {code} SET spark.sql.shuffle.partitions=10; SELECT * FROM some_table; {code} The correct number {{10}} was used. Connection 2: {code} SET spark.sql.shuffle.partitions=100; SELECT * FROM some_table; {code} The correct number {{100}} was used. Back to connection 1: {code} SELECT * FROM some_table; {code} We expected the number {{10}} to be used, but {{100}} is used.
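Until this is fixed, one defensive sketch (it narrows, but does not eliminate, the race, since the setting is still global) is to re-issue the {{SET}} immediately before each query on every connection:

{code}
-- Connection 1: re-assert the desired value right before the query,
-- since another connection may have overwritten the global setting
SET spark.sql.shuffle.partitions=10;
SELECT * FROM some_table;
{code}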
[jira] [Created] (SPARK-6482) Remove synchronization of Hive Native commands
David Ross created SPARK-6482: - Summary: Remove synchronization of Hive Native commands Key: SPARK-6482 URL: https://issues.apache.org/jira/browse/SPARK-6482 Project: Spark Issue Type: Improvement Reporter: David Ross As discussed in https://issues.apache.org/jira/browse/SPARK-4908, concurrent Hive native commands run into thread-safety issues with {{org.apache.hadoop.hive.ql.Driver}}. The quick fix was to synchronize calls to {{runHive}}: https://github.com/apache/spark/commit/480bd1d2edd1de06af607b0cf3ff3c0b16089add However, if a Hive native command is long-running, this can block subsequent queries that have native dependencies.
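To illustrate the blocking described above (the table and partition names here are hypothetical):

{code}
-- Session A: a long-running Hive native command holds the synchronized runHive call
ALTER TABLE some_table ADD PARTITION (ds='2015-03-23');
-- Session B: issued concurrently, but waits until Session A's command finishes,
-- because its native dependency goes through the same synchronized path
SHOW TABLES;
{code}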
[jira] [Created] (SPARK-5391) SparkSQL fails to create tables with custom JSON SerDe
David Ross created SPARK-5391: - Summary: SparkSQL fails to create tables with custom JSON SerDe Key: SPARK-5391 URL: https://issues.apache.org/jira/browse/SPARK-5391 Project: Spark Issue Type: Bug Components: SQL Reporter: David Ross - Using Spark built from trunk on this commit: https://github.com/apache/spark/commit/bc20a52b34e826895d0dcc1d783c021ebd456ebd - Built for Hive 13 - Using this JSON serde: https://github.com/rcongiu/Hive-JSON-Serde First download the jar locally: {code} $ curl http://www.congiu.net/hive-json-serde/1.3/cdh5/json-serde-1.3-jar-with-dependencies.jar > /tmp/json-serde-1.3-jar-with-dependencies.jar {code} Then add it in a SparkSQL session: {code} add jar /tmp/json-serde-1.3-jar-with-dependencies.jar {code} Finally create the table: {code} create table test_json (c1 boolean) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'; {code} Logs for add jar: {code} 15/01/23 23:48:33 INFO thriftserver.SparkExecuteStatementOperation: Running query 'add jar /tmp/json-serde-1.3-jar-with-dependencies.jar' 15/01/23 23:48:34 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr. 
15/01/23 23:48:34 INFO SessionState: Added /tmp/json-serde-1.3-jar-with-dependencies.jar to class path 15/01/23 23:48:34 INFO SessionState: Added resource: /tmp/json-serde-1.3-jar-with-dependencies.jar 15/01/23 23:48:34 INFO spark.SparkContext: Added JAR /tmp/json-serde-1.3-jar-with-dependencies.jar at http://192.168.99.9:51312/jars/json-serde-1.3-jar-with-dependencies.jar with timestamp 1422056914776 15/01/23 23:48:34 INFO thriftserver.SparkExecuteStatementOperation: Result Schema: List() 15/01/23 23:48:34 INFO thriftserver.SparkExecuteStatementOperation: Result Schema: List() {code} Logs (with error) for create table: {code} 15/01/23 23:49:00 INFO thriftserver.SparkExecuteStatementOperation: Running query 'create table test_json (c1 boolean) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'' 15/01/23 23:49:00 INFO parse.ParseDriver: Parsing command: create table test_json (c1 boolean) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 15/01/23 23:49:01 INFO parse.ParseDriver: Parse Completed 15/01/23 23:49:01 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr. 
15/01/23 23:49:01 INFO log.PerfLogger: PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver 15/01/23 23:49:01 INFO log.PerfLogger: PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver 15/01/23 23:49:01 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager 15/01/23 23:49:01 INFO log.PerfLogger: PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver 15/01/23 23:49:01 INFO log.PerfLogger: PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver 15/01/23 23:49:01 INFO parse.ParseDriver: Parsing command: create table test_json (c1 boolean) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 15/01/23 23:49:01 INFO parse.ParseDriver: Parse Completed 15/01/23 23:49:01 INFO log.PerfLogger: /PERFLOG method=parse start=1422056941103 end=1422056941104 duration=1 from=org.apache.hadoop.hive.ql.Driver 15/01/23 23:49:01 INFO log.PerfLogger: PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver 15/01/23 23:49:01 INFO parse.SemanticAnalyzer: Starting Semantic Analysis 15/01/23 23:49:01 INFO parse.SemanticAnalyzer: Creating table test_json position=13 15/01/23 23:49:01 INFO ql.Driver: Semantic Analysis Completed 15/01/23 23:49:01 INFO log.PerfLogger: /PERFLOG method=semanticAnalyze start=1422056941104 end=1422056941240 duration=136 from=org.apache.hadoop.hive.ql.Driver 15/01/23 23:49:01 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null) 15/01/23 23:49:01 INFO log.PerfLogger: /PERFLOG method=compile start=1422056941071 end=1422056941252 duration=181 from=org.apache.hadoop.hive.ql.Driver 15/01/23 23:49:01 INFO log.PerfLogger: PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver 15/01/23 23:49:01 INFO ql.Driver: Starting command: create table test_json (c1 boolean) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 15/01/23 23:49:01 INFO log.PerfLogger: /PERFLOG method=TimeToSubmit start=1422056941067 end=1422056941258 duration=191 from=org.apache.hadoop.hive.ql.Driver 
15/01/23 23:49:01 INFO log.PerfLogger: PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver 15/01/23 23:49:01 INFO log.PerfLogger: PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver 15/01/23 23:49:01 WARN security.ShellBasedUnixGroupsMapping: got exception trying to get groups for user anonymous org.apache.hadoop.util.Shell$ExitCodeException: id: anonymous: No such user at org.apache.hadoop.util.Shell.runCommand(Shell.java:505) at org.apache.hadoop.util.Shell.run(Shell.java:418) at
[jira] [Created] (SPARK-5371) SparkSQL Fails to parse Query with UNION ALL in subquery
David Ross created SPARK-5371: - Summary: SparkSQL Fails to parse Query with UNION ALL in subquery Key: SPARK-5371 URL: https://issues.apache.org/jira/browse/SPARK-5371 Project: Spark Issue Type: Bug Reporter: David Ross This SQL session: {code} DROP TABLE test1; DROP TABLE test2; CREATE TABLE test1 ( c11 INT, c12 INT, c13 INT, c14 INT ); CREATE TABLE test2 ( c21 INT, c22 INT, c23 INT, c24 INT ); SELECT MIN(t3.c_1), MIN(t3.c_2), MIN(t3.c_3), MIN(t3.c_4) FROM ( SELECT SUM(t1.c11) c_1, NULL c_2, NULL c_3, NULL c_4 FROM test1 t1 UNION ALL SELECT NULL c_1, SUM(t2.c22) c_2, SUM(t2.c23) c_3, SUM(t2.c24) c_4 FROM test2 t2 ) t3; {code} Produces this error: {code} 15/01/23 00:25:21 INFO thriftserver.SparkExecuteStatementOperation: Running query 'SELECT MIN(t3.c_1), MIN(t3.c_2), MIN(t3.c_3), MIN(t3.c_4) FROM ( SELECT SUM(t1.c11) c_1, NULL c_2, NULL c_3, NULL c_4 FROM test1 t1 UNION ALL SELECT NULL c_1, SUM(t2.c22) c_2, SUM(t2.c23) c_3, SUM(t2.c24) c_4 FROM test2 t2 ) t3' 15/01/23 00:25:21 INFO parse.ParseDriver: Parsing command: SELECT MIN(t3.c_1), MIN(t3.c_2), MIN(t3.c_3), MIN(t3.c_4) FROM ( SELECT SUM(t1.c11) c_1, NULL c_2, NULL c_3, NULL c_4 FROM test1 t1 UNION ALL SELECT NULL c_1, SUM(t2.c22) c_2, SUM(t2.c23) c_3, SUM(t2.c24) c_4 FROM test2 t2 ) t3 15/01/23 00:25:21 INFO parse.ParseDriver: Parse Completed 15/01/23 00:25:21 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query: java.util.NoSuchElementException: key not found: c_2#23488 at scala.collection.MapLike$class.default(MapLike.scala:228) at org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29) at scala.collection.MapLike$class.apply(MapLike.scala:141) at org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29) at org.apache.spark.sql.catalyst.optimizer.UnionPushdown$$anonfun$1.applyOrElse(Optimizer.scala:77) at org.apache.spark.sql.catalyst.optimizer.UnionPushdown$$anonfun$1.applyOrElse(Optimizer.scala:76) at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144) at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135) at org.apache.spark.sql.catalyst.optimizer.UnionPushdown$.pushToRight(Optimizer.scala:76) at org.apache.spark.sql.catalyst.optimizer.UnionPushdown$$anonfun$apply$1$$anonfun$applyOrElse$6.apply(Optimizer.scala:98) at org.apache.spark.sql.catalyst.optimizer.UnionPushdown$$anonfun$apply$1$$anonfun$applyOrElse$6.apply(Optimizer.scala:98) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at org.apache.spark.sql.catalyst.optimizer.UnionPushdown$$anonfun$apply$1.applyOrElse(Optimizer.scala:98) at org.apache.spark.sql.catalyst.optimizer.UnionPushdown$$anonfun$apply$1.applyOrElse(Optimizer.scala:85) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:162) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at
[jira] [Commented] (SPARK-4908) Spark SQL built for Hive 13 fails under concurrent metadata queries
[ https://issues.apache.org/jira/browse/SPARK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268892#comment-14268892 ] David Ross commented on SPARK-4908: --- I've verified that this is fixed on trunk. Since his commit message says it's just a quick fix, I will let [~marmbrus] decide whether or not to keep this JIRA open. Spark SQL built for Hive 13 fails under concurrent metadata queries --- Key: SPARK-4908 URL: https://issues.apache.org/jira/browse/SPARK-4908 Project: Spark Issue Type: Bug Components: SQL Reporter: David Ross Assignee: Cheng Lian Priority: Blocker Fix For: 1.3.0, 1.2.1 We are on trunk: {{1.3.0-SNAPSHOT}}, as of this commit: https://github.com/apache/spark/commit/3d0c37b8118f6057a663f959321a79b8061132b6 We are using Spark built for Hive 13, using this option: {{-Phive-0.13.1}} In single-threaded mode, normal operations look fine. However, under concurrency, with at least 2 concurrent connections, metadata queries fail. For example, {{USE some_db}}, {{SHOW TABLES}}, and the implicit {{USE}} statement when you pass a default schema in the JDBC URL, all fail. {{SELECT}} queries like {{SELECT * FROM some_table}} do not have this issue. Here is some example code: {code} object main extends App { import java.sql._ import scala.concurrent._ import scala.concurrent.duration._ import scala.concurrent.ExecutionContext.Implicits.global Class.forName("org.apache.hive.jdbc.HiveDriver") val host = "localhost" // update this val url = s"jdbc:hive2://${host}:10511/some_db" // update this val future = Future.traverse(1 to 3) { i => Future { println("Starting: " + i) try { val conn = DriverManager.getConnection(url) } catch { case e: Throwable => e.printStackTrace() println("Failed: " + i) } println("Finishing: " + i) } } Await.result(future, 2.minutes) println("done!")
} {code} Here is the output: {code} Starting: 1 Starting: 3 Starting: 2 java.sql.SQLException: org.apache.spark.sql.execution.QueryExecutionException: FAILED: Operation cancelled at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121) at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231) at org.apache.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:451) at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:195) at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) at java.sql.DriverManager.getConnection(DriverManager.java:664) at java.sql.DriverManager.getConnection(DriverManager.java:270) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply$mcV$sp(ConnectionManager.scala:896) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Failed: 3 Finishing: 3 java.sql.SQLException: org.apache.spark.sql.execution.QueryExecutionException: FAILED: Operation cancelled at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121) at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231) at 
org.apache.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:451) at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:195) at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) at java.sql.DriverManager.getConnection(DriverManager.java:664) at java.sql.DriverManager.getConnection(DriverManager.java:270) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply$mcV$sp(ConnectionManager.scala:896) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893) at
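The repro in the quoted issue above can be reduced to a Spark-free sketch. Here `connect()` is a hypothetical stand-in for `DriverManager.getConnection(url)`: it reports failure whenever it observes another caller in flight, mimicking a server that cannot run two connection-setup metadata statements concurrently. This is an illustration of the `Future.traverse` test pattern, not the actual server behavior.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent._
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ConcurrencySketch {
  private val inFlight = new AtomicInteger(0)

  // Succeeds only when no other connect() call overlaps with this one.
  def connect(): Boolean =
    try {
      val overlapped = inFlight.incrementAndGet() > 1
      Thread.sleep(50) // widen the race window so overlaps are observable
      !overlapped
    } finally inFlight.decrementAndGet()

  // Fire n "connections" concurrently, as the JDBC repro does.
  def run(n: Int): Seq[Boolean] = {
    val results = Future.traverse(1 to n)(_ => Future(connect()))
    Await.result(results, 1.minute)
  }
}
```

With `run(1)` every call succeeds; with `run(3)` some calls typically overlap and "fail", matching the report that single-threaded use is fine while two or more concurrent connections break.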
[jira] [Commented] (SPARK-4908) Spark SQL built for Hive 13 fails under concurrent metadata queries
[ https://issues.apache.org/jira/browse/SPARK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256174#comment-14256174 ] David Ross commented on SPARK-4908: --- Note that I noticed this line from native Hive logging: {code} 14/12/19 21:44:55 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager {code} It seems to be tied to this config: https://github.com/apache/hive/blob/branch-0.13/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L719 I have added this to our {{hive-site.xml}} in the Spark {{conf}} directory: {code} <property> <name>hive.support.concurrency</name> <value>true</value> </property> {code} And I still have the issue. Perhaps there is more I need to do to support concurrency? Spark SQL built for Hive 13 fails under concurrent metadata queries --- Key: SPARK-4908 URL: https://issues.apache.org/jira/browse/SPARK-4908 Project: Spark Issue Type: Bug Reporter: David Ross We are on trunk: {{1.3.0-SNAPSHOT}}, as of this commit: https://github.com/apache/spark/commit/3d0c37b8118f6057a663f959321a79b8061132b6 We are using Spark built for Hive 13, using this option: {{-Phive-0.13.1}} In single-threaded mode, normal operations look fine. However, under concurrency, with at least 2 concurrent connections, metadata queries fail. For example, {{USE some_db}}, {{SHOW TABLES}}, and the implicit {{USE}} statement when you pass a default schema in the JDBC URL, all fail. {{SELECT}} queries like {{SELECT * FROM some_table}} do not have this issue. 
Here is some example code: {code} object main extends App { import java.sql._ import scala.concurrent._ import scala.concurrent.duration._ import scala.concurrent.ExecutionContext.Implicits.global Class.forName("org.apache.hive.jdbc.HiveDriver") val host = "localhost" // update this val url = s"jdbc:hive2://${host}:10511/some_db" // update this val future = Future.traverse(1 to 3) { i => Future { println("Starting: " + i) try { val conn = DriverManager.getConnection(url) } catch { case e: Throwable => e.printStackTrace() println("Failed: " + i) } println("Finishing: " + i) } } Await.result(future, 2.minutes) println("done!") } {code} Here is the output: {code} Starting: 1 Starting: 3 Starting: 2 java.sql.SQLException: org.apache.spark.sql.execution.QueryExecutionException: FAILED: Operation cancelled at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121) at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231) at org.apache.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:451) at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:195) at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) at java.sql.DriverManager.getConnection(DriverManager.java:664) at java.sql.DriverManager.getConnection(DriverManager.java:270) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply$mcV$sp(ConnectionManager.scala:896) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121) at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Failed: 3 Finishing: 3 java.sql.SQLException: org.apache.spark.sql.execution.QueryExecutionException: FAILED: Operation cancelled at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121) at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231) at org.apache.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:451) at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:195) at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) at java.sql.DriverManager.getConnection(DriverManager.java:664) at java.sql.DriverManager.getConnection(DriverManager.java:270) at
[jira] [Comment Edited] (SPARK-4908) Spark SQL built for Hive 13 fails under concurrent metadata queries
[ https://issues.apache.org/jira/browse/SPARK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256174#comment-14256174 ] David Ross edited comment on SPARK-4908 at 12/22/14 8:43 PM: - Note that I noticed this line in the logs that seems to come from Hive logging (not Spark code): {code} 14/12/19 21:44:55 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager {code} It seems to be tied to this config: https://github.com/apache/hive/blob/branch-0.13/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L719 I have added this to our {{hive-site.xml}} in the Spark {{conf}} directory: {code} <property> <name>hive.support.concurrency</name> <value>true</value> </property> {code} And I still have the issue. Perhaps there is more I need to do to support concurrency? was (Author: dyross): Note that noticed this line from native Hive logging: {code} 14/12/19 21:44:55 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager {code} It seems to be tied to this config: https://github.com/apache/hive/blob/branch-0.13/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L719 I have this to our {{hive-site.xml}} in the spark {{conf}} directory: {code} <property> <name>hive.support.concurrency</name> <value>true</value> </property> {code} And I still have the issue. Perhaps there is more I need to do to support concurrency? Spark SQL built for Hive 13 fails under concurrent metadata queries --- Key: SPARK-4908 URL: https://issues.apache.org/jira/browse/SPARK-4908 Project: Spark Issue Type: Bug Reporter: David Ross We are on trunk: {{1.3.0-SNAPSHOT}}, as of this commit: https://github.com/apache/spark/commit/3d0c37b8118f6057a663f959321a79b8061132b6 We are using Spark built for Hive 13, using this option: {{-Phive-0.13.1}} In single-threaded mode, normal operations look fine. However, under concurrency, with at least 2 concurrent connections, metadata queries fail. 
For example, {{USE some_db}}, {{SHOW TABLES}}, and the implicit {{USE}} statement when you pass a default schema in the JDBC URL, all fail. {{SELECT}} queries like {{SELECT * FROM some_table}} do not have this issue. Here is some example code: {code} object main extends App { import java.sql._ import scala.concurrent._ import scala.concurrent.duration._ import scala.concurrent.ExecutionContext.Implicits.global Class.forName("org.apache.hive.jdbc.HiveDriver") val host = "localhost" // update this val url = s"jdbc:hive2://${host}:10511/some_db" // update this val future = Future.traverse(1 to 3) { i => Future { println("Starting: " + i) try { val conn = DriverManager.getConnection(url) } catch { case e: Throwable => e.printStackTrace() println("Failed: " + i) } println("Finishing: " + i) } } Await.result(future, 2.minutes) println("done!") } {code} Here is the output: {code} Starting: 1 Starting: 3 Starting: 2 java.sql.SQLException: org.apache.spark.sql.execution.QueryExecutionException: FAILED: Operation cancelled at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121) at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231) at org.apache.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:451) at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:195) at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) at java.sql.DriverManager.getConnection(DriverManager.java:664) at java.sql.DriverManager.getConnection(DriverManager.java:270) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply$mcV$sp(ConnectionManager.scala:896) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Failed: 3 Finishing: 3 java.sql.SQLException:
[jira] [Created] (SPARK-4908) Spark SQL built for Hive 13 fails under concurrent metadata queries
David Ross created SPARK-4908: - Summary: Spark SQL built for Hive 13 fails under concurrent metadata queries Key: SPARK-4908 URL: https://issues.apache.org/jira/browse/SPARK-4908 Project: Spark Issue Type: Bug Reporter: David Ross We are on trunk: {{1.3.0-SNAPSHOT}}, as of this commit: https://github.com/apache/spark/commit/3d0c37b8118f6057a663f959321a79b8061132b6 We are using Spark built for Hive 13, using this option: {{-Phive-0.13.1}} In single-threaded mode, normal operations look fine. However, under concurrency, with at least 2 concurrent connections, metadata queries fail. For example, {{USE some_db}}, {{SHOW TABLES}}, and the implicit {{USE}} statement when you pass a default schema in the JDBC URL, all fail. {{SELECT}} queries like {{SELECT * FROM some_table}} do not have this issue. Here is some example code: {code} object main extends App { import java.sql._ import scala.concurrent._ import scala.concurrent.duration._ import scala.concurrent.ExecutionContext.Implicits.global Class.forName("org.apache.hive.jdbc.HiveDriver") val host = "localhost" // update this val url = s"jdbc:hive2://${host}:10511/some_db" // update this val future = Future.traverse(1 to 3) { i => Future { println("Starting: " + i) try { val conn = DriverManager.getConnection(url) } catch { case e: Throwable => e.printStackTrace() println("Failed: " + i) } println("Finishing: " + i) } } Await.result(future, 2.minutes) println("done!") 
} {code} Here is the output: {code} Starting: 1 Starting: 3 Starting: 2 java.sql.SQLException: org.apache.spark.sql.execution.QueryExecutionException: FAILED: Operation cancelled at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121) at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231) at org.apache.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:451) at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:195) at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) at java.sql.DriverManager.getConnection(DriverManager.java:664) at java.sql.DriverManager.getConnection(DriverManager.java:270) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply$mcV$sp(ConnectionManager.scala:896) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Failed: 3 Finishing: 3 java.sql.SQLException: org.apache.spark.sql.execution.QueryExecutionException: FAILED: Operation cancelled at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121) at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231) at 
org.apache.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:451) at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:195) at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) at java.sql.DriverManager.getConnection(DriverManager.java:664) at java.sql.DriverManager.getConnection(DriverManager.java:270) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply$mcV$sp(ConnectionManager.scala:896) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893) at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
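Hypothetical client-side mitigation, not from the ticket: since only the metadata statements issued during connection setup collide, a client can serialize `getConnection` calls behind a single lock while still using the resulting connections concurrently for {{SELECT}} queries. A minimal sketch, where `open` stands in for `() => DriverManager.getConnection(url)`:

```scala
object SerializedConnect {
  private val lock = new Object

  // Run connection setup one caller at a time; the returned handle can
  // then be used concurrently with handles opened by other callers.
  def apply[A](open: () => A): A = lock.synchronized(open())
}
```

Usage would look like `val conn = SerializedConnect(() => DriverManager.getConnection(url))`. This trades connection-setup parallelism for correctness until the server-side race is fixed.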
[jira] [Commented] (SPARK-4296) Throw Expression not in GROUP BY when using same expression in group by clause and select clause
[ https://issues.apache.org/jira/browse/SPARK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250611#comment-14250611 ] David Ross commented on SPARK-4296: --- I can still reproduce this issue. The test case above does appear to be fixed, but if you use other types of agg functions, it can fail. For example: {code} CREATE TABLE test_spark_4296(s STRING); SELECT UPPER(s) FROM test_spark_4296 GROUP BY UPPER(s); {code} That works. But this query doesn't: {code} SELECT REGEXP_EXTRACT(s, '.*', 1) FROM test_spark_4296 GROUP BY REGEXP_EXTRACT(s, '.*', 1); {code} The error is similar to the one above: {code} 14/12/17 21:39:22 INFO thriftserver.SparkExecuteStatementOperation: Running query 'SELECT REGEXP_EXTRACT(s, '.*', 1) FROM test_spark_4296 GROUP BY REGEXP_EXTRACT(s, '.*', 1)' 14/12/17 21:39:22 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on slave0:50816 in memory (size: 5.2 KB, free: 534.4 MB) 14/12/17 21:39:22 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on slave1:45411 in memory (size: 5.2 KB, free: 534.4 MB) 14/12/17 21:39:22 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on slave2:59650 in memory (size: 5.2 KB, free: 534.4 MB) 14/12/17 21:39:22 INFO storage.BlockManager: Removing broadcast 7 14/12/17 21:39:22 INFO storage.BlockManager: Removing block broadcast_7_piece0 14/12/17 21:39:22 INFO storage.MemoryStore: Block broadcast_7_piece0 of size 5308 dropped from memory (free 276233416) 14/12/17 21:39:22 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on master:34621 in memory (size: 5.2 KB, free: 265.0 MB) 14/12/17 21:39:22 INFO storage.BlockManagerMaster: Updated info of block broadcast_7_piece0 14/12/17 21:39:22 INFO storage.BlockManager: Removing block broadcast_7 14/12/17 21:39:22 INFO storage.MemoryStore: Block broadcast_7 of size 9344 dropped from memory (free 276242760) 14/12/17 21:39:22 INFO spark.ContextCleaner: Cleaned broadcast 7 14/12/17 21:39:22 INFO parse.ParseDriver: 
Parsing command: SELECT REGEXP_EXTRACT(s, '.*', 1) FROM test_spark_4296 GROUP BY REGEXP_EXTRACT(s, '.*', 1) 14/12/17 21:39:22 INFO parse.ParseDriver: Parse Completed 14/12/17 21:39:22 INFO spark.ContextCleaner: Cleaned shuffle 1 14/12/17 21:39:22 INFO storage.BlockManager: Removing broadcast 6 14/12/17 21:39:22 INFO storage.BlockManager: Removing block broadcast_6_piece0 14/12/17 21:39:22 INFO storage.MemoryStore: Block broadcast_6_piece0 of size 47235 dropped from memory (free 276289995) 14/12/17 21:39:22 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on master:34621 in memory (size: 46.1 KB, free: 265.0 MB) 14/12/17 21:39:22 INFO storage.BlockManagerMaster: Updated info of block broadcast_6_piece0 14/12/17 21:39:22 INFO storage.BlockManager: Removing block broadcast_6 14/12/17 21:39:22 INFO storage.MemoryStore: Block broadcast_6 of size 523775 dropped from memory (free 276813770) 14/12/17 21:39:22 INFO spark.ContextCleaner: Cleaned broadcast 6 14/12/17 21:39:22 INFO storage.BlockManager: Removing broadcast 5 14/12/17 21:39:22 INFO storage.BlockManager: Removing block broadcast_5_piece0 14/12/17 21:39:22 INFO storage.MemoryStore: Block broadcast_5_piece0 of size 7179 dropped from memory (free 276820949) 14/12/17 21:39:23 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on master:34621 in memory (size: 7.0 KB, free: 265.0 MB) 14/12/17 21:39:23 INFO storage.BlockManagerMaster: Updated info of block broadcast_5_piece0 14/12/17 21:39:23 INFO storage.BlockManager: Removing block broadcast_5 14/12/17 21:39:23 INFO storage.MemoryStore: Block broadcast_5 of size 12784 dropped from memory (free 276833733) 14/12/17 21:39:23 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on slave0:50816 in memory (size: 7.0 KB, free: 534.4 MB) 14/12/17 21:39:23 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on slave1:45411 in memory (size: 7.0 KB, free: 534.4 MB) 14/12/17 21:39:23 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on slave2:59650 
in memory (size: 7.0 KB, free: 534.4 MB) 14/12/17 21:39:23 INFO spark.ContextCleaner: Cleaned broadcast 5 14/12/17 21:39:23 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on slave1:45411 in memory (size: 7.9 KB, free: 534.4 MB) 14/12/17 21:39:23 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on slave2:59650 in memory (size: 7.9 KB, free: 534.4 MB) 14/12/17 21:39:23 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on slave0:50816 in memory (size: 7.9 KB, free: 534.4 MB) 14/12/17 21:39:23 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression not in GROUP BY: HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFRegExpExtract(s#609,.*,1) AS _c0#608, tree: Aggregate [HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFRegExpExtract(s#609,.*,1)],
[jira] [Commented] (SPARK-4296) Throw Expression not in GROUP BY when using same expression in group by clause and select clause
[ https://issues.apache.org/jira/browse/SPARK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250674#comment-14250674 ] David Ross commented on SPARK-4296: --- Hi Michael, We are on trunk: {{1.3.0-SNAPSHOT}}, as of https://github.com/apache/spark/commit/3d0c37b8118f6057a663f959321a79b8061132b6 Throw Expression not in GROUP BY when using same expression in group by clause and select clause --- Key: SPARK-4296 URL: https://issues.apache.org/jira/browse/SPARK-4296 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: Shixiong Zhu When the input data has a complex structure, using same expression in group by clause and select clause will throw "Expression not in GROUP BY". {code:java} val sqlContext = new org.apache.spark.sql.SQLContext(sc) import sqlContext.createSchemaRDD case class Birthday(date: String) case class Person(name: String, birthday: Birthday) val people = sc.parallelize(List(Person("John", Birthday("1990-01-22")), Person("Jim", Birthday("1980-02-28")))) people.registerTempTable("people") val year = sqlContext.sql("select count(*), upper(birthday.date) from people group by upper(birthday.date)") year.collect {code} Here is the plan of year: {code:java} SchemaRDD[3] at RDD at SchemaRDD.scala:105 == Query Plan == == Physical Plan == org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression not in GROUP BY: Upper(birthday#1.date AS date#9) AS c1#3, tree: Aggregate [Upper(birthday#1.date)], [COUNT(1) AS c0#2L,Upper(birthday#1.date AS date#9) AS c1#3] Subquery people LogicalRDD [name#0,birthday#1], MapPartitionsRDD[1] at mapPartitions at ExistingRDD.scala:36 {code} The bug is the equality test for `Upper(birthday#1.date)` and `Upper(birthday#1.date AS date#9)`. Maybe Spark SQL needs a mechanism to compare Alias expression and non-Alias expression. 
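A minimal sketch (plain Scala, not Catalyst) of the equality gap described in the issue: the GROUP BY list holds the bare expression, while the select list holds an aliased copy of it, so naive tree equality rejects a semantically valid query. Stripping aliases before comparing, as the reporter suggests, is one possible mechanism; the case-class names here are illustrative only.

```scala
sealed trait Expr
case class Attr(name: String) extends Expr
case class Upper(child: Expr) extends Expr
case class Alias(child: Expr, alias: String) extends Expr

object GroupByCheck {
  // Remove Alias wrappers so Upper(x) and Alias(Upper(x), "c1") compare equal.
  def stripAlias(e: Expr): Expr = e match {
    case Alias(child, _) => stripAlias(child)
    case other           => other
  }

  def inGroupBy(selectExpr: Expr, groupBy: Seq[Expr]): Boolean =
    groupBy.exists(g => stripAlias(g) == stripAlias(selectExpr))
}
```

With plain `==`, `Alias(Upper(Attr("birthday.date")), "c1")` does not equal `Upper(Attr("birthday.date"))`, which is exactly the mismatch behind the "Expression not in GROUP BY" error; `inGroupBy` accepts it.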
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4773) CTAS Doesn't Use the Current Schema
[ https://issues.apache.org/jira/browse/SPARK-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Ross resolved SPARK-4773. --- Resolution: Fixed Looks like this was broken by: https://github.com/apache/spark/commit/4b55482abf899c27da3d55401ad26b4e9247b327 And fixed by: https://github.com/apache/spark/commit/51b1fe1426ffecac6c4644523633ea1562ff9a4e Thanks for quick turnaround! CTAS Doesn't Use the Current Schema --- Key: SPARK-4773 URL: https://issues.apache.org/jira/browse/SPARK-4773 Project: Spark Issue Type: Bug Reporter: David Ross In a CTAS (CREATE TABLE __ AS SELECT __), the current schema isn't used. For example, this all works: {code} CREATE DATABASE test_db; USE test_db; CREATE TABLE test_table_1(s string); SELECT * FROM test_table_1; CREATE TABLE test_table_2 AS SELECT * FROM test_db.test_table_1; SELECT * FROM test_table_2; {code} But this fails: {code} CREATE TABLE test_table_3 AS SELECT * FROM test_table_1; {code} Message: {code} 14/12/06 00:28:57 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:43 Table not found 'test_table_1' at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1324) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1053) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8342) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284) at org.apache.spark.sql.hive.execution.CreateTableAsSelect.metastoreRelation$lzycompute(CreateTableAsSelect.scala:59) at org.apache.spark.sql.hive.execution.CreateTableAsSelect.metastoreRelation(CreateTableAsSelect.scala:55) at org.apache.spark.sql.hive.execution.CreateTableAsSelect.sideEffectResult$lzycompute(CreateTableAsSelect.scala:82) at org.apache.spark.sql.hive.execution.CreateTableAsSelect.sideEffectResult(CreateTableAsSelect.scala:70) at 
org.apache.spark.sql.hive.execution.CreateTableAsSelect.execute(CreateTableAsSelect.scala:89) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425) at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58) at org.apache.spark.sql.SchemaRDD.init(SchemaRDD.scala:108) at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim12.scala:190) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:193) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:175) at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:150) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:207) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58) at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526) at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:43 Table not found 'test_table_1' at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1079) ... 33 more 14/12/06 00:28:57 WARN thrift.ThriftCLIService: Error fetching results: org.apache.hive.service.cli.HiveSQLException:
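Hypothetical client-side workaround sketch, not the upstream fix: until CTAS honors the current schema, a client can qualify bare table names with the session's current database before submitting the statement, so the inner SELECT resolves. `qualify`, `currentDb`, and `knownTables` are names invented for this illustration.

```scala
object QualifyTables {
  // Prefix unqualified "FROM <table>" references with the current database.
  // Already-qualified references ("FROM db.table") are left untouched
  // because the pattern requires the table name directly after FROM.
  def qualify(sql: String, currentDb: String, knownTables: Set[String]): String =
    knownTables.foldLeft(sql) { (acc, table) =>
      acc.replaceAll(s"(?i)\\bFROM\\s+$table\\b", s"FROM $currentDb.$table")
    }
}
```

For the failing statement above, `qualify("CREATE TABLE test_table_3 AS SELECT * FROM test_table_1", "test_db", Set("test_table_1"))` produces the form that works, i.e. `... FROM test_db.test_table_1`. A regex rewrite like this is fragile for real SQL (subqueries, quoting, joins); it only illustrates the shape of the workaround.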
[jira] [Created] (SPARK-4773) CTAS Doesn't Use the Current Schema
David Ross created SPARK-4773: - Summary: CTAS Doesn't Use the Current Schema Key: SPARK-4773 URL: https://issues.apache.org/jira/browse/SPARK-4773 Project: Spark Issue Type: Bug Reporter: David Ross In a CTAS (CREATE TABLE __ AS SELECT __), the current schema isn't used. For example, this all works: {code} CREATE DATABASE test_db; USE test_db; CREATE TABLE test_table_1(s string); SELECT * FROM test_table_1; CREATE TABLE test_table_2 AS SELECT * FROM test_db.test_table_1; SELECT * FROM test_table_2; {code} But this fails: {code} CREATE TABLE test_table_3 AS SELECT * FROM test_table_1; {code} Message: {code} 14/12/06 00:28:57 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:43 Table not found 'test_table_1' at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1324) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1053) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8342) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284) at org.apache.spark.sql.hive.execution.CreateTableAsSelect.metastoreRelation$lzycompute(CreateTableAsSelect.scala:59) at org.apache.spark.sql.hive.execution.CreateTableAsSelect.metastoreRelation(CreateTableAsSelect.scala:55) at org.apache.spark.sql.hive.execution.CreateTableAsSelect.sideEffectResult$lzycompute(CreateTableAsSelect.scala:82) at org.apache.spark.sql.hive.execution.CreateTableAsSelect.sideEffectResult(CreateTableAsSelect.scala:70) at org.apache.spark.sql.hive.execution.CreateTableAsSelect.execute(CreateTableAsSelect.scala:89) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425) at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58) at 
org.apache.spark.sql.SchemaRDD.init(SchemaRDD.scala:108) at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim12.scala:190) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:193) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:175) at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:150) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:207) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58) at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526) at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:43 Table not found 'test_table_1' at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1079) ... 33 more 14/12/06 00:28:57 WARN thrift.ThriftCLIService: Error fetching results: org.apache.hive.service.cli.HiveSQLException: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:43 Table not found 'test_table_1' at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim12.scala:221) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:193) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:175) at