[SPARK-8064] [SQL] Build against Hive 1.2.1

Cherry-picked the parts of the initial SPARK-8064 WIP branch needed to get sql/hive to compile against Hive 1.2.1. That's the ASF release packaged under org.apache.hive, not any fork.
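For anyone wanting to pull the same artifacts into a standalone build, a minimal sbt sketch follows. The group, version, and staging-repository URL are taken from the pom.xml changes in this commit; whether the staging repository is still reachable is an assumption, since Sonatype staging areas are temporary by design.

    // Sketch only: resolve the Spark-packaged Hive 1.2.1 artifacts declared in this commit.
    // "spark-hive-staging" is the repository id used in the pom; the URL may go away once
    // the artifacts are promoted to a permanent repository.
    resolvers += "spark-hive-staging" at
      "https://oss.sonatype.org/content/repositories/orgspark-project-1113"

    libraryDependencies ++= Seq(
      "org.spark-project.hive" % "hive-exec"      % "1.2.1.spark",
      "org.spark-project.hive" % "hive-metastore" % "1.2.1.spark"
    )

Note that Spark's own build (see the pom.xml diff below) excludes most transitive dependencies of these artifacts and re-declares pinned versions explicitly, so that Maven and Ivy resolve the same dependency graph.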
Tests not run yet: that's what the machines are for.

Author: Steve Loughran <ste...@hortonworks.com>
Author: Cheng Lian <l...@databricks.com>
Author: Michael Armbrust <mich...@databricks.com>
Author: Patrick Wendell <patr...@databricks.com>

Closes #7191 from steveloughran/stevel/feature/SPARK-8064-hive-1.2-002 and squashes the following commits:

7556d85 [Cheng Lian] Updates .q files and corresponding golden files
ef4af62 [Steve Loughran] Merge commit '6a92bb09f46a04d6cd8c41bdba3ecb727ebb9030' into stevel/feature/SPARK-8064-hive-1.2-002
6a92bb0 [Cheng Lian] Overrides HiveConf time vars
dcbb391 [Cheng Lian] Adds com.twitter:parquet-hadoop-bundle:1.6.0 for Hive Parquet SerDe
0bbe475 [Steve Loughran] SPARK-8064 scalastyle rejects the standard Hadoop ASF license header...
fdf759b [Steve Loughran] SPARK-8064 classpath dependency suite to be in sync with shading in final (?) hive-exec spark
7a6c727 [Steve Loughran] SPARK-8064 switch to second staging repo of the spark-hive artifacts. This one has the protobuf-shaded hive-exec jar
376c003 [Steve Loughran] SPARK-8064 purge duplicate protobuf declaration
2c74697 [Steve Loughran] SPARK-8064 switch to the protobuf-shaded hive-exec jar, with tests to chase it down
cc44020 [Steve Loughran] SPARK-8064 remove hadoop.version from run-tests.py, as the profile will fix that automatically.
6901fa9 [Steve Loughran] SPARK-8064 explicit protobuf import
da310dc [Michael Armbrust] Fixes for Hive tests.
a775a75 [Steve Loughran] SPARK-8064 cherry-pick-incomplete
7404f34 [Patrick Wendell] Add spark-hive staging repo
832c164 [Steve Loughran] SPARK-8064 try to suppress compiler warnings on Complex.java pasted-thrift-code
312c0d4 [Steve Loughran] SPARK-8064 maven/ivy dependency purge; calcite declaration needed
fa5ae7b [Steve Loughran] SPARK-8064 fix up hive-thriftserver dependencies and cut back on evicted references in the hive- packages; this keeps mvn and ivy resolution compatible, as the reconciliation policy is "by hand"
c188048 [Steve Loughran] SPARK-8064 manage the Hive dependencies so that: things that aren't needed are excluded; sql/hive built with ivy is in sync with the maven reconciliation policy, rather than latest-first
4c8be8d [Cheng Lian] WIP: Partial fix for Thrift server and CLI tests
314eb3c [Steve Loughran] SPARK-8064 deprecation warning noise in one of the tests
17b0341 [Steve Loughran] SPARK-8064 IDE-hinted cleanups of Complex.java to reduce compiler warnings. It's all autogenerated code, so still ugly.
d029b92 [Steve Loughran] SPARK-8064 rely on unescaping to have already taken place, so go straight to the map of serde options
23eca7e [Steve Loughran] SPARK-8064 handle raw and escaped property tokens
54d9b06 [Steve Loughran] SPARK-8064 fix compilation regression surfacing from rebase
0b12d5f [Steve Loughran] SPARK-8064 use subset of hive complex type whose types deserialize
fce73b6 [Steve Loughran] SPARK-8064 poms rely implicitly on the version of kryo chill provides
fd3aa5d [Steve Loughran] SPARK-8064 version of hive to download from ivy is 1.2.1
dc73ece [Steve Loughran] SPARK-8064 revert to master's deterministic pushdown strategy
d3c1e4a [Steve Loughran] SPARK-8064 purge UnionType
051cc21 [Steve Loughran] SPARK-8064 switch to an unshaded version of hive-exec-core, which must have been built with Kryo 2.21.
This currently looks for a (locally built) version 1.2.1.spark
6684c60 [Steve Loughran] SPARK-8064 ignore RTE raised in blocking process.exitValue() call
e6121e5 [Steve Loughran] SPARK-8064 address review comments
aa43dc6 [Steve Loughran] SPARK-8064 more robust teardown on JavaMetastoreDatasourcesSuite
f2bff01 [Steve Loughran] SPARK-8064 better takeup of asynchronously caught error text
8b1ef38 [Steve Loughran] SPARK-8064: on failures executing spark-submit in HiveSparkSubmitSuite, print command line and all logged output.
5a9ce6b [Steve Loughran] SPARK-8064 add explicit reason for kv split failure, rather than array OOB. *does not address the issue*
642b63a [Steve Loughran] SPARK-8064 reinstate something cut briefly during rebasing
97194dc [Steve Loughran] SPARK-8064 add extra logging to the YarnClusterSuite classpath test. There should be no reason why this is failing on jenkins, but as it is (and presumably it's CP-related), improve the logging including any exception raised.
335357f [Steve Loughran] SPARK-8064 fail fast on thrift process spawning tests on exit codes and/or error string patterns seen in log.
3ed872f [Steve Loughran] SPARK-8064 rename field double to dbl
bca55e5 [Steve Loughran] SPARK-8064 missed one of the `date` escapes
41d6479 [Steve Loughran] SPARK-8064 wrap tests with withTable() calls to avoid table-exists exceptions
2bc29a4 [Steve Loughran] SPARK-8064 ParquetSuites to escape `date` field name
1ab9bc4 [Steve Loughran] SPARK-8064 TestHive to use serde2.thrift.test.Complex
bf3a249 [Steve Loughran] SPARK-8064: more resubmit than fix; tighten startup timeout to 60s. Still no obvious reason why jersey server code in spark-assembly isn't being picked up - it hasn't been shaded
c829b8f [Steve Loughran] SPARK-8064: reinstate yarn-rm-server dependencies to hive-exec to ensure that jersey server is on classpath on hadoop versions < 2.6
0b0f738 [Steve Loughran] SPARK-8064: thrift server startup to fail fast on any exception in the main thread
13abaf1 [Steve Loughran] SPARK-8064 Hive compatibility tests in sync with explain/show output from Hive 1.2.1
d14d5ea [Steve Loughran] SPARK-8064: DATE is now a predicate; you can't use it as a field in select ops
26eef1c [Steve Loughran] SPARK-8064: HIVE-9039 renamed TOK_UNION => TOK_UNIONALL while adding TOK_UNIONDISTINCT
3d64523 [Steve Loughran] SPARK-8064 improve diagnostics on unknown token; fix scalastyle failure
d0360f6 [Steve Loughran] SPARK-8064: delicate merge-in of the branch vanzin/hive-1.1
1126e5a [Steve Loughran] SPARK-8064: name of unrecognized file format wasn't appearing in error text
8cb09c4 [Steve Loughran] SPARK-8064: test resilience/assertion improvements. Independent of the rest of the work; can be backported to earlier versions
dec12cb [Steve Loughran] SPARK-8064: when a CLI suite test fails, include the full output text in the raised exception; this ensures that the stdout/stderr is included in jenkins reports, so it becomes possible to diagnose the cause.
463a670 [Steve Loughran] SPARK-8064 run-tests.py adds a hadoop-2.6 profile, and changes info messages to say "w/Hive 1.2.1" in console output
2531099 [Steve Loughran] SPARK-8064 successful attempt to get rid of pentaho as a transitive dependency of hive-exec
1d59100 [Steve Loughran] SPARK-8064 (unsuccessful) attempt to get rid of pentaho as a transitive dependency of hive-exec
75733fc [Steve Loughran] SPARK-8064 change thrift binary startup message to "Starting ThriftBinaryCLIService on port"
3ebc279 [Steve Loughran] SPARK-8064 move strings used to check for http/bin thrift services up into constants
c80979d [Steve Loughran] SPARK-8064: SparkSQLCLIDriver drops remote mode support. CLISuite tests pass instead of timing out: undetected regression?
27e8370 [Steve Loughran] SPARK-8064 fix some style & IDE warnings
00e50d6 [Steve Loughran] SPARK-8064 stop excluding hive shims from dependency (commented out, for now)
cb4f142 [Steve Loughran] SPARK-8064 cut pentaho dependency from calcite
f7aa9cb [Steve Loughran] SPARK-8064 everything compiles with some commenting and moving of classes into a hive package
6c310b4 [Steve Loughran] SPARK-8064 subclass Hive ServerOptionsProcessor to make it public again
f61a675 [Steve Loughran] SPARK-8064 thrift server switched to Hive 1.2.1, though it doesn't compile everywhere
4890b9d [Steve Loughran] SPARK-8064, build against Hive 1.2.1

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a2409d1c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a2409d1c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a2409d1c

Branch: refs/heads/master
Commit: a2409d1c8e8ddec04b529ac6f6a12b5993f0eeda
Parents: b2e4b85
Author: Steve Loughran <ste...@hortonworks.com>
Authored: Mon Aug 3 15:24:34 2015 -0700
Committer: Michael Armbrust <mich...@databricks.com>
Committed: Mon Aug 3 15:24:42 2015 -0700

----------------------------------------------------------------------
core/pom.xml | 20 -
dev/run-tests.py | 7 +-
pom.xml | 654 +++++++++-
sbin/spark-daemon.sh | 2 +-
sql/catalyst/pom.xml | 1 -
.../sql/parquet/ParquetCompatibilityTest.scala | 13 +-
sql/hive-thriftserver/pom.xml | 22 +-
.../HiveServerServerOptionsProcessor.scala | 37 +
.../hive/thriftserver/HiveThriftServer2.scala | 27 +-
.../SparkExecuteStatementOperation.scala | 9 +-
.../hive/thriftserver/SparkSQLCLIDriver.scala | 56 +-
.../hive/thriftserver/SparkSQLCLIService.scala | 13 +-
.../thriftserver/SparkSQLSessionManager.scala | 11 +-
.../spark/sql/hive/thriftserver/CliSuite.scala | 75 +-
.../thriftserver/HiveThriftServer2Suites.scala | 40 +-
.../hive/execution/HiveCompatibilitySuite.scala | 29 +-
sql/hive/pom.xml | 92 +-
.../org/apache/spark/sql/hive/HiveContext.scala | 114 +-
.../spark/sql/hive/HiveMetastoreCatalog.scala | 5 +-
.../org/apache/spark/sql/hive/HiveQl.scala | 97 +-
.../org/apache/spark/sql/hive/HiveShim.scala | 15 +-
.../spark/sql/hive/client/ClientInterface.scala | 4 +
.../spark/sql/hive/client/ClientWrapper.scala | 5 +-
.../apache/spark/sql/hive/client/HiveShim.scala | 2 +-
.../sql/hive/client/IsolatedClientLoader.scala | 2 +-
.../apache/spark/sql/hive/client/package.scala | 2 +-
.../hive/execution/InsertIntoHiveTable.scala | 2 +-
.../hive/execution/ScriptTransformation.scala | 6 +-
.../org/apache/spark/sql/hive/hiveUDFs.scala | 2 +-
.../spark/sql/hive/hiveWriterContainers.scala | 2 +-
.../apache/spark/sql/hive/orc/OrcFilters.scala | 6 +-
.../apache/spark/sql/hive/test/TestHive.scala | 36 +-
.../org/apache/spark/sql/hive/test/Complex.java | 1139 ++++++++++++++++++
.../sql/hive/JavaMetastoreDataSourcesSuite.java | 6 +-
... operator-0-ee7f6a60a9792041b85b18cda56429bf | 1 +
...to_string-1-db089ff46f9826c7883198adacdfad59 | 6 +-
...s_star_by-5-41d474f5e6d7c61c36f74b4bec4e9e44 | 500 ++++++++
...s_star_by-5-6888c7f7894910538d82eefa23443189 | 500 --------
...ble_alter-3-2a91d52719cf4552ebeb867204552a26 | 2 +-
..._db_table-4-b585371b624cbab2616a49f553a870a0 | 2 +-
...delimited-1-2a91d52719cf4552ebeb867204552a26 | 2 +-
...ble_serde-1-2a91d52719cf4552ebeb867204552a26 | 2 +-
...functions-0-45a7762c39f1b0f26f076220e2764043 | 21 +
...roperties-1-be4adb893c7f946ebd76a648ce3cc1ae | 2 +-
...f_date_add-1-efb60fcbd6d78ad35257fb1ec39ace2 | 4 +-
..._date_sub-1-7efeb74367835ade71e5e42b22f8ced4 | 4 +-
..._datediff-1-34ae7a68b13c2bc9a89f61acf2edd4c5 | 2 +-
.../udf_day-0-c4c503756384ff1220222d84fd25e756 | 2 +-
.../udf_day-1-87168babe1110fe4c38269843414ca4 | 11 +-
...ayofmonth-0-7b2caf942528656555cf19c261a18502 | 2 +-
...ayofmonth-1-ca24d07102ad264d79ff30c64a73a7e8 | 11 +-
.../udf_if-0-b7ffa85b5785cccef2af1b285348cc2c | 2 +-
.../udf_if-1-30cf7f51f92b5684e556deff3032d49a | 2 +-
.../udf_if-1-b7ffa85b5785cccef2af1b285348cc2c | 2 +-
.../udf_if-2-30cf7f51f92b5684e556deff3032d49a | 2 +-
...df_minute-0-9a38997c1f41f4afe00faa0abc471aee | 2 +-
...df_minute-1-16995573ac4f4a1b047ad6ee88699e48 | 8 +-
...udf_month-0-9a38997c1f41f4afe00faa0abc471aee | 2 +-
...udf_month-1-16995573ac4f4a1b047ad6ee88699e48 | 8 +-
.../udf_std-1-6759bde0e50a3607b7c3fd5a93cbd027 | 2 +-
...df_stddev-1-18e1d598820013453fad45852e1a303d | 2 +-
.../union3-0-6a8a35102de1b0b88c6721a704eb174d | 0
.../union3-0-99620f72f0282904846a596ca5b3e46c | 0
.../union3-2-2a1dcd937f117f1955a169592b96d5f9 | 0
.../union3-2-90ca96ea59fd45cf0af8c020ae77c908 | 0
.../union3-3-72b149ccaef751bcfe55d5ca37cb5fd7 | 4 +
.../union3-3-8fc63f8edb2969a63cd4485f1867ba97 | 4 -
.../clientpositive/parenthesis_star_by.q | 2 +-
.../ql/src/test/queries/clientpositive/union3.q | 11 +-
.../sql/hive/ClasspathDependenciesSuite.scala | 110 ++
.../spark/sql/hive/HiveSparkSubmitSuite.scala | 29 +-
.../sql/hive/InsertIntoHiveTableSuite.scala | 7 +-
.../hive/ParquetHiveCompatibilitySuite.scala | 9 +
.../apache/spark/sql/hive/StatisticsSuite.scala | 3 +
.../spark/sql/hive/client/VersionsSuite.scala | 6 +-
.../sql/hive/execution/HiveQuerySuite.scala | 89 +-
.../spark/sql/hive/execution/PruningSuite.scala | 8 +-
.../sql/hive/execution/SQLQuerySuite.scala | 140 ++-
.../sql/hive/orc/OrcHadoopFsRelationSuite.scala | 8 +-
.../hive/orc/OrcPartitionDiscoverySuite.scala | 3 +-
.../apache/spark/sql/hive/parquetSuites.scala | 327 ++--
yarn/pom.xml | 10 -
.../spark/deploy/yarn/YarnClusterSuite.scala | 24 +-
83 files changed, 3365 insertions(+), 1088 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/core/pom.xml
----------------------------------------------------------------------
diff --git a/core/pom.xml b/core/pom.xml index 2026787..0e53a79 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -46,30 +46,10 @@ <dependency> <groupId>com.twitter</groupId> <artifactId>chill_${scala.binary.version}</artifactId> - <exclusions> - <exclusion> - <groupId>org.ow2.asm</groupId> - <artifactId>asm</artifactId> - </exclusion> - <exclusion> - <groupId>org.ow2.asm</groupId> - <artifactId>asm-commons</artifactId> - </exclusion> - </exclusions> </dependency> <dependency> <groupId>com.twitter</groupId>
<artifactId>chill-java</artifactId> - <exclusions> - <exclusion> - <groupId>org.ow2.asm</groupId> - <artifactId>asm</artifactId> - </exclusion> - <exclusion> - <groupId>org.ow2.asm</groupId> - <artifactId>asm-commons</artifactId> - </exclusion> - </exclusions> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/dev/run-tests.py ---------------------------------------------------------------------- diff --git a/dev/run-tests.py b/dev/run-tests.py index b6d1814..d1852b9 100755 --- a/dev/run-tests.py +++ b/dev/run-tests.py @@ -273,6 +273,7 @@ def get_hadoop_profiles(hadoop_version): "hadoop2.0": ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"], "hadoop2.2": ["-Pyarn", "-Phadoop-2.2"], "hadoop2.3": ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"], + "hadoop2.6": ["-Pyarn", "-Phadoop-2.6"], } if hadoop_version in sbt_maven_hadoop_profiles: @@ -289,7 +290,7 @@ def build_spark_maven(hadoop_version): mvn_goals = ["clean", "package", "-DskipTests"] profiles_and_goals = build_profiles + mvn_goals - print("[info] Building Spark (w/Hive 0.13.1) using Maven with these arguments: ", + print("[info] Building Spark (w/Hive 1.2.1) using Maven with these arguments: ", " ".join(profiles_and_goals)) exec_maven(profiles_and_goals) @@ -305,14 +306,14 @@ def build_spark_sbt(hadoop_version): "streaming-kinesis-asl-assembly/assembly"] profiles_and_goals = build_profiles + sbt_goals - print("[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: ", + print("[info] Building Spark (w/Hive 1.2.1) using SBT with these arguments: ", " ".join(profiles_and_goals)) exec_sbt(profiles_and_goals) def build_apache_spark(build_tool, hadoop_version): - """Will build Spark against Hive v0.13.1 given the passed in build tool (either `sbt` or + """Will build Spark against Hive v1.2.1 given the passed in build tool (either `sbt` or `maven`). 
Defaults to using `sbt`.""" set_title_and_block("Building Spark", "BLOCK_BUILD") http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/pom.xml ---------------------------------------------------------------------- diff --git a/pom.xml b/pom.xml index be0dac9..a958cec 100644 --- a/pom.xml +++ b/pom.xml @@ -134,11 +134,12 @@ <curator.version>2.4.0</curator.version> <hive.group>org.spark-project.hive</hive.group> <!-- Version used in Maven Hive dependency --> - <hive.version>0.13.1a</hive.version> + <hive.version>1.2.1.spark</hive.version> <!-- Version used for internal directory structure --> - <hive.version.short>0.13.1</hive.version.short> + <hive.version.short>1.2.1</hive.version.short> <derby.version>10.10.1.1</derby.version> <parquet.version>1.7.0</parquet.version> + <hive.parquet.version>1.6.0</hive.parquet.version> <jblas.version>1.2.4</jblas.version> <jetty.version>8.1.14.v20131031</jetty.version> <orbit.version>3.0.0.v201112011016</orbit.version> @@ -151,7 +152,10 @@ <jets3t.version>0.7.1</jets3t.version> <aws.java.sdk.version>1.9.16</aws.java.sdk.version> <aws.kinesis.client.version>1.2.1</aws.kinesis.client.version> + <!-- org.apache.httpcomponents/httpclient--> <commons.httpclient.version>4.3.2</commons.httpclient.version> + <!-- commons-httpclient/commons-httpclient--> + <httpclient.classic.version>3.1</httpclient.classic.version> <commons.math3.version>3.4.1</commons.math3.version> <scala.version>2.10.4</scala.version> <scala.binary.version>2.10</scala.binary.version> @@ -161,6 +165,23 @@ <fasterxml.jackson.version>2.4.4</fasterxml.jackson.version> <snappy.version>1.1.1.7</snappy.version> <netlib.java.version>1.1.2</netlib.java.version> + <calcite.version>1.2.0-incubating</calcite.version> + <commons-codec.version>1.10</commons-codec.version> + <!-- org.apache.commons/commons-lang/--> + <commons-lang2.version>2.6</commons-lang2.version> + <!-- org.apache.commons/commons-lang3/--> + <commons-lang3.version>3.3.2</commons-lang3.version> + <datanucleus-core.version>3.2.10</datanucleus-core.version> + <janino.version>2.7.8</janino.version> + <jersey.version>1.9</jersey.version> + <joda.version>2.5</joda.version> + <jodd.version>3.5.2</jodd.version> + <jsr305.version>1.3.9</jsr305.version> + <libthrift.version>0.9.2</libthrift.version> + + <!-- For maven shade plugin (see SPARK-8819) --> + <create.dependency.reduced.pom>false</create.dependency.reduced.pom> + <test.java.home>${java.home}</test.java.home> <!-- @@ -188,7 +209,6 @@ <MaxPermGen>512m</MaxPermGen> <CodeCacheSize>512m</CodeCacheSize> </properties> - <repositories> <repository> <id>central</id> @@ -247,6 +267,14 @@ </snapshots> </repository> <repository> + <id>spark-hive-staging</id> + <name>Staging Repo for Hive 1.2.1 (Spark Version)</name> + <url>https://oss.sonatype.org/content/repositories/orgspark-project-1113</url> + <releases> + <enabled>true</enabled> + </releases> + </repository> + <repository> <id>mapr-repo</id> <name>MapR Repository</name> <url>http://repository.mapr.com/maven/</url> @@ -257,12 +285,13 @@ <enabled>false</enabled> </snapshots> </repository> + <!-- returning unauthorized on some operations --> <repository> <id>spring-releases</id> <name>Spring Release Repository</name> <url>https://repo.spring.io/libs-release</url> <releases> - <enabled>true</enabled> + <enabled>false</enabled> </releases> <snapshots> <enabled>false</enabled> @@ -402,12 +431,17 @@ <dependency> <groupId>org.apache.commons</groupId> <artifactId>commons-lang3</artifactId> - <version>3.3.2</version> + 
<version>${commons-lang3.version}</version> + </dependency> + <dependency> + <groupId>org.apache.commons</groupId> + <artifactId>commons-lang</artifactId> + <version>${commons-lang2.version}</version> </dependency> <dependency> <groupId>commons-codec</groupId> <artifactId>commons-codec</artifactId> - <version>1.10</version> + <version>${commons-codec.version}</version> </dependency> <dependency> <groupId>org.apache.commons</groupId> @@ -422,7 +456,12 @@ <dependency> <groupId>com.google.code.findbugs</groupId> <artifactId>jsr305</artifactId> - <version>1.3.9</version> + <version>${jsr305.version}</version> + </dependency> + <dependency> + <groupId>commons-httpclient</groupId> + <artifactId>commons-httpclient</artifactId> + <version>${httpclient.classic.version}</version> </dependency> <dependency> <groupId>org.apache.httpcomponents</groupId> @@ -439,6 +478,16 @@ <artifactId>selenium-java</artifactId> <version>2.42.2</version> <scope>test</scope> + <exclusions> + <exclusion> + <groupId>com.google.guava</groupId> + <artifactId>guava</artifactId> + </exclusion> + <exclusion> + <groupId>io.netty</groupId> + <artifactId>netty</artifactId> + </exclusion> + </exclusions> </dependency> <!-- Added for selenium only, and should match its dependent version: --> <dependency> @@ -624,16 +673,27 @@ <dependency> <groupId>com.sun.jersey</groupId> <artifactId>jersey-server</artifactId> - <version>1.9</version> + <version>${jersey.version}</version> <scope>${hadoop.deps.scope}</scope> </dependency> <dependency> <groupId>com.sun.jersey</groupId> <artifactId>jersey-core</artifactId> - <version>1.9</version> + <version>${jersey.version}</version> <scope>${hadoop.deps.scope}</scope> </dependency> <dependency> + <groupId>com.sun.jersey</groupId> + <artifactId>jersey-json</artifactId> + <version>${jersey.version}</version> + <exclusions> + <exclusion> + <groupId>stax</groupId> + <artifactId>stax-api</artifactId> + </exclusion> + </exclusions> + </dependency> + <dependency> <groupId>org.scala-lang</groupId> <artifactId>scala-compiler</artifactId> <version>${scala.version}</version> @@ -1022,45 +1082,357 @@ <artifactId>hive-beeline</artifactId> <version>${hive.version}</version> <scope>${hive.deps.scope}</scope> + <exclusions> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-common</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-exec</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-jdbc</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-metastore</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-service</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-shims</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.thrift</groupId> + <artifactId>libthrift</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-api</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-log4j12</artifactId> + </exclusion> + <exclusion> + <groupId>log4j</groupId> + <artifactId>log4j</artifactId> + </exclusion> + <exclusion> + <groupId>commons-logging</groupId> + <artifactId>commons-logging</artifactId> + </exclusion> + </exclusions> </dependency> <dependency> <groupId>${hive.group}</groupId> <artifactId>hive-cli</artifactId> <version>${hive.version}</version> <scope>${hive.deps.scope}</scope> + <exclusions> + 
<exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-common</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-exec</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-jdbc</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-metastore</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-serde</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-service</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-shims</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.thrift</groupId> + <artifactId>libthrift</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-api</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-log4j12</artifactId> + </exclusion> + <exclusion> + <groupId>log4j</groupId> + <artifactId>log4j</artifactId> + </exclusion> + <exclusion> + <groupId>commons-logging</groupId> + <artifactId>commons-logging</artifactId> + </exclusion> + </exclusions> </dependency> <dependency> <groupId>${hive.group}</groupId> - <artifactId>hive-exec</artifactId> + <artifactId>hive-common</artifactId> <version>${hive.version}</version> <scope>${hive.deps.scope}</scope> <exclusions> <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-shims</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.ant</groupId> + <artifactId>ant</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.zookeeper</groupId> + <artifactId>zookeeper</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-api</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-log4j12</artifactId> + </exclusion> + <exclusion> + <groupId>log4j</groupId> + <artifactId>log4j</artifactId> + </exclusion> + <exclusion> <groupId>commons-logging</groupId> <artifactId>commons-logging</artifactId> </exclusion> + </exclusions> + </dependency> + + <dependency> + <groupId>${hive.group}</groupId> + <artifactId>hive-exec</artifactId> +<!-- + <classifier>core</classifier> +--> + <version>${hive.version}</version> + <scope>${hive.deps.scope}</scope> + <exclusions> + + <!-- pull this in when needed; the explicit definition culls the surplus--> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-metastore</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-shims</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-ant</artifactId> + </exclusion> + <!-- break the loop --> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>spark-client</artifactId> + </exclusion> + + <!-- excluded dependencies & transitive. + Some may need to be explicitly included--> + <exclusion> + <groupId>ant</groupId> + <artifactId>ant</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.ant</groupId> + <artifactId>ant</artifactId> + </exclusion> <exclusion> <groupId>com.esotericsoftware.kryo</groupId> <artifactId>kryo</artifactId> </exclusion> <exclusion> + <groupId>commons-codec</groupId> + <artifactId>commons-codec</artifactId> + </exclusion> + <exclusion> + <groupId>commons-httpclient</groupId> + <artifactId>commons-httpclient</artifactId> + </exclusion> + <exclusion> <groupId>org.apache.avro</groupId> <artifactId>avro-mapred</artifactId> </exclusion> + <!-- this is needed and must be explicitly included later--> + <exclusion> + <groupId>org.apache.calcite</groupId> + <artifactId>calcite-core</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.curator</groupId> + <artifactId>apache-curator</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.curator</groupId> + <artifactId>curator-client</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.curator</groupId> + <artifactId>curator-framework</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.thrift</groupId> + <artifactId>libthrift</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.thrift</groupId> + <artifactId>libfb303</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.zookeeper</groupId> + <artifactId>zookeeper</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-api</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-log4j12</artifactId> + </exclusion> + <exclusion> + <groupId>log4j</groupId> + <artifactId>log4j</artifactId> + </exclusion> + <exclusion> + <groupId>commons-logging</groupId> + <artifactId>commons-logging</artifactId> + </exclusion> </exclusions> </dependency> <dependency> <groupId>${hive.group}</groupId> <artifactId>hive-jdbc</artifactId> <version>${hive.version}</version> - <scope>${hive.deps.scope}</scope> + <exclusions> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-common</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-metastore</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-serde</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-service</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-shims</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.httpcomponents</groupId> + <artifactId>httpclient</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.httpcomponents</groupId> + <artifactId>httpcore</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.curator</groupId> + <artifactId>curator-framework</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.thrift</groupId> + <artifactId>libthrift</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.thrift</groupId> + <artifactId>libfb303</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.zookeeper</groupId> + <artifactId>zookeeper</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-api</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-log4j12</artifactId> + </exclusion> + <exclusion> +
<groupId>log4j</groupId> + <artifactId>log4j</artifactId> + </exclusion> + <exclusion> + <groupId>commons-logging</groupId> + <artifactId>commons-logging</artifactId> + </exclusion> + </exclusions> </dependency> + <dependency> <groupId>${hive.group}</groupId> <artifactId>hive-metastore</artifactId> <version>${hive.version}</version> <scope>${hive.deps.scope}</scope> + <exclusions> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-serde</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-shims</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.thrift</groupId> + <artifactId>libfb303</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.thrift</groupId> + <artifactId>libthrift</artifactId> + </exclusion> + <exclusion> + <groupId>com.google.guava</groupId> + <artifactId>guava</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-api</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-log4j12</artifactId> + </exclusion> + </exclusions> </dependency> + <dependency> <groupId>${hive.group}</groupId> <artifactId>hive-serde</artifactId> @@ -1068,12 +1440,141 @@ <scope>${hive.deps.scope}</scope> <exclusions> <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-common</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-shims</artifactId> + </exclusion> + <exclusion> + <groupId>commons-codec</groupId> + <artifactId>commons-codec</artifactId> + </exclusion> + <exclusion> + <groupId>com.google.code.findbugs</groupId> + <artifactId>jsr305</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.avro</groupId> + <artifactId>avro</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.thrift</groupId> + <artifactId>libthrift</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.thrift</groupId> + <artifactId>libfb303</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-api</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-log4j12</artifactId> + </exclusion> + <exclusion> + <groupId>log4j</groupId> + <artifactId>log4j</artifactId> + </exclusion> + <exclusion> <groupId>commons-logging</groupId> <artifactId>commons-logging</artifactId> </exclusion> + </exclusions> + </dependency> + + <dependency> + <groupId>${hive.group}</groupId> + <artifactId>hive-service</artifactId> + <version>${hive.version}</version> + <scope>${hive.deps.scope}</scope> + <exclusions> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-common</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-exec</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-metastore</artifactId> + </exclusion> + <exclusion> + <groupId>${hive.group}</groupId> + <artifactId>hive-shims</artifactId> + </exclusion> + <exclusion> + <groupId>commons-codec</groupId> + <artifactId>commons-codec</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.curator</groupId> + <artifactId>curator-framework</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.curator</groupId> + <artifactId>curator-recipes</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.thrift</groupId> + <artifactId>libfb303</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.thrift</groupId> + <artifactId>libthrift</artifactId> + 
</exclusion> + </exclusions> + </dependency> + + <!-- hive shims pulls in hive 0.23 and a transitive dependency of the Hadoop version + Hive was built against. This dependency cuts out the YARN/hadoop dependency, which + is needed by Hive to submit work to a YARN cluster.--> + <dependency> + <groupId>${hive.group}</groupId> + <artifactId>hive-shims</artifactId> + <version>${hive.version}</version> + <scope>${hive.deps.scope}</scope> + <exclusions> + <exclusion> + <groupId>com.google.guava</groupId> + <artifactId>guava</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-yarn-server-resourcemanager</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.curator</groupId> + <artifactId>curator-framework</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.thrift</groupId> + <artifactId>libthrift</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.zookeeper</groupId> + <artifactId>zookeeper</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-api</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-log4j12</artifactId> + </exclusion> + <exclusion> + <groupId>log4j</groupId> + <artifactId>log4j</artifactId> + </exclusion> <exclusion> <groupId>commons-logging</groupId> - <artifactId>commons-logging-api</artifactId> + <artifactId>commons-logging</artifactId> </exclusion> </exclusions> </dependency> @@ -1096,6 +1597,12 @@ <scope>${parquet.test.deps.scope}</scope> </dependency> <dependency> + <groupId>com.twitter</groupId> + <artifactId>parquet-hadoop-bundle</artifactId> + <version>${hive.parquet.version}</version> + <scope>runtime</scope> + </dependency> + <dependency> <groupId>org.apache.flume</groupId> <artifactId>flume-ng-core</artifactId> <version>${flume.version}</version> @@ -1135,6 +1642,125 @@ </exclusion> </exclusions> </dependency> + <dependency> + <groupId>org.apache.calcite</groupId> + <artifactId>calcite-core</artifactId> + <version>${calcite.version}</version> + <exclusions> + <exclusion> + <groupId>com.fasterxml.jackson.core</groupId> + <artifactId>jackson-annotations</artifactId> + </exclusion> + <exclusion> + <groupId>com.fasterxml.jackson.core</groupId> + <artifactId>jackson-core</artifactId> + </exclusion> + <exclusion> + <groupId>com.fasterxml.jackson.core</groupId> + <artifactId>jackson-databind</artifactId> + </exclusion> + <exclusion> + <groupId>com.google.guava</groupId> + <artifactId>guava</artifactId> + </exclusion> + <exclusion> + <groupId>com.google.code.findbugs</groupId> + <artifactId>jsr305</artifactId> + </exclusion> + <exclusion> + <groupId>org.codehaus.janino</groupId> + <artifactId>janino</artifactId> + </exclusion> + <!-- hsqldb interferes with the use of derby as the default db + in hive's use of datanucleus. 
+ --> + <exclusion> + <groupId>org.hsqldb</groupId> + <artifactId>hsqldb</artifactId> + </exclusion> + <exclusion> + <groupId>org.pentaho</groupId> + <artifactId>pentaho-aggdesigner-algorithm</artifactId> + </exclusion> + </exclusions> + </dependency> + <dependency> + <groupId>org.apache.calcite</groupId> + <artifactId>calcite-avatica</artifactId> + <version>${calcite.version}</version> + <exclusions> + <exclusion> + <groupId>com.fasterxml.jackson.core</groupId> + <artifactId>jackson-annotations</artifactId> + </exclusion> + <exclusion> + <groupId>com.fasterxml.jackson.core</groupId> + <artifactId>jackson-core</artifactId> + </exclusion> + <exclusion> + <groupId>com.fasterxml.jackson.core</groupId> + <artifactId>jackson-databind</artifactId> + </exclusion> + </exclusions> + </dependency> + <dependency> + <groupId>org.codehaus.janino</groupId> + <artifactId>janino</artifactId> + <version>${janino.version}</version> + </dependency> + <dependency> + <groupId>joda-time</groupId> + <artifactId>joda-time</artifactId> + <version>${joda.version}</version> + </dependency> + <dependency> + <groupId>org.jodd</groupId> + <artifactId>jodd-core</artifactId> + <version>${jodd.version}</version> + </dependency> + <dependency> + <groupId>org.datanucleus</groupId> + <artifactId>datanucleus-core</artifactId> + <version>${datanucleus-core.version}</version> + </dependency> + <dependency> + <groupId>org.apache.thrift</groupId> + <artifactId>libthrift</artifactId> + <version>${libthrift.version}</version> + <exclusions> + <exclusion> + <groupId>org.apache.httpcomponents</groupId> + <artifactId>httpclient</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.httpcomponents</groupId> + <artifactId>httpcore</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-api</artifactId> + </exclusion> + </exclusions> + </dependency> + <dependency> + <groupId>org.apache.thrift</groupId> + <artifactId>libfb303</artifactId> + <version>${libthrift.version}</version> + <exclusions> + <exclusion> + <groupId>org.apache.httpcomponents</groupId> + <artifactId>httpclient</artifactId> + </exclusion> + <exclusion> + <groupId>org.apache.httpcomponents</groupId> + <artifactId>httpcore</artifactId> + </exclusion> + <exclusion> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-api</artifactId> + </exclusion> + </exclusions> + </dependency> </dependencies> </dependencyManagement> @@ -1271,6 +1897,8 @@ <spark.ui.showConsoleProgress>false</spark.ui.showConsoleProgress> <spark.driver.allowMultipleContexts>true</spark.driver.allowMultipleContexts> <spark.unsafe.exceptionOnMemoryLeak>true</spark.unsafe.exceptionOnMemoryLeak> + <!-- Needed by sql/hive tests. --> + <test.src.tables>src</test.src.tables> </systemProperties> <failIfNoTests>false</failIfNoTests> </configuration> @@ -1305,6 +1933,8 @@ <spark.ui.showConsoleProgress>false</spark.ui.showConsoleProgress> <spark.driver.allowMultipleContexts>true</spark.driver.allowMultipleContexts> <spark.unsafe.exceptionOnMemoryLeak>true</spark.unsafe.exceptionOnMemoryLeak> + <!-- Needed by sql/hive tests. 
--> + <test.src.tables>__not_used__</test.src.tables> </systemProperties> </configuration> <executions> http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/sbin/spark-daemon.sh ---------------------------------------------------------------------- diff --git a/sbin/spark-daemon.sh b/sbin/spark-daemon.sh index de762ac..0fbe795 100755 --- a/sbin/spark-daemon.sh +++ b/sbin/spark-daemon.sh @@ -29,7 +29,7 @@ # SPARK_NICENESS The scheduling priority for daemons. Defaults to 0. ## -usage="Usage: spark-daemon.sh [--config <conf-dir>] (start|stop|status) <spark-command> <spark-instance-number> <args...>" +usage="Usage: spark-daemon.sh [--config <conf-dir>] (start|stop|submit|status) <spark-command> <spark-instance-number> <args...>" # if no args specified, show usage if [ $# -le 1 ]; then http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/sql/catalyst/pom.xml ---------------------------------------------------------------------- diff --git a/sql/catalyst/pom.xml b/sql/catalyst/pom.xml index f4b1cc3..75ab575 100644 --- a/sql/catalyst/pom.xml +++ b/sql/catalyst/pom.xml @@ -66,7 +66,6 @@ <dependency> <groupId>org.codehaus.janino</groupId> <artifactId>janino</artifactId> - <version>2.7.8</version> </dependency> </dependencies> <build> http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetCompatibilityTest.scala ---------------------------------------------------------------------- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetCompatibilityTest.scala b/sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetCompatibilityTest.scala index b4cdfd9..5747893 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetCompatibilityTest.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetCompatibilityTest.scala @@ -31,6 +31,14 @@ import org.apache.spark.util.Utils abstract class ParquetCompatibilityTest extends QueryTest with ParquetTest with BeforeAndAfterAll { protected var parquetStore: File = _ + /** + * Optional path to a staging subdirectory which may be created during query processing + * (Hive does this). + * Parquet files under this directory will be ignored in [[readParquetSchema()]] + * @return an optional staging directory to ignore when scanning for parquet files. 
+ */ + protected def stagingDir: Option[String] = None + override protected def beforeAll(): Unit = { parquetStore = Utils.createTempDir(namePrefix = "parquet-compat_") parquetStore.delete() @@ -43,7 +51,10 @@ abstract class ParquetCompatibilityTest extends QueryTest with ParquetTest with def readParquetSchema(path: String): MessageType = { val fsPath = new Path(path) val fs = fsPath.getFileSystem(configuration) - val parquetFiles = fs.listStatus(fsPath).toSeq.filterNot(_.getPath.getName.startsWith("_")) + val parquetFiles = fs.listStatus(fsPath).toSeq.filterNot { status => + status.getPath.getName.startsWith("_") || + stagingDir.map(status.getPath.getName.startsWith).getOrElse(false) + } val footers = ParquetFileReader.readAllFootersInParallel(configuration, parquetFiles, true) footers.head.getParquetMetadata.getFileMetaData.getSchema } http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/sql/hive-thriftserver/pom.xml ---------------------------------------------------------------------- diff --git a/sql/hive-thriftserver/pom.xml b/sql/hive-thriftserver/pom.xml index 73e6ccd..2dfbcb2 100644 --- a/sql/hive-thriftserver/pom.xml +++ b/sql/hive-thriftserver/pom.xml @@ -62,19 +62,29 @@ </dependency> <dependency> <groupId>${hive.group}</groupId> + <artifactId>hive-service</artifactId> + </dependency> + <dependency> + <groupId>${hive.group}</groupId> <artifactId>hive-beeline</artifactId> </dependency> + <dependency> + <groupId>com.sun.jersey</groupId> + <artifactId>jersey-core</artifactId> + </dependency> + <dependency> + <groupId>com.sun.jersey</groupId> + <artifactId>jersey-json</artifactId> + </dependency> + <dependency> + <groupId>com.sun.jersey</groupId> + <artifactId>jersey-server</artifactId> + </dependency> <!-- Added for selenium: --> <dependency> <groupId>org.seleniumhq.selenium</groupId> <artifactId>selenium-java</artifactId> <scope>test</scope> - <exclusions> - <exclusion> - <groupId>io.netty</groupId> - <artifactId>netty</artifactId> - </exclusion> - </exclusions> </dependency> </dependencies> <build> http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/sql/hive-thriftserver/src/main/scala/org/apache/hive/service/server/HiveServerServerOptionsProcessor.scala ---------------------------------------------------------------------- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/hive/service/server/HiveServerServerOptionsProcessor.scala b/sql/hive-thriftserver/src/main/scala/org/apache/hive/service/server/HiveServerServerOptionsProcessor.scala new file mode 100644 index 0000000..2228f65 --- /dev/null +++ b/sql/hive-thriftserver/src/main/scala/org/apache/hive/service/server/HiveServerServerOptionsProcessor.scala @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hive.service.server + +import org.apache.hive.service.server.HiveServer2.{StartOptionExecutor, ServerOptionsProcessor} + +/** + * Class to upgrade a package-private class to public, and + * implement a `process()` operation consistent with + * the behavior of older Hive versions + * @param serverName name of the hive server + */ +private[apache] class HiveServerServerOptionsProcessor(serverName: String) + extends ServerOptionsProcessor(serverName) { + + def process(args: Array[String]): Boolean = { + // A parse failure automatically triggers a system exit + val response = super.parse(args) + val executor = response.getServerOptionsExecutor() + // return true if the parsed option was to start the service + executor.isInstanceOf[StartOptionExecutor] + } +} http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala ---------------------------------------------------------------------- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala index b7db80d..9c04734 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala @@ -17,6 +17,9 @@ package org.apache.spark.sql.hive.thriftserver +import java.util.Locale +import java.util.concurrent.atomic.AtomicBoolean + import scala.collection.mutable import scala.collection.mutable.ArrayBuffer @@ -24,7 +27,7 @@ import org.apache.commons.logging.LogFactory import org.apache.hadoop.hive.conf.HiveConf import org.apache.hadoop.hive.conf.HiveConf.ConfVars import org.apache.hive.service.cli.thrift.{ThriftBinaryCLIService, ThriftHttpCLIService} -import org.apache.hive.service.server.{HiveServer2, ServerOptionsProcessor} +import org.apache.hive.service.server.{HiveServerServerOptionsProcessor, HiveServer2} import org.apache.spark.annotation.DeveloperApi import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerJobStart} @@ -65,7 +68,7 @@ object HiveThriftServer2 extends Logging { } def main(args: Array[String]) { - val optionsProcessor = new ServerOptionsProcessor("HiveThriftServer2") + val optionsProcessor = new HiveServerServerOptionsProcessor("HiveThriftServer2") if (!optionsProcessor.process(args)) { System.exit(-1) } @@ -241,9 +244,12 @@ object HiveThriftServer2 extends Logging { private[hive] class HiveThriftServer2(hiveContext: HiveContext) extends HiveServer2 with ReflectedCompositeService { + // state is tracked internally so that the server only attempts to shut down if it successfully + // started, and then once only. 
+ private val started = new AtomicBoolean(false) override def init(hiveConf: HiveConf) { - val sparkSqlCliService = new SparkSQLCLIService(hiveContext) + val sparkSqlCliService = new SparkSQLCLIService(this, hiveContext) setSuperField(this, "cliService", sparkSqlCliService) addService(sparkSqlCliService) @@ -259,8 +265,19 @@ private[hive] class HiveThriftServer2(hiveContext: HiveContext) } private def isHTTPTransportMode(hiveConf: HiveConf): Boolean = { - val transportMode: String = hiveConf.getVar(ConfVars.HIVE_SERVER2_TRANSPORT_MODE) - transportMode.equalsIgnoreCase("http") + val transportMode = hiveConf.getVar(ConfVars.HIVE_SERVER2_TRANSPORT_MODE) + transportMode.toLowerCase(Locale.ENGLISH).equals("http") + } + + + override def start(): Unit = { + super.start() + started.set(true) } + override def stop(): Unit = { + if (started.getAndSet(false)) { + super.stop() + } + } } http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ---------------------------------------------------------------------- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala index e875888..833bf62 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala @@ -32,8 +32,7 @@ import org.apache.hive.service.cli._ import org.apache.hadoop.hive.ql.metadata.Hive import org.apache.hadoop.hive.ql.metadata.HiveException import org.apache.hadoop.hive.ql.session.SessionState -import org.apache.hadoop.hive.shims.ShimLoader -import org.apache.hadoop.security.UserGroupInformation +import org.apache.hadoop.hive.shims.Utils import org.apache.hive.service.cli.operation.ExecuteStatementOperation import org.apache.hive.service.cli.session.HiveSession @@ -146,7 +145,7 @@ private[hive] class SparkExecuteStatementOperation( } else { val parentSessionState = SessionState.get() val hiveConf = getConfigForOperation() - val sparkServiceUGI = ShimLoader.getHadoopShims.getUGIForConf(hiveConf) + val sparkServiceUGI = Utils.getUGI() val sessionHive = getCurrentHive() val currentSqlSession = hiveContext.currentSession @@ -174,7 +173,7 @@ private[hive] class SparkExecuteStatementOperation( } try { - ShimLoader.getHadoopShims().doAs(sparkServiceUGI, doAsAction) + sparkServiceUGI.doAs(doAsAction) } catch { case e: Exception => setOperationException(new HiveSQLException(e)) @@ -201,7 +200,7 @@ private[hive] class SparkExecuteStatementOperation( } } - private def runInternal(): Unit = { + override def runInternal(): Unit = { statementId = UUID.randomUUID().toString logInfo(s"Running query '$statement' with $statementId") setState(OperationState.RUNNING) http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---------------------------------------------------------------------- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala index f66a17b..d388614 100644 --- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala @@ -20,9 +20,10 @@ package org.apache.spark.sql.hive.thriftserver import scala.collection.JavaConversions._ import java.io._ -import java.util.{ArrayList => JArrayList} +import java.util.{ArrayList => JArrayList, Locale} -import jline.{ConsoleReader, History} +import jline.console.ConsoleReader +import jline.console.history.FileHistory import org.apache.commons.lang3.StringUtils import org.apache.commons.logging.LogFactory @@ -40,6 +41,10 @@ import org.apache.spark.Logging import org.apache.spark.sql.hive.HiveContext import org.apache.spark.util.Utils +/** + * This code doesn't support remote connections in Hive 1.2+, as the underlying CliDriver + * has dropped its support. + */ private[hive] object SparkSQLCLIDriver extends Logging { private var prompt = "spark-sql" private var continuedPrompt = "".padTo(prompt.length, ' ') @@ -111,16 +116,9 @@ private[hive] object SparkSQLCLIDriver extends Logging { // Clean up after we exit Utils.addShutdownHook { () => SparkSQLEnv.stop() } + val remoteMode = isRemoteMode(sessionState) // "-h" option has been passed, so connect to Hive thrift server. - if (sessionState.getHost != null) { - sessionState.connect() - if (sessionState.isRemoteMode) { - prompt = s"[${sessionState.getHost}:${sessionState.getPort}]" + prompt - continuedPrompt = "".padTo(prompt.length, ' ') - } - } - - if (!sessionState.isRemoteMode) { + if (!remoteMode) { // Hadoop-20 and above - we need to augment classpath using hiveconf // components. // See also: code in ExecDriver.java @@ -131,6 +129,9 @@ private[hive] object SparkSQLCLIDriver extends Logging { } conf.setClassLoader(loader) Thread.currentThread().setContextClassLoader(loader) + } else { + // Hive 1.2 + not supported in CLI + throw new RuntimeException("Remote operations not supported") } val cli = new SparkSQLCLIDriver @@ -171,14 +172,14 @@ private[hive] object SparkSQLCLIDriver extends Logging { val reader = new ConsoleReader() reader.setBellEnabled(false) // reader.setDebug(new PrintWriter(new FileWriter("writer.debug", true))) - CliDriver.getCommandCompletor.foreach((e) => reader.addCompletor(e)) + CliDriver.getCommandCompleter.foreach((e) => reader.addCompleter(e)) val historyDirectory = System.getProperty("user.home") try { if (new File(historyDirectory).exists()) { val historyFile = historyDirectory + File.separator + ".hivehistory" - reader.setHistory(new History(new File(historyFile))) + reader.setHistory(new FileHistory(new File(historyFile))) } else { logWarning("WARNING: Directory for Hive history file: " + historyDirectory + " does not exist. 
History will not be available during this session.") @@ -190,10 +191,14 @@ private[hive] object SparkSQLCLIDriver extends Logging { logWarning(e.getMessage) } + // TODO: missing +/* val clientTransportTSocketField = classOf[CliSessionState].getDeclaredField("transport") clientTransportTSocketField.setAccessible(true) transport = clientTransportTSocketField.get(sessionState).asInstanceOf[TSocket] +*/ + transport = null var ret = 0 var prefix = "" @@ -230,6 +235,13 @@ private[hive] object SparkSQLCLIDriver extends Logging { System.exit(ret) } + + + def isRemoteMode(state: CliSessionState): Boolean = { + // sessionState.isRemoteMode + state.isHiveServerQuery + } + } private[hive] class SparkSQLCLIDriver extends CliDriver with Logging { @@ -239,25 +251,33 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging { private val console = new SessionState.LogHelper(LOG) + private val isRemoteMode = { + SparkSQLCLIDriver.isRemoteMode(sessionState) + } + private val conf: Configuration = if (sessionState != null) sessionState.getConf else new Configuration() // Force initializing SparkSQLEnv. This is put here but not object SparkSQLCliDriver // because the Hive unit tests do not go through the main() code path. - if (!sessionState.isRemoteMode) { + if (!isRemoteMode) { SparkSQLEnv.init() + } else { + // Hive 1.2 + not supported in CLI + throw new RuntimeException("Remote operations not supported") } override def processCmd(cmd: String): Int = { val cmd_trimmed: String = cmd.trim() + val cmd_lower = cmd_trimmed.toLowerCase(Locale.ENGLISH) val tokens: Array[String] = cmd_trimmed.split("\\s+") val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim() - if (cmd_trimmed.toLowerCase.equals("quit") || - cmd_trimmed.toLowerCase.equals("exit") || - tokens(0).equalsIgnoreCase("source") || + if (cmd_lower.equals("quit") || + cmd_lower.equals("exit") || + tokens(0).toLowerCase(Locale.ENGLISH).equals("source") || cmd_trimmed.startsWith("!") || tokens(0).toLowerCase.equals("list") || - sessionState.isRemoteMode) { + isRemoteMode) { val start = System.currentTimeMillis() super.processCmd(cmd) val end = System.currentTimeMillis() http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala ---------------------------------------------------------------------- diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala index 41f647d..644165a 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala @@ -23,11 +23,12 @@ import javax.security.auth.login.LoginException import org.apache.commons.logging.Log import org.apache.hadoop.hive.conf.HiveConf -import org.apache.hadoop.hive.shims.ShimLoader +import org.apache.hadoop.hive.shims.Utils import org.apache.hadoop.security.UserGroupInformation import org.apache.hive.service.Service.STATE import org.apache.hive.service.auth.HiveAuthFactory import org.apache.hive.service.cli._ +import org.apache.hive.service.server.HiveServer2 import org.apache.hive.service.{AbstractService, Service, ServiceException} import org.apache.spark.sql.hive.HiveContext @@ -35,22 +36,22 @@ import org.apache.spark.sql.hive.thriftserver.ReflectionUtils._ import 
http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala
----------------------------------------------------------------------
diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala
index 41f647d..644165a 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala
@@ -23,11 +23,12 @@ import javax.security.auth.login.LoginException
 
 import org.apache.commons.logging.Log
 import org.apache.hadoop.hive.conf.HiveConf
-import org.apache.hadoop.hive.shims.ShimLoader
+import org.apache.hadoop.hive.shims.Utils
 import org.apache.hadoop.security.UserGroupInformation
 import org.apache.hive.service.Service.STATE
 import org.apache.hive.service.auth.HiveAuthFactory
 import org.apache.hive.service.cli._
+import org.apache.hive.service.server.HiveServer2
 import org.apache.hive.service.{AbstractService, Service, ServiceException}
 
 import org.apache.spark.sql.hive.HiveContext
@@ -35,22 +36,22 @@ import org.apache.spark.sql.hive.thriftserver.ReflectionUtils._
 import scala.collection.JavaConversions._
 
-private[hive] class SparkSQLCLIService(hiveContext: HiveContext)
-  extends CLIService
+private[hive] class SparkSQLCLIService(hiveServer: HiveServer2, hiveContext: HiveContext)
+  extends CLIService(hiveServer)
   with ReflectedCompositeService {
 
   override def init(hiveConf: HiveConf) {
     setSuperField(this, "hiveConf", hiveConf)
 
-    val sparkSqlSessionManager = new SparkSQLSessionManager(hiveContext)
+    val sparkSqlSessionManager = new SparkSQLSessionManager(hiveServer, hiveContext)
     setSuperField(this, "sessionManager", sparkSqlSessionManager)
     addService(sparkSqlSessionManager)
 
     var sparkServiceUGI: UserGroupInformation = null
 
-    if (ShimLoader.getHadoopShims.isSecurityEnabled) {
+    if (UserGroupInformation.isSecurityEnabled) {
       try {
         HiveAuthFactory.loginFromKeytab(hiveConf)
-        sparkServiceUGI = ShimLoader.getHadoopShims.getUGIForConf(hiveConf)
+        sparkServiceUGI = Utils.getUGI()
         setSuperField(this, "serviceUGI", sparkServiceUGI)
       } catch {
         case e @ (_: IOException | _: LoginException) =>

http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala
----------------------------------------------------------------------
diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala
index 2d5ee68..92ac0ec 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala
@@ -25,14 +25,15 @@ import org.apache.hadoop.hive.conf.HiveConf.ConfVars
 import org.apache.hive.service.cli.SessionHandle
 import org.apache.hive.service.cli.session.SessionManager
 import org.apache.hive.service.cli.thrift.TProtocolVersion
+import org.apache.hive.service.server.HiveServer2
 
 import org.apache.spark.sql.hive.HiveContext
 import org.apache.spark.sql.hive.thriftserver.ReflectionUtils._
 import org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager
 
-private[hive] class SparkSQLSessionManager(hiveContext: HiveContext)
-  extends SessionManager
+private[hive] class SparkSQLSessionManager(hiveServer: HiveServer2, hiveContext: HiveContext)
+  extends SessionManager(hiveServer)
   with ReflectedCompositeService {
 
   private lazy val sparkSqlOperationManager = new SparkSQLOperationManager(hiveContext)
@@ -55,12 +56,14 @@ private[hive] class SparkSQLSessionManager(hiveContext: HiveContext)
       protocol: TProtocolVersion,
       username: String,
       passwd: String,
+      ipAddress: String,
       sessionConf: java.util.Map[String, String],
       withImpersonation: Boolean,
       delegationToken: String): SessionHandle = {
     hiveContext.openSession()
-    val sessionHandle = super.openSession(
-      protocol, username, passwd, sessionConf, withImpersonation, delegationToken)
+    val sessionHandle =
+      super.openSession(protocol, username, passwd, ipAddress, sessionConf, withImpersonation,
+        delegationToken)
     val session = super.getSession(sessionHandle)
     HiveThriftServer2.listener.onSessionCreated(
       session.getIpAddress, sessionHandle.getSessionId.toString, session.getUsername)
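
Both services above inject their state into Hive's service classes with setSuperField, since fields such as hiveConf and sessionManager are private to CLIService/SessionManager and are normally assigned only inside Hive's own init(). A rough sketch of what such a helper does (an assumed shape; the real ReflectionUtils in this package may differ, e.g. by walking further up the class hierarchy):

    object SetSuperFieldSketch {
      // Write a private field declared on the immediate superclass of `obj`.
      def setSuperField(obj: AnyRef, fieldName: String, value: AnyRef): Unit = {
        val field = obj.getClass.getSuperclass.getDeclaredField(fieldName)
        field.setAccessible(true) // bypass the private modifier on the Hive class
        field.set(obj, value)
      }
    }
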
http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala
----------------------------------------------------------------------
diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala
index df80d04..121b3e0 100644
--- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala
+++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala
@@ -23,6 +23,7 @@ import scala.collection.mutable.ArrayBuffer
 import scala.concurrent.duration._
 import scala.concurrent.{Await, Promise}
 import scala.sys.process.{Process, ProcessLogger}
+import scala.util.Failure
 
 import org.apache.hadoop.hive.conf.HiveConf.ConfVars
 import org.scalatest.BeforeAndAfter
@@ -37,31 +38,46 @@ class CliSuite extends SparkFunSuite with BeforeAndAfter with Logging {
   val warehousePath = Utils.createTempDir()
   val metastorePath = Utils.createTempDir()
+  val scratchDirPath = Utils.createTempDir()
 
   before {
-     warehousePath.delete()
-     metastorePath.delete()
+    warehousePath.delete()
+    metastorePath.delete()
+    scratchDirPath.delete()
   }
 
   after {
-     warehousePath.delete()
-     metastorePath.delete()
+    warehousePath.delete()
+    metastorePath.delete()
+    scratchDirPath.delete()
   }
 
+  /**
+   * Run a CLI operation and expect all the queries and expected answers to be returned.
+   * @param timeout maximum time for the commands to complete
+   * @param extraArgs any extra arguments
+   * @param errorResponses a sequence of strings whose presence in the stdout of the forked process
+   *                       is taken as an immediate error condition. That is: if a line beginning
+   *                       with one of these strings is found, fail the test immediately.
+   *                       The default value is `Seq("Error:")`
+   *
+   * @param queriesAndExpectedAnswers one or more tuples of query and expected answer
+   */
  def runCliWithin(
      timeout: FiniteDuration,
-     extraArgs: Seq[String] = Seq.empty)(
+     extraArgs: Seq[String] = Seq.empty,
+     errorResponses: Seq[String] = Seq("Error:"))(
      queriesAndExpectedAnswers: (String, String)*): Unit = {
 
    val (queries, expectedAnswers) = queriesAndExpectedAnswers.unzip
-   val cliScript = "../../bin/spark-sql".split("/").mkString(File.separator)
-
    val command = {
+     val cliScript = "../../bin/spark-sql".split("/").mkString(File.separator)
      val jdbcUrl = s"jdbc:derby:;databaseName=$metastorePath;create=true"
      s"""$cliScript
         |  --master local
         |  --hiveconf ${ConfVars.METASTORECONNECTURLKEY}=$jdbcUrl
         |  --hiveconf ${ConfVars.METASTOREWAREHOUSE}=$warehousePath
+        |  --hiveconf ${ConfVars.SCRATCHDIR}=$scratchDirPath
      """.stripMargin.split("\\s+").toSeq ++ extraArgs
    }
@@ -81,6 +97,12 @@ class CliSuite extends SparkFunSuite with BeforeAndAfter with Logging {
       if (next == expectedAnswers.size) {
         foundAllExpectedAnswers.trySuccess(())
       }
+    } else {
+      errorResponses.foreach( r => {
+        if (line.startsWith(r)) {
+          foundAllExpectedAnswers.tryFailure(
+            new RuntimeException(s"Failed with error line '$line'"))
+        }})
     }
   }
 
@@ -88,16 +110,44 @@ class CliSuite extends SparkFunSuite with BeforeAndAfter with Logging {
     val process = (Process(command, None) #< queryStream).run(
       ProcessLogger(captureOutput("stdout"), captureOutput("stderr")))
+    // watch for the process exit value
+    class exitCodeCatcher extends Runnable {
+      var exitValue = 0
+
+      override def run(): Unit = {
+        try {
+          exitValue = process.exitValue()
+        } catch {
+          case rte: RuntimeException =>
+            // ignored as it will get triggered when the process gets destroyed
+            logDebug("Ignoring exception while waiting for exit code", rte)
+        }
+        if (exitValue != 0) {
+          // process exited: fail fast
+          foundAllExpectedAnswers.tryFailure(
+            new RuntimeException(s"Failed with exit code $exitValue"))
+        }
+      }
+    }
+    // spin off the exit-code catcher thread. No attempt is made to kill this
+    // as it will exit once the launched process terminates.
+    val codeCatcherThread = new Thread(new exitCodeCatcher())
+    codeCatcherThread.start()
+
     try {
-      Await.result(foundAllExpectedAnswers.future, timeout)
+      Await.ready(foundAllExpectedAnswers.future, timeout)
+      foundAllExpectedAnswers.future.value match {
+        case Some(Failure(t)) => throw t
+        case _ =>
+      }
     } catch { case cause: Throwable =>
-      logError(
+      val message =
         s"""
            |=======================
            |CliSuite failure output
            |=======================
            |Spark SQL CLI command line: ${command.mkString(" ")}
-           |
+           |Exception: $cause
            |Executed query $next "${queries(next)}",
            |But failed to capture expected output "${expectedAnswers(next)}" within $timeout.
            |
@@ -105,8 +155,9 @@ class CliSuite extends SparkFunSuite with BeforeAndAfter with Logging {
            |===========================
            |End CliSuite failure output
            |===========================
-         """.stripMargin, cause)
-      throw cause
+         """.stripMargin
+      logError(message, cause)
+      fail(message, cause)
     } finally {
       process.destroy()
     }
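
The switch from Await.result to Await.ready above is what lets the new exit-code watchdog work: Await.ready waits for completion without rethrowing, so the test can inspect which side of the race finished first, the output scanner succeeding or the watchdog failing. A condensed sketch of the pattern, with stand-in threads for the two racers:

    import scala.concurrent.{Await, Promise}
    import scala.concurrent.duration._
    import scala.util.Failure

    object PromiseRaceSketch extends App {
      val outcome = Promise[Unit]()

      // One racer: e.g. the output scanner seeing all expected answers.
      new Thread(new Runnable {
        override def run(): Unit = outcome.trySuccess(())
      }).start()
      // The other: e.g. the watchdog observing a nonzero exit code.
      new Thread(new Runnable {
        override def run(): Unit =
          outcome.tryFailure(new RuntimeException("exit code 1"))
      }).start()

      Await.ready(outcome.future, 10.seconds) // waits, but never rethrows
      outcome.future.value match {
        case Some(Failure(t)) => println(s"failed: ${t.getMessage}")
        case _ => println("succeeded")
      }
    }
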
http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala
----------------------------------------------------------------------
diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala
index 39b3152..8374629 100644
--- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala
+++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala
@@ -19,7 +19,6 @@ package org.apache.spark.sql.hive.thriftserver
 
 import java.io.File
 import java.net.URL
-import java.nio.charset.StandardCharsets
 import java.sql.{Date, DriverManager, SQLException, Statement}
 
 import scala.collection.mutable.ArrayBuffer
@@ -492,7 +491,7 @@ abstract class HiveThriftServer2Test extends SparkFunSuite with BeforeAndAfterAl
         new File(s"$tempLog4jConf/log4j.properties"),
         UTF_8)
 
-      tempLog4jConf + File.pathSeparator + sys.props("java.class.path")
+      tempLog4jConf // + File.pathSeparator + sys.props("java.class.path")
     }
 
     s"""$startScript
@@ -508,6 +507,20 @@ abstract class HiveThriftServer2Test extends SparkFunSuite with BeforeAndAfterAl
      """.stripMargin.split("\\s+").toSeq
   }
 
+  /**
+   * String to scan for when checking that the Thrift binary endpoint is up.
+   * This can change across Hive versions.
+   */
+  val THRIFT_BINARY_SERVICE_LIVE = "Starting ThriftBinaryCLIService on port"
+
+  /**
+   * String to scan for when checking that the Thrift HTTP endpoint is up.
+   * This can change across Hive versions.
+   */
+  val THRIFT_HTTP_SERVICE_LIVE = "Started ThriftHttpCLIService in http"
+
+  val SERVER_STARTUP_TIMEOUT = 1.minute
+
   private def startThriftServer(port: Int, attempt: Int) = {
     warehousePath = Utils.createTempDir()
     warehousePath.delete()
@@ -545,23 +558,26 @@ abstract class HiveThriftServer2Test extends SparkFunSuite with BeforeAndAfterAl
     // Ensures that the following "tail" command won't fail.
     logPath.createNewFile()
+
+    val successLines = Seq(THRIFT_BINARY_SERVICE_LIVE, THRIFT_HTTP_SERVICE_LIVE)
+    val failureLines = Seq("HiveServer2 is stopped", "Exception in thread", "Error:")
     logTailingProcess =
       // Using "-n +0" to make sure all lines in the log file are checked.
       Process(s"/usr/bin/env tail -n +0 -f ${logPath.getCanonicalPath}").run(ProcessLogger(
         (line: String) => {
           diagnosisBuffer += line
-
-          if (line.contains("ThriftBinaryCLIService listening on") ||
-              line.contains("Started ThriftHttpCLIService in http")) {
-            serverStarted.trySuccess(())
-          } else if (line.contains("HiveServer2 is stopped")) {
-            // This log line appears when the server fails to start and terminates gracefully (e.g.
-            // because of port contention).
-            serverStarted.tryFailure(new RuntimeException("Failed to start HiveThriftServer2"))
-          }
+          successLines.foreach(r => {
+            if (line.contains(r)) {
+              serverStarted.trySuccess(())
+            }
+          })
+          failureLines.foreach(r => {
+            if (line.contains(r)) {
+              serverStarted.tryFailure(new RuntimeException(s"Failed with output '$line'"))
+            }
+          })
         }))
 
-    Await.result(serverStarted.future, 2.minute)
+    Await.result(serverStarted.future, SERVER_STARTUP_TIMEOUT)
   }
 
   private def stopThriftServer(): Unit = {
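
The startup probe above is now driven by marker lists because the exact log lines differ between Hive versions ("ThriftBinaryCLIService listening on" in older releases vs. "Starting ThriftBinaryCLIService on port" in 1.2). A trimmed sketch of the tail-and-scan pattern (the log path is illustrative; the marker strings are taken from the diff):

    import scala.concurrent.{Await, Promise}
    import scala.concurrent.duration._
    import scala.sys.process.{Process, ProcessLogger}

    object LogScanSketch {
      def awaitServerStart(logPath: String, timeout: FiniteDuration): Unit = {
        val started = Promise[Unit]()
        val successLines = Seq(
          "Starting ThriftBinaryCLIService on port",
          "Started ThriftHttpCLIService in http")
        val failureLines = Seq("HiveServer2 is stopped", "Exception in thread", "Error:")

        def scan(line: String): Unit = {
          if (successLines.exists(line.contains(_))) started.trySuccess(())
          if (failureLines.exists(line.contains(_))) {
            started.tryFailure(new RuntimeException(s"Failed with output '$line'"))
          }
        }

        // "-n +0" replays the file from the start so no early lines are missed;
        // the tail process is destroyed once the wait resolves either way.
        val tail = Process(s"/usr/bin/env tail -n +0 -f $logPath")
          .run(ProcessLogger(scan, scan))
        try Await.result(started.future, timeout) finally tail.destroy()
      }
    }
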
+ "inputddl8", + + // Hive changed ordering of ddl: + "varchar_union1", + + // Parser changes in Hive 1.2 + "input25", + "input26", + + // Uses invalid table name + "innerjoin", + + // classpath problems + "compute_stats.*", + "udf_bitmap_.*" ) /** http://git-wip-us.apache.org/repos/asf/spark/blob/a2409d1c/sql/hive/pom.xml ---------------------------------------------------------------------- diff --git a/sql/hive/pom.xml b/sql/hive/pom.xml index b00f320..be16074 100644 --- a/sql/hive/pom.xml +++ b/sql/hive/pom.xml @@ -36,6 +36,11 @@ </properties> <dependencies> + <!-- Added for Hive Parquet SerDe --> + <dependency> + <groupId>com.twitter</groupId> + <artifactId>parquet-hadoop-bundle</artifactId> + </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_${scala.binary.version}</artifactId> @@ -53,32 +58,42 @@ <artifactId>spark-sql_${scala.binary.version}</artifactId> <version>${project.version}</version> </dependency> +<!-- <dependency> - <groupId>${hive.group}</groupId> - <artifactId>hive-metastore</artifactId> + <groupId>com.google.guava</groupId> + <artifactId>guava</artifactId> </dependency> <dependency> - <groupId>commons-httpclient</groupId> - <artifactId>commons-httpclient</artifactId> - <version>3.1</version> + <groupId>com.google.protobuf</groupId> + <artifactId>protobuf-java</artifactId> + <version>${protobuf.version}</version> </dependency> <dependency> <groupId>${hive.group}</groupId> - <artifactId>hive-exec</artifactId> - </dependency> - <dependency> - <groupId>org.apache.httpcomponents</groupId> - <artifactId>httpclient</artifactId> - <version>${commons.httpclient.version}</version> + <artifactId>hive-common</artifactId> </dependency> +--> <dependency> - <groupId>org.codehaus.jackson</groupId> - <artifactId>jackson-mapper-asl</artifactId> + <groupId>${hive.group}</groupId> + <artifactId>hive-exec</artifactId> +<!-- + <classifier>core</classifier> +--> </dependency> <dependency> <groupId>${hive.group}</groupId> - <artifactId>hive-serde</artifactId> + <artifactId>hive-metastore</artifactId> </dependency> + <!-- + <dependency> + <groupId>${hive.group}</groupId> + <artifactId>hive-serde</artifactId> + </dependency> + <dependency> + <groupId>${hive.group}</groupId> + <artifactId>hive-shims</artifactId> + </dependency> + --> <!-- hive-serde already depends on avro, but this brings in customized config of avro deps from parent --> <dependency> <groupId>org.apache.avro</groupId> @@ -92,6 +107,55 @@ <classifier>${avro.mapred.classifier}</classifier> </dependency> <dependency> + <groupId>commons-httpclient</groupId> + <artifactId>commons-httpclient</artifactId> + </dependency> + <dependency> + <groupId>org.apache.calcite</groupId> + <artifactId>calcite-avatica</artifactId> + </dependency> + <dependency> + <groupId>org.apache.calcite</groupId> + <artifactId>calcite-core</artifactId> + </dependency> + <dependency> + <groupId>org.apache.httpcomponents</groupId> + <artifactId>httpclient</artifactId> + </dependency> + <dependency> + <groupId>org.codehaus.jackson</groupId> + <artifactId>jackson-mapper-asl</artifactId> + </dependency> + <!-- transitive dependencies of hive-exec-core doesn't declare --> + <dependency> + <groupId>commons-codec</groupId> + <artifactId>commons-codec</artifactId> + </dependency> + <dependency> + <groupId>joda-time</groupId> + <artifactId>joda-time</artifactId> + </dependency> + <dependency> + <groupId>org.jodd</groupId> + <artifactId>jodd-core</artifactId> + </dependency> + <dependency> + <groupId>com.google.code.findbugs</groupId> + 
<artifactId>jsr305</artifactId> + </dependency> + <dependency> + <groupId>org.datanucleus</groupId> + <artifactId>datanucleus-core</artifactId> + </dependency> + <dependency> + <groupId>org.apache.thrift</groupId> + <artifactId>libthrift</artifactId> + </dependency> + <dependency> + <groupId>org.apache.thrift</groupId> + <artifactId>libfb303</artifactId> + </dependency> + <dependency> <groupId>org.scalacheck</groupId> <artifactId>scalacheck_${scala.binary.version}</artifactId> <scope>test</scope> --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org