[GitHub] spark pull request #19148: [SPARK-21936][SQL][WIP] backward compatibility te...

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137621549
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
+  private val tmpDataDir = Utils.createTempDir(namePrefix = "test-data")
+  private val sparkTestingDir = "/tmp/spark-test"
+  private val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+
+  override def afterAll(): Unit = {
+    Utils.deleteRecursively(wareHousePath)
+    super.afterAll()
+  }
+
+  private def downloadSpark(version: String): Unit = {
+    import scala.sys.process._
+
+    val url = s"https://d3kbcqa49mib13.cloudfront.net/spark-$version-bin-hadoop2.7.tgz"
+
+    Seq("wget", url, "-q", "-P", sparkTestingDir).!
+
+    val downloaded = new File(sparkTestingDir, s"spark-$version-bin-hadoop2.7.tgz").getCanonicalPath
+    val targetDir = new File(sparkTestingDir, s"spark-$version").getCanonicalPath
+
+    Seq("mkdir", targetDir).!
+
+    Seq("tar", "-xzf", downloaded, "-C", targetDir, "--strip-components=1").!
+
+    Seq("rm", downloaded).!
+  }
+
+  private def genDataDir(name: String): String = {
+    new File(tmpDataDir, name).getCanonicalPath
+  }
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+
+    val tempPyFile = File.createTempFile("test", ".py")
+    Files.write(tempPyFile.toPath,
+      s"""
+        |from pyspark.sql import SparkSession
+        |
+        |spark = SparkSession.builder.enableHiveSupport().getOrCreate()
+        |version_index = spark.conf.get("spark.sql.test.version.index", None)
+        |
+        |spark.sql("create table data_source_tbl_{} using json as select 1 i".format(version_index))
+        |
+        |spark.sql("create table hive_compatible_data_source_tbl_" + version_index + \\
+        |  " using parquet as select 1 i")
+        |
+        |json_file = "${genDataDir("json_")}" + str(version_index)
+        |spark.range(1, 2).selectExpr("cast(id as int) as i").write.json(json_file)
+        |spark.sql("create table external_data_source_tbl_" + version_index + \\
+        |  "(i int) using json options (path '{}')".format(json_file))
+        |
+        |parquet_file = "${genDataDir("parquet_")}" + str(version_index)
+        |spark.range(1, 2).selectExpr("cast(id as int) as i").write.parquet(parquet_file)
+        |spark.sql("create table hive_compatible_external_data_source_tbl_" + version_index + \\
+        |  "(i int) using parquet options (path '{}')".format(parquet_file))
+        |
+        |json_file2 = "${genDataDir("json2_")}" + str(version_index)
+        |spark.range(1, 2).selectExpr("cast(id as int) as i").write.json(json_file2)
+        |spark.sql("create table external_table_without_schema_" + version_index + \\
   
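A minimal sketch, reusing the `sparkTestingDir` and `downloadSpark` names from the diff above, of the skip-if-present behavior the scaladoc describes (the `prepareSpark` helper name is hypothetical, not from the PR):

```scala
// Download and unpack the Spark binary package only when
// /tmp/spark-test/spark-<version> is not already present.
private def prepareSpark(version: String): Unit = {
  val sparkHome = new File(sparkTestingDir, s"spark-$version")
  if (!sparkHome.isDirectory) {
    downloadSpark(version)
  }
}
```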

[GitHub] spark pull request #19148: [SPARK-21936][SQL][WIP] backward compatibility te...

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137621065
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogBackwardCompatibilitySuite.scala ---
@@ -1,260 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.hive
-
-import java.net.URI
-
-import org.apache.hadoop.fs.Path
-import org.scalatest.BeforeAndAfterEach
-
-import org.apache.spark.sql.QueryTest
-import org.apache.spark.sql.catalyst.TableIdentifier
-import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
-import org.apache.spark.sql.hive.client.HiveClient
-import org.apache.spark.sql.hive.test.TestHiveSingleton
-import org.apache.spark.sql.test.SQLTestUtils
-import org.apache.spark.sql.types.StructType
-import org.apache.spark.util.Utils
-
-
-class HiveExternalCatalogBackwardCompatibilitySuite extends QueryTest
--- End diff --

This is covered by the new test suite.


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL][WIP] backward compatibility te...

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137621130
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
@@ -1354,31 +1354,4 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
       sparkSession.sparkContext.conf.set(DEBUG_MODE, previousValue)
     }
   }
-
-  test("SPARK-18464: support old table which doesn't store schema in table 
properties") {
--- End diff --

This is covered by the new test suite.


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL][WIP] backward compatibility te...

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137620814
  
--- Diff: sql/hive/pom.xml ---
@@ -177,6 +177,10 @@
      <artifactId>libfb303</artifactId>
    </dependency>
    <dependency>
+      <groupId>org.apache.derby</groupId>
+      <artifactId>derby</artifactId>
--- End diff --

Hive metastore depends on derby 10.10.2, and we package derby 10.12.1 when building Spark, so in the end we are using derby 10.12.1 when Spark uses the local hive metastore.

However this is a tricky approach; e.g. it doesn't work for SBT: when you build Spark with SBT, you are still using derby 10.10.2. That's probably the reason why the test failed on jenkins.

Here I explicitly add the derby dependency to the hive module, to override the default derby 10.10.2 dependency.
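
A quick, hedged way to check which Derby actually wins on a given build's test classpath (not part of this PR; `org.apache.derby.jdbc.EmbeddedDriver` is Derby's standard embedded JDBC driver class):

```scala
// Print the jar the embedded Derby driver was loaded from; the file name
// shows whether the 10.12.x override or Hive's transitive 10.10.2 won.
val derbySource = classOf[org.apache.derby.jdbc.EmbeddedDriver]
  .getProtectionDomain.getCodeSource.getLocation
println(s"Derby loaded from: $derbySource")
```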


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL][WIP] backward compatibility te...

2017-09-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137413347
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.sql.Timestamp
+import java.util.Date
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.scalatest.concurrent.Timeouts
+import org.scalatest.exceptions.TestFailedDueToTimeoutException
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.{SparkFunSuite, TestUtils}
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.test.ProcessTestUtils.ProcessOutputCapturer
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkFunSuite with Timeouts {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
--- End diff --

This single warehouse directory seems to cause a failure. Maybe Spark 2.0.2 tries to read the metastore written by 2.2.0?
```
build/sbt -Phive "project hive" "test-only *.HiveExternalCatalogVersionsSuite"
...
[info] - backward compatibility *** FAILED *** (17 seconds, 712 milliseconds)
[info]   spark-submit returned with exit code 1.
...
[info]   2017-09-06 16:07:41.744 - stderr> Caused by: java.sql.SQLException: Database at /Users/dongjoon/PR-19148/target/tmp/warehouse-d2818ad2-f141-4fc7-bc68-e7f67c89f3f4/metastore_db has an incompatible format with the current version of the software.  The database was created by or upgraded by version 10.12.
```
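
A hedged sketch of one possible fix (an assumption, not necessarily what the PR ended up doing): give each tested version its own metastore directory, so a Derby database written by one Derby version is never reopened by another:

```scala
// `javax.jdo.option.ConnectionURL` is the standard Hive metastore setting;
// the spark.hadoop.* prefix forwards it to Hive, so each old version keeps
// its Derby metastore in its own temp directory.
val metastoreDir = Utils.createTempDir(namePrefix = s"metastore-$version")
val submitArgs = Seq(
  "--conf", s"spark.sql.warehouse.dir=${wareHousePath.getCanonicalPath}",
  "--conf", "spark.hadoop.javax.jdo.option.ConnectionURL=" +
    s"jdbc:derby:;databaseName=${metastoreDir.getCanonicalPath}/metastore_db;create=true",
  tempPyFile.getCanonicalPath)
```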


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL][WIP] backward compatibility te...

2017-09-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137412267
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.sql.Timestamp
+import java.util.Date
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.scalatest.concurrent.Timeouts
+import org.scalatest.exceptions.TestFailedDueToTimeoutException
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.{SparkFunSuite, TestUtils}
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.test.ProcessTestUtils.ProcessOutputCapturer
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkFunSuite with Timeouts {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
+  private val sparkTestingDir = "/tmp/spark-test"
+  private val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+
+  override def afterAll(): Unit = {
+    Utils.deleteRecursively(wareHousePath)
+    super.afterAll()
+  }
+
+  // NOTE: This is an expensive operation in terms of time (10 seconds+). Use sparingly.
+  // This is copied from org.apache.spark.deploy.SparkSubmitSuite
+  private def runSparkSubmit(args: Seq[String], sparkHomeOpt: Option[String] = None): Unit = {
+    val sparkHome = sparkHomeOpt.getOrElse(
+      sys.props.getOrElse("spark.test.home", fail("spark.test.home is not set!")))
+    val history = ArrayBuffer.empty[String]
+    val sparkSubmit = if (Utils.isWindows) {
+      // On Windows, `ProcessBuilder.directory` does not change the current working directory.
+      new File("..\\..\\bin\\spark-submit.cmd").getAbsolutePath
+    } else {
+      "./bin/spark-submit"
+    }
+    val commands = Seq(sparkSubmit) ++ args
+    val commandLine = commands.mkString("'", "' '", "'")
+
+    val builder = new ProcessBuilder(commands: _*).directory(new File(sparkHome))
+    val env = builder.environment()
+    env.put("SPARK_TESTING", "1")
+    env.put("SPARK_HOME", sparkHome)
+
+    def captureOutput(source: String)(line: String): Unit = {
+      // This test suite has some weird behaviors when executed on Jenkins:
+      //
+      // 1. Sometimes it gets extremely slow out of unknown reason on Jenkins.  Here we add a
+      //    timestamp to provide more diagnosis information.
+      // 2. Log lines are not correctly redirected to unit-tests.log as expected, so here we print
+      //    them out for debugging purposes.
+      val logLine = s"${new Timestamp(new Date().getTime)} - $source> $line"
+      // scalastyle:off println
+      println(logLine)
+      // scalastyle:on println
+      history += logLine
+    }
+
+    val process = builder.start()
+    new ProcessOutputCapturer(process.getInputStream, captureOutput("stdout")).start()
+    new ProcessOutputCapturer(process.getErrorStream, captureOutput("stderr")).start()
+
+    try {
+      val exitCode = failAfter(300.seconds) { process.waitFor() }
+      if (exitCode != 0) {
+        // include logs in output. Note that logging is async and may not have completed
+        // at the time this exception is raised
+        Thread.sleep(1000)
+        val historyLog = history.mkString("\n")
+        fail {
  
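For orientation, a hedged example of how a test would invoke this helper; the argument values are illustrative assumptions, not the PR's exact arguments:

```scala
// Run the generated PySpark script with an old Spark's spark-submit,
// pointing it at the shared warehouse directory.
runSparkSubmit(
  Seq(
    "--name", "prepare testing tables",
    "--master", "local[2]",
    "--conf", s"spark.sql.warehouse.dir=${wareHousePath.getCanonicalPath}",
    tempPyFile.getCanonicalPath),
  Some(s"$sparkTestingDir/spark-2.2.0"))
```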

[GitHub] spark pull request #19148: [SPARK-21936][SQL][WIP] backward compatibility te...

2017-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137357440
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.sql.Timestamp
+import java.util.Date
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.scalatest.concurrent.Timeouts
+import org.scalatest.exceptions.TestFailedDueToTimeoutException
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.{SparkFunSuite, TestUtils}
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.test.ProcessTestUtils.ProcessOutputCapturer
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkFunSuite with Timeouts {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
+  private val sparkTestingDir = "/tmp/spark-test"
+  private val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+
+  override def afterAll(): Unit = {
+    Utils.deleteRecursively(wareHousePath)
+    super.afterAll()
+  }
+
+  // NOTE: This is an expensive operation in terms of time (10 seconds+). Use sparingly.
+  // This is copied from org.apache.spark.deploy.SparkSubmitSuite
+  private def runSparkSubmit(args: Seq[String], sparkHomeOpt: Option[String] = None): Unit = {
+    val sparkHome = sparkHomeOpt.getOrElse(
+      sys.props.getOrElse("spark.test.home", fail("spark.test.home is not set!")))
+    val history = ArrayBuffer.empty[String]
+    val sparkSubmit = if (Utils.isWindows) {
+      // On Windows, `ProcessBuilder.directory` does not change the current working directory.
+      new File("..\\..\\bin\\spark-submit.cmd").getAbsolutePath
+    } else {
+      "./bin/spark-submit"
+    }
+    val commands = Seq(sparkSubmit) ++ args
+    val commandLine = commands.mkString("'", "' '", "'")
+
+    val builder = new ProcessBuilder(commands: _*).directory(new File(sparkHome))
+    val env = builder.environment()
+    env.put("SPARK_TESTING", "1")
+    env.put("SPARK_HOME", sparkHome)
+
+    def captureOutput(source: String)(line: String): Unit = {
+      // This test suite has some weird behaviors when executed on Jenkins:
+      //
+      // 1. Sometimes it gets extremely slow out of unknown reason on Jenkins.  Here we add a
+      //    timestamp to provide more diagnosis information.
+      // 2. Log lines are not correctly redirected to unit-tests.log as expected, so here we print
+      //    them out for debugging purposes.
+      val logLine = s"${new Timestamp(new Date().getTime)} - $source> $line"
+      // scalastyle:off println
+      println(logLine)
+      // scalastyle:on println
+      history += logLine
+    }
+
+    val process = builder.start()
+    new ProcessOutputCapturer(process.getInputStream, captureOutput("stdout")).start()
+    new ProcessOutputCapturer(process.getErrorStream, captureOutput("stderr")).start()
+
+    try {
+      val exitCode = failAfter(300.seconds) { process.waitFor() }
+      if (exitCode != 0) {
+        // include logs in output. Note that logging is async and may not have completed
+        // at the time this exception is raised
+        Thread.sleep(1000)
+        val historyLog = history.mkString("\n")
+        fail {
+

[GitHub] spark pull request #19148: [SPARK-21936][SQL][WIP] backward compatibility te...

2017-09-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137357402
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.sql.Timestamp
+import java.util.Date
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.scalatest.concurrent.Timeouts
+import org.scalatest.exceptions.TestFailedDueToTimeoutException
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.{SparkFunSuite, TestUtils}
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.test.ProcessTestUtils.ProcessOutputCapturer
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkFunSuite with Timeouts {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
+  private val sparkTestingDir = "/tmp/spark-test"
+  private val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+
+  override def afterAll(): Unit = {
+    Utils.deleteRecursively(wareHousePath)
+    super.afterAll()
+  }
+
+  // NOTE: This is an expensive operation in terms of time (10 seconds+). Use sparingly.
+  // This is copied from org.apache.spark.deploy.SparkSubmitSuite
+  private def runSparkSubmit(args: Seq[String], sparkHomeOpt: Option[String] = None): Unit = {
+    val sparkHome = sparkHomeOpt.getOrElse(
+      sys.props.getOrElse("spark.test.home", fail("spark.test.home is not set!")))
+    val history = ArrayBuffer.empty[String]
+    val sparkSubmit = if (Utils.isWindows) {
+      // On Windows, `ProcessBuilder.directory` does not change the current working directory.
+      new File("..\\..\\bin\\spark-submit.cmd").getAbsolutePath
+    } else {
+      "./bin/spark-submit"
+    }
+    val commands = Seq(sparkSubmit) ++ args
+    val commandLine = commands.mkString("'", "' '", "'")
+
+    val builder = new ProcessBuilder(commands: _*).directory(new File(sparkHome))
+    val env = builder.environment()
+    env.put("SPARK_TESTING", "1")
+    env.put("SPARK_HOME", sparkHome)
+
+    def captureOutput(source: String)(line: String): Unit = {
+      // This test suite has some weird behaviors when executed on Jenkins:
+      //
+      // 1. Sometimes it gets extremely slow out of unknown reason on Jenkins.  Here we add a
+      //    timestamp to provide more diagnosis information.
+      // 2. Log lines are not correctly redirected to unit-tests.log as expected, so here we print
+      //    them out for debugging purposes.
+      val logLine = s"${new Timestamp(new Date().getTime)} - $source> $line"
+      // scalastyle:off println
+      println(logLine)
+      // scalastyle:on println
+      history += logLine
+    }
+
+    val process = builder.start()
+    new ProcessOutputCapturer(process.getInputStream, captureOutput("stdout")).start()
+    new ProcessOutputCapturer(process.getErrorStream, captureOutput("stderr")).start()
+
+    try {
+      val exitCode = failAfter(300.seconds) { process.waitFor() }
+      if (exitCode != 0) {
+        // include logs in output. Note that logging is async and may not have completed
+        // at the time this exception is raised
+        Thread.sleep(1000)
+        val historyLog = history.mkString("\n")
+        fail {
+

[GitHub] spark pull request #19148: [SPARK-21936][SQL][WIP] backward compatibility te...

2017-09-06 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/19148

[SPARK-21936][SQL][WIP] backward compatibility test framework for HiveExternalCatalog

## What changes were proposed in this pull request?

`HiveExternalCatalog` is a semi-public interface. When creating tables, `HiveExternalCatalog` converts the table metadata to the Hive table format and saves it into the Hive metastore. It's very important to guarantee backward compatibility here, i.e., tables created by previous Spark versions should still be readable in newer Spark versions.

Previously we found backward compatibility issues manually, which makes it easy to miss bugs. This PR introduces a test framework to automatically test `HiveExternalCatalog` backward compatibility: it downloads Spark binaries of different versions, creates tables with those Spark versions, and then reads the tables with the current Spark version.
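
In outline, the framework does something like the following condensed sketch; the version list and everything not shown in the diffs on this thread are assumptions:

```scala
// For each old version: fetch its binary package if needed, then use its
// spark-submit to create tables via a generated PySpark script.
val testingVersions = Seq("2.0.2", "2.1.1", "2.2.0")  // assumed list

testingVersions.zipWithIndex.foreach { case (version, index) =>
  val sparkHome = new File(sparkTestingDir, s"spark-$version")
  if (!sparkHome.isDirectory) downloadSpark(version)

  runSparkSubmit(
    Seq(
      "--conf", s"spark.sql.warehouse.dir=${wareHousePath.getCanonicalPath}",
      "--conf", s"spark.sql.test.version.index=$index",
      tempPyFile.getCanonicalPath),
    Some(sparkHome.getCanonicalPath))
}
// A test in the current version then reads back data_source_tbl_<i> and
// the other tables created above and checks their contents.
```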

TODO:
- add more test cases
- make sure it works on jenkins

## How was this patch tested?

test-only change

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19148.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19148


commit 283bc45d5b8ae63ba00d74c0322469c10b4f88eb
Author: Wenchen Fan 
Date:   2017-09-06T05:39:53Z

backward compatibility test framework for HiveExternalCatalog




---
