[GitHub] spark issue #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs in Pytho...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19147
  
**[Test build #81538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81538/testReport)** for PR 19147 at commit [`2f929d8`](https://github.com/apache/spark/commit/2f929d8e0ec01ca7070fc0969e5091dad4ce8350).


---




[GitHub] spark issue #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs in Pytho...

2017-09-07 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19147
  
Jenkins, retest this please.


---




[GitHub] spark pull request #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.test...

2017-09-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19158


---




[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...

2017-09-07 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19158
  
Thanks for reviewing! Merging to master/2.2/2.1/2.0.


---




[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18875
  
**[Test build #81537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81537/testReport)** for PR 18875 at commit [`36ce961`](https://github.com/apache/spark/commit/36ce9614c078c9c0aca62a672948d8581b43e2ea).


---




[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-09-07 Thread goldmedal
Github user goldmedal commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r137710147
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
@@ -26,20 +26,50 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters
 import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData}
 import org.apache.spark.sql.types._
 
+// `JackGenerator` can only be initialized with a `StructType` or a `MapType`.
+// Once it is initialized with `StructType`, it can be used to write out a struct or an array of
+// struct. Once it is initialized with `MapType`, it can be used to write out a map. An exception
+// will be thrown if trying to write out a struct if it is initialized with a `MapType`,
+// and vice versa.
--- End diff --

ok.  I'll modify it.


---




[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...

2017-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19158
  
LGTM


---




[GitHub] spark pull request #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs i...

2017-09-07 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/19147#discussion_r137707828
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/VectorizedPythonRunner.scala ---
@@ -0,0 +1,329 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.python
+
+import java.io.{BufferedInputStream, BufferedOutputStream, DataInputStream, DataOutputStream}
+import java.net.Socket
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+
+import org.apache.arrow.vector.VectorSchemaRoot
+import org.apache.arrow.vector.stream.{ArrowStreamReader, ArrowStreamWriter}
+
+import org.apache.spark.{SparkEnv, SparkFiles, TaskContext}
+import org.apache.spark.api.python.{ChainedPythonFunctions, PythonEvalType, PythonException, PythonRDD, SpecialLengths}
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.arrow.{ArrowUtils, ArrowWriter}
+import org.apache.spark.sql.execution.vectorized.{ArrowColumnVector, ColumnarBatch, ColumnVector}
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+
+/**
+ * Similar to `PythonRunner`, but exchange data with Python worker via columnar format.
+ */
+class VectorizedPythonRunner(
+    funcs: Seq[ChainedPythonFunctions],
+    batchSize: Int,
+    bufferSize: Int,
+    reuse_worker: Boolean,
+    argOffsets: Array[Array[Int]]) extends Logging {
+
+  require(funcs.length == argOffsets.length, "argOffsets should have the same length as funcs")
+
+  // All the Python functions should have the same exec, version and envvars.
+  private val envVars = funcs.head.funcs.head.envVars
+  private val pythonExec = funcs.head.funcs.head.pythonExec
+  private val pythonVer = funcs.head.funcs.head.pythonVer
+
+  // TODO: support accumulator in multiple UDF
+  private val accumulator = funcs.head.funcs.head.accumulator
+
+  // todo: return column batch?
+  def compute(
--- End diff --

Yes, it is a lot of duplicated code from `PythonRunner` that could be 
refactored.  I'm guessing you did not use the existing code because of the 
Arrow stream format?  While I would love to start using that in Spark, I think 
it would be better to do this at a later time when the required code could be 
refactored and the Arrow stream format could replace where we currently use the 
file format.

Also, the good part about using the iterator based file format is each 
iteration can allow Python to communicate back an error code and exit 
gracefully.  In my own tests with the streaming format if an error occurred 
after the stream had started, Spark could lock up in a waiting state.  These 
are the reasons I did not use the streaming format in my implementation.  Would 
this `VectorizedPythonRunner` be able to handle these types of errors?
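
(For illustration: the iterator-based framing described above can be sketched as a length-prefixed protocol in which a negative "length" is an out-of-band marker, so the worker can report an error between batches instead of leaving the JVM blocked on a dead stream. This is a self-contained sketch of the idea, not Spark's actual `PythonRunner`/`SpecialLengths` code; the constants and helper names here are illustrative.)

    import java.io.{DataInputStream, DataOutputStream}
    import java.nio.charset.StandardCharsets

    object FramedProtocolSketch {
      // Illustrative markers; Spark's SpecialLengths constants play a similar role.
      val END_OF_DATA = -1
      val WORKER_EXCEPTION = -2

      // Writer side: every payload is prefixed by its length.
      def writeFrame(out: DataOutputStream, payload: Array[Byte]): Unit = {
        out.writeInt(payload.length)
        out.write(payload)
      }

      // Reader side: a negative length is a control marker, so a worker error
      // surfaces as an exception instead of an indefinite blocking read.
      def readFrame(in: DataInputStream): Option[Array[Byte]] = in.readInt() match {
        case END_OF_DATA => None
        case WORKER_EXCEPTION =>
          val err = new Array[Byte](in.readInt())
          in.readFully(err)
          throw new RuntimeException(new String(err, StandardCharsets.UTF_8))
        case len =>
          val buf = new Array[Byte](len)
          in.readFully(buf)
          Some(buf)
      }
    }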


---




[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18875
  
We should add a test suite for `JacksonGenerator`.


---




[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r137706345
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---
@@ -26,20 +26,50 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters
 import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData}
 import org.apache.spark.sql.types._
 
+// `JackGenerator` can only be initialized with a `StructType` or a `MapType`.
+// Once it is initialized with `StructType`, it can be used to write out a struct or an array of
+// struct. Once it is initialized with `MapType`, it can be used to write out a map. An exception
+// will be thrown if trying to write out a struct if it is initialized with a `MapType`,
+// and vice versa.
--- End diff --

For this kind of comment, we use a style like:

/**
 * Code comments...
 *
 */


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137706271
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
+  private val tmpDataDir = Utils.createTempDir(namePrefix = "test-data")
+  private val sparkTestingDir = "/tmp/spark-test"
+  private val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+
+  override def afterAll(): Unit = {
+    Utils.deleteRecursively(wareHousePath)
+    Utils.deleteRecursively(tmpDataDir)
+    super.afterAll()
+  }
+
+  private def downloadSpark(version: String): Unit = {
+    import scala.sys.process._
+
+    val url = s"https://d3kbcqa49mib13.cloudfront.net/spark-$version-bin-hadoop2.7.tgz"
+
+    Seq("wget", url, "-q", "-P", sparkTestingDir).!
+
+    val downloaded = new File(sparkTestingDir, s"spark-$version-bin-hadoop2.7.tgz").getCanonicalPath
+    val targetDir = new File(sparkTestingDir, s"spark-$version").getCanonicalPath
+
+    Seq("mkdir", targetDir).!
+
+    Seq("tar", "-xzf", downloaded, "-C", targetDir, "--strip-components=1").!
+
+    Seq("rm", downloaded).!
+  }
+
+  private def genDataDir(name: String): String = {
+    new File(tmpDataDir, name).getCanonicalPath
+  }
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+
+    val tempPyFile = File.createTempFile("test", ".py")
+    Files.write(tempPyFile.toPath,
+      s"""
+        |from pyspark.sql import SparkSession
+        |
+        |spark = SparkSession.builder.enableHiveSupport().getOrCreate()
+        |version_index = spark.conf.get("spark.sql.test.version.index", None)
+        |
+        |spark.sql("create table data_source_tbl_{} using json as select 1 i".format(version_index))
+        |
+        |spark.sql("create table hive_compatible_data_source_tbl_" + version_index + \\
+        |  " using parquet as select 1 i")
+        |
+        |json_file = "${genDataDir("json_")}" + str(version_index)
+        |spark.range(1, 2).selectExpr("cast(id as int) as i").write.json(json_file)
+        |spark.sql("create table external_data_source_tbl_" + version_index + \\
+        |  "(i int) using json options (path '{}')".format(json_file))
+        |
+        |parquet_file = "${genDataDir("parquet_")}" + str(version_index)
+        |spark.range(1, 2).selectExpr("cast(id as int) as i").write.parquet(parquet_file)
+        |spark.sql("create table hive_compatible_external_data_source_tbl_" + version_index + \\
+        |  "(i int) using parquet options (path '{}')".format(parquet_file))
+        |
+        |json_file2 = "${genDataDir("json2_")}" + str(version_index)
+        |spark.range(1, 2).selectExpr("cast(id as int) as i").write.json(json_file2)
+        |spark.sql("create table
[message truncated in the archive]


---

[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18266
  
**[Test build #81536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81536/testReport)** for PR 18266 at commit [`b38a1a8`](https://github.com/apache/spark/commit/b38a1a8b2d9ffee250b9e8637dc579f2a8f9182d).


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137704899
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,194 @@
[same diff as quoted in the previous message; this copy and its comment were truncated in the archive]


---

[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137704429
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
+  private val tmpDataDir = Utils.createTempDir(namePrefix = "test-data")
+  private val sparkTestingDir = "/tmp/spark-test"
+  private val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+
+  override def afterAll(): Unit = {
+    Utils.deleteRecursively(wareHousePath)
+    Utils.deleteRecursively(tmpDataDir)
+    super.afterAll()
+  }
+
+  private def downloadSpark(version: String): Unit = {
+    import scala.sys.process._
+
+    val url = s"https://d3kbcqa49mib13.cloudfront.net/spark-$version-bin-hadoop2.7.tgz"
+
+    Seq("wget", url, "-q", "-P", sparkTestingDir).!
+
+    val downloaded = new File(sparkTestingDir, s"spark-$version-bin-hadoop2.7.tgz").getCanonicalPath
+    val targetDir = new File(sparkTestingDir, s"spark-$version").getCanonicalPath
+
+    Seq("mkdir", targetDir).!
+
+    Seq("tar", "-xzf", downloaded, "-C", targetDir, "--strip-components=1").!
+
+    Seq("rm", downloaded).!
+  }
+
+  private def genDataDir(name: String): String = {
+    new File(tmpDataDir, name).getCanonicalPath
+  }
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+
+    val tempPyFile = File.createTempFile("test", ".py")
+    Files.write(tempPyFile.toPath,
+      s"""
+        |from pyspark.sql import SparkSession
+        |
+        |spark = SparkSession.builder.enableHiveSupport().getOrCreate()
+        |version_index = spark.conf.get("spark.sql.test.version.index", None)
+        |
+        |spark.sql("create table data_source_tbl_{} using json as select 1 i".format(version_index))
--- End diff --

Instead of only using lowercase column names, should we use a mixed-case Hive schema for those tables?
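
(For illustration, a mixed-case variant of one of the generated statements might look like the following; the table and column names here are hypothetical, and `spark`/`versionIndex` are assumed to be in scope:)

    // Hypothetical DDL exercising mixed-case identifiers in the Hive schema.
    spark.sql(s"create table Mixed_Case_Tbl_$versionIndex (Id int, camelCol string) using json")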


---




[GitHub] spark issue #19155: [SPARK-21949][TEST] Tables created in unit tests should ...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19155
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81533/
Test PASSed.


---




[GitHub] spark issue #19155: [SPARK-21949][TEST] Tables created in unit tests should ...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19155
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19155: [SPARK-21949][TEST] Tables created in unit tests should ...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19155
  
**[Test build #81533 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81533/testReport)** for PR 19155 at commit [`1d38337`](https://github.com/apache/spark/commit/1d38337b22ea8926aeb1db0591285fbb34f902cc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81532/
Test PASSed.


---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19148
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19148
  
**[Test build #81532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81532/testReport)** for PR 19148 at commit [`00cdd0a`](https://github.com/apache/spark/commit/00cdd0a63bdd4f531eb06de8d9651e934f2bb448).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137703092
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,194 @@
[license header, imports, and scaladoc elided; identical to the quote earlier in this digest]
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
--- End diff --

Ok. After a build clean it works now.


---




[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19158
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81534/
Test PASSed.


---




[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19158
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19158
  
**[Test build #81534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81534/testReport)** for PR 19158 at commit [`134bc26`](https://github.com/apache/spark/commit/134bc267a5ef01d9dea3d08cc255facdd8dfc0c8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18956
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81531/
Test PASSed.


---




[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18956
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18956
  
**[Test build #81531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81531/testReport)** for PR 18956 at commit [`ecdfb7d`](https://github.com/apache/spark/commit/ecdfb7db34d0d01e357bff0d32b62137ef0ae735).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137700913
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,194 @@
[license header, imports, and scaladoc elided; identical to the quote earlier in this digest]
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
--- End diff --

Let me do a clean build and try again.


---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19148
  
**[Test build #81535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81535/testReport)** for PR 19148 at commit [`62369e3`](https://github.com/apache/spark/commit/62369e3a07bc23d68068e809edf1c43de448740a).


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137700499
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,194 @@
[license header, imports, and scaladoc elided; identical to the quote earlier in this digest]
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
--- End diff --

Did you try a clean clone? I added the Derby dependency to make the test work on Jenkins...


---




[GitHub] spark issue #19107: [SPARK-21799][ML] Fix `KMeans` performance regression ca...

2017-09-07 Thread smurching
Github user smurching commented on the issue:

https://github.com/apache/spark/pull/19107
  
Sorry for the delay, this looks good to me -- thanks @WeichenXu123!


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137699853
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,194 @@
[license header, imports, and scaladoc elided; identical to the quote earlier in this digest]
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
--- End diff --

After removing the added Derby dependency, this test works.


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137699802
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/SparkSubmitTestUtils.scala ---
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.sql.Timestamp
+import java.util.Date
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.scalatest.concurrent.Timeouts
+import org.scalatest.exceptions.TestFailedDueToTimeoutException
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.test.ProcessTestUtils.ProcessOutputCapturer
+import org.apache.spark.util.Utils
+
+trait SparkSubmitTestUtils extends SparkFunSuite with Timeouts {
--- End diff --

nit. Let's use `TimeLimits` instead of `Timeouts`. `Timeouts` is deprecated now.
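
(A minimal sketch of the suggested swap, assuming ScalaTest 3.x where `org.scalatest.concurrent.TimeLimits` supersedes the deprecated `Timeouts`; note `failAfter` then needs an implicit `Signaler`, and the helper method below is hypothetical:)

    import org.scalatest.concurrent.{Signaler, ThreadSignaler, TimeLimits}
    import org.scalatest.time.SpanSugar._

    import org.apache.spark.SparkFunSuite

    trait SparkSubmitTestUtils extends SparkFunSuite with TimeLimits {
      // TimeLimits interrupts timed-out code through this Signaler.
      implicit val defaultSignaler: Signaler = ThreadSignaler

      // Hypothetical helper: failAfter usage is unchanged from Timeouts.
      def runWithTimeout[T](body: => T): T = failAfter(2.minutes)(body)
    }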


---




[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19158
  
**[Test build #81534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81534/testReport)** for PR 19158 at commit [`134bc26`](https://github.com/apache/spark/commit/134bc267a5ef01d9dea3d08cc255facdd8dfc0c8).


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137699720
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,194 @@
[license header, imports, and scaladoc elided; identical to the quote earlier in this digest]
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
--- End diff --

Can you print `org.apache.derby.tools.sysinfo.getVersionString` in `IsolatedClientLoader.createClient` to see what your actual Derby version is?
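
(i.e., a temporary debugging line along these lines; the exact placement inside `createClient` is whatever spot runs before the metastore is touched:)

    // Temporary debugging aid: Derby's sysinfo tool exposes the running version.
    println("Derby version: " + org.apache.derby.tools.sysinfo.getVersionString())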


---




[GitHub] spark issue #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs in Pytho...

2017-09-07 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19147
  
The test failure above should be fixed by #19158.


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137699367
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -0,0 +1,194 @@
[license header, imports, and scaladoc elided; identical to the quote earlier in this digest]
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
--- End diff --

I ran this test locally and encountered a failure like:

    2017-09-07 19:28:07.595 - stderr> Caused by: java.sql.SQLException: Database at
    /root/repos/spark-1/target/tmp/warehouse-66dad501-c743-4ac3-83cc-51451c6d697a/metastore_db
    has an incompatible format with the current version of the software. The database was
    created by or upgraded by version 10.12.




---




[GitHub] spark pull request #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.test...

2017-09-07 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/19158

[SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTests2 should stop SparkContext.

## What changes were proposed in this pull request?

`pyspark.sql.tests.SQLTests2` doesn't stop the newly created SparkContext in the test, which might affect the following tests.
This PR makes `pyspark.sql.tests.SQLTests2` stop the `SparkContext`.

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-21950

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19158.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19158


commit 134bc267a5ef01d9dea3d08cc255facdd8dfc0c8
Author: Takuya UESHIN 
Date:   2017-09-08T02:34:41Z

Make pyspark.sql.tests.SQLTests2 stop SparkContext.




---




[GitHub] spark pull request #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r137699153
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/StatisticsSupport.java ---
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader.upward;
+
+/**
+ * A mix-in interface for `DataSourceV2Reader`. Users can implement this interface to report
+ * statistics to Spark.
+ */
+public interface StatisticsSupport {
--- End diff --

I'd like to put column stats in a separate interface, because we already separate basic stats and column stats in `ANALYZE TABLE`.
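
(A sketch of the proposed split, using hypothetical trait and member names rather than the actual interfaces; the point is only that basic statistics and per-column statistics are reported through separate mix-ins, mirroring how `ANALYZE TABLE` separates them:)

    // Basic statistics: cheap, always useful to the optimizer.
    trait BasicStatsSupport {
      def sizeInBytes: Long
      def numRows: Option[Long]
    }

    // Column statistics: optional and potentially expensive to compute.
    trait ColumnStatsSupport extends BasicStatsSupport {
      def columnStats(column: String): Option[ColumnStatSketch]
    }

    // Minimal stand-in for a column-level statistics record.
    case class ColumnStatSketch(distinctCount: Long, nullCount: Long)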


---




[GitHub] spark pull request #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r137698996
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics}
+import org.apache.spark.sql.sources.v2.reader.DataSourceV2Reader
+import org.apache.spark.sql.sources.v2.reader.upward.StatisticsSupport
+
+case class DataSourceV2Relation(
+    output: Seq[AttributeReference],
+    reader: DataSourceV2Reader) extends LeafNode {
+
+  override def computeStats(): Statistics = reader match {
+    case r: StatisticsSupport => Statistics(sizeInBytes = r.getStatistics.sizeInBytes())
+    case _ => Statistics(sizeInBytes = conf.defaultSizeInBytes)
+  }
+}
+
+object DataSourceV2Relation {
+  def apply(reader: DataSourceV2Reader): DataSourceV2Relation = {
+    new DataSourceV2Relation(reader.readSchema().toAttributes, reader)
--- End diff --

In data source V2, we will delegate partition pruning to the data source, although we need to do some refactoring to make it happen.

> I was just looking into how the data source should provide partition data, or at least fields that are the same for all rows in a `ReadTask`. It would be nice to have a way to pass those up instead of materializing them in each `UnsafeRow`.

This can be achieved by the columnar reader. Think about a data source having a data column `i` and a partition column `j`: the returned columnar batch has 2 column vectors, for `i` and `j`. Column vector `i` is a normal one that contains all the values of column `i` within this batch; column vector `j` is a constant vector that only contains a single value.
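
(A self-contained sketch of the constant-vector idea with illustrative types; Spark's real `ColumnVector` API is richer, but the memory argument is the same:)

    // Illustrative read-only int vector interface.
    trait IntVector {
      def getInt(row: Int): Int
    }

    // Normal vector: materializes every value in the batch.
    final class OnHeapIntVector(values: Array[Int]) extends IntVector {
      def getInt(row: Int): Int = values(row)
    }

    // Constant vector: one value stands in for every row, so a partition
    // column costs O(1) memory per batch instead of O(batchSize).
    final class ConstantIntVector(value: Int) extends IntVector {
      def getInt(row: Int): Int = value
    }

    object BatchSketch {
      val batchSize = 4096
      // Data column `i`: a full vector of values.
      val i: IntVector = new OnHeapIntVector(Array.tabulate(batchSize)(identity))
      // Partition column `j`: the same value for all rows in this batch.
      val j: IntVector = new ConstantIntVector(42)
    }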


---




[GitHub] spark issue #19107: [SPARK-21799][ML] Fix `KMeans` performance regression ca...

2017-09-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19107
  
cc @smurching Thanks!


---




[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-07 Thread caneGuy
Github user caneGuy commented on the issue:

https://github.com/apache/spark/pull/19132
  
@vanzin @zsxwing could you help review this? Thanks.


---




[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18956
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18956
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81529/
Test PASSed.


---




[GitHub] spark issue #19155: [SPARK-21949][TEST] Tables created in unit tests should ...

2017-09-07 Thread 10110346
Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/19155
  
@dongjoon-hyun thanks, I have created a JIRA issue.


---




[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18956
  
**[Test build #81529 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81529/testReport)** for PR 18956 at commit [`d1db7cf`](https://github.com/apache/spark/commit/d1db7cf815d447b195c907fb159ed0a6770c537b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19155: [MINOR][TEST] Tables created in unit tests should be dro...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19155
  
**[Test build #81533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81533/testReport)** for PR 19155 at commit [`1d38337`](https://github.com/apache/spark/commit/1d38337b22ea8926aeb1db0591285fbb34f902cc).


---




[GitHub] spark issue #19157: [SPARK-20589][Core][Scheduler] Allow limiting task concu...

2017-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19157
  
@dhruve, FYI, AppVeyor CI runs SparkR tests on Windows only when there are changes in R-related code:

https://github.com/apache/spark/blob/75a6d05853fea13f88e3c941b1959b24e4640824/appveyor.yml#L29-L34

Thing is, it looks like when a `git merge` is performed rather than a `rebase`, as in https://github.com/apache/spark/commit/8b3830004d69bd5f109fd9846f59583c23a910c7, the merge commit usually includes some changes in R, and then the CI is triggered, which is actually quite moderate. So, I think we should generally rebase when there are conflicts.


---




[GitHub] spark issue #19144: [UI][Streaming]Modify the title, 'Records' instead of 'I...

2017-09-07 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/19144
  
@zsxwing Could you help review the code? Thanks.


---




[GitHub] spark pull request #19150: [SPARK-21939][TEST] Use TimeLimits instead of Tim...

2017-09-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19150


---




[GitHub] spark issue #19150: [SPARK-21939][TEST] Use TimeLimits instead of Timeouts

2017-09-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19150
  
Thank you for reviewing and merging, @jerryshao! Also, thank you for 
reviewing and approving, @HyukjinKwon and @srowen.


---




[GitHub] spark issue #19150: [SPARK-21939][TEST] Use TimeLimits instead of Timeouts

2017-09-07 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19150
  
Merging to master, thanks @dongjoon-hyun .


---




[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict between ...

2017-09-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19149
  
Apart from that, the isolation of `InferFiltersFromConstraints` looks good to me.


---




[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict between ...

2017-09-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19149
  
Hi, @gatorsmile.
According to the PR description, this PR is about `PruneFilters`. Do we need a 
test case, given that SPARK-21652 is about `ConstantPropagation`, not 
`PruneFilters`?


---




[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18029
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81530/
Test PASSed.


---




[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18029
  
**[Test build #81530 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81530/testReport)**
 for PR 18029 at commit 
[`cef5cde`](https://github.com/apache/spark/commit/cef5cdece2bd2a7c95e19493c511d602c1b46461).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class KinesisInitialPosition `
  * `sealed trait InitialPosition `
  * `case class AtTimestamp(timestamp: Date) extends InitialPosition `


---




[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18029
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19148
  
**[Test build #81532 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81532/testReport)**
 for PR 19148 at commit 
[`00cdd0a`](https://github.com/apache/spark/commit/00cdd0a63bdd4f531eb06de8d9651e934f2bb448).


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137686311
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary 
packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a 
spark folder with
+ * expected version under this local directory, e.g. 
`/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
+  private val tmpDataDir = Utils.createTempDir(namePrefix = "test-data")
+  private val sparkTestingDir = "/tmp/spark-test"
+  private val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+
+  override def afterAll(): Unit = {
+Utils.deleteRecursively(wareHousePath)
--- End diff --

I want to keep `sparkTestingDir`, so we don't need to download Spark 
again if this Jenkins machine has already run this suite before.
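
A minimal sketch of that caching idea (Python for brevity; the suite itself is Scala, and the helper name here is illustrative): skip the download when the unpacked directory for this version already exists from an earlier run on the same machine.

```python
import os
import subprocess

SPARK_TESTING_DIR = "/tmp/spark-test"  # same location the suite uses

def ensure_spark(version):
    target = os.path.join(SPARK_TESTING_DIR, "spark-%s" % version)
    if os.path.isdir(target):
        # A previous run on this machine already downloaded this version.
        return target
    url = ("https://d3kbcqa49mib13.cloudfront.net/"
           "spark-%s-bin-hadoop2.7.tgz" % version)
    subprocess.check_call(["wget", url, "-q", "-P", SPARK_TESTING_DIR])
    tarball = os.path.join(SPARK_TESTING_DIR, os.path.basename(url))
    os.makedirs(target)
    subprocess.check_call(["tar", "-xzf", tarball, "-C", target,
                           "--strip-components=1"])
    os.remove(tarball)
    return target
```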


---




[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18956
  
**[Test build #81531 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81531/testReport)**
 for PR 18956 at commit 
[`ecdfb7d`](https://github.com/apache/spark/commit/ecdfb7db34d0d01e357bff0d32b62137ef0ae735).


---




[GitHub] spark pull request #17435: [SPARK-20098][PYSPARK] dataType's typeName fix

2017-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17435#discussion_r137685731
  
--- Diff: python/pyspark/sql/types.py ---
@@ -438,6 +438,11 @@ def toInternal(self, obj):
 def fromInternal(self, obj):
 return self.dataType.fromInternal(obj)
 
+def typeName(self):
+raise TypeError(
+"StructField does not have typename. \
--- End diff --

Little nit: looks like a typo, typename -> typeName.


---




[GitHub] spark pull request #17435: [SPARK-20098][PYSPARK] dataType's typeName fix

2017-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17435#discussion_r137685629
  
--- Diff: python/pyspark/sql/types.py ---
@@ -438,6 +438,11 @@ def toInternal(self, obj):
 def fromInternal(self, obj):
 return self.dataType.fromInternal(obj)
 
+def typeName(self):
+raise TypeError(
+"StructField does not have typename. \
+You can use self.dataType.simpleString() instead.")
--- End diff --

I'd remove `self` here and just say something like ` use typeName() on its 
type explicitly ...`.


---




[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18029
  
**[Test build #81530 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81530/testReport)**
 for PR 18029 at commit 
[`cef5cde`](https://github.com/apache/spark/commit/cef5cdece2bd2a7c95e19493c511d602c1b46461).


---




[GitHub] spark pull request #18029: [SPARK-20168] [DStream] Add changes to use kinesi...

2017-09-07 Thread yssharma
Github user yssharma commented on a diff in the pull request:

https://github.com/apache/spark/pull/18029#discussion_r137684968
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/InitialPosition.scala
 ---
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.streaming.kinesis
+
+import java.util.Date
+
+import 
com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
+
+/**
+ * Trait for Kinesis's InitialPositionInStream.
+ * This will be overridden by more specific types.
+ */
+sealed trait InitialPosition {
+  val initialPositionInStream: InitialPositionInStream
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.LATEST.
+ */
+case object Latest extends InitialPosition {
+  val instance: InitialPosition = this
+  override val initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.LATEST
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.TRIM_HORIZON.
+ */
+case object TrimHorizon extends InitialPosition {
+  val instance: InitialPosition = this
+  override val initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.TRIM_HORIZON
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.AT_TIMESTAMP.
+ */
+case class AtTimestamp(timestamp: Date) extends InitialPosition {
+  val instance: InitialPosition = this
+  override val initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.AT_TIMESTAMP
+}
+
+/**
+ * Companion object for InitialPosition that returns
+ * appropriate version of InitialPositionInStream.
+ */
+object InitialPosition {
--- End diff --

I've implemented the functions with this capitalized naming, but still feel a 
bit salty about it :)


---




[GitHub] spark pull request #17435: [SPARK-20098][PYSPARK] dataType's typeName fix

2017-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17435#discussion_r137684263
  
--- Diff: python/pyspark/sql/types.py ---
@@ -438,6 +438,11 @@ def toInternal(self, obj):
 def fromInternal(self, obj):
 return self.dataType.fromInternal(obj)
 
+def typeName(self):
+raise TypeError(
--- End diff --

Could we do like ...

```python
raise TypeError(
"..."
"...")
```
if it doesn't bother you much?
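
Putting the two nits together, a minimal runnable sketch of what the method could look like (the class is a simplified stand-in and the message wording is illustrative, not the final PySpark code):

```python
class StructField(object):
    """Simplified stand-in for pyspark.sql.types.StructField."""

    def __init__(self, name, dataType):
        self.name = name
        self.dataType = dataType

    def typeName(self):
        # Implicit string concatenation instead of a trailing backslash,
        # and "typeName" spelled consistently.
        raise TypeError(
            "StructField does not have typeName. "
            "Use typeName() on its type explicitly instead.")
```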


---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19148
  
LGTM except two minor comments.


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137683665
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary 
packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a 
spark folder with
+ * expected version under this local directory, e.g. 
`/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
+  private val tmpDataDir = Utils.createTempDir(namePrefix = "test-data")
+  private val sparkTestingDir = "/tmp/spark-test"
+  private val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+
+  override def afterAll(): Unit = {
+Utils.deleteRecursively(wareHousePath)
+super.afterAll()
+  }
+
+  private def downloadSpark(version: String): Unit = {
+import scala.sys.process._
+
+val url = 
s"https://d3kbcqa49mib13.cloudfront.net/spark-$version-bin-hadoop2.7.tgz"
+
+Seq("wget", url, "-q", "-P", sparkTestingDir).!
+
+val downloaded = new File(sparkTestingDir, 
s"spark-$version-bin-hadoop2.7.tgz").getCanonicalPath
+val targetDir = new File(sparkTestingDir, 
s"spark-$version").getCanonicalPath
+
+Seq("mkdir", targetDir).!
+
+Seq("tar", "-xzf", downloaded, "-C", targetDir, 
"--strip-components=1").!
+
+Seq("rm", downloaded).!
+  }
+
+  private def genDataDir(name: String): String = {
+new File(tmpDataDir, name).getCanonicalPath
+  }
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+
+val tempPyFile = File.createTempFile("test", ".py")
+Files.write(tempPyFile.toPath,
+  s"""
+|from pyspark.sql import SparkSession
+|
+|spark = SparkSession.builder.enableHiveSupport().getOrCreate()
+|version_index = spark.conf.get("spark.sql.test.version.index", 
None)
+|
+|spark.sql("create table data_source_tbl_{} using json as select 1 
i".format(version_index))
+|
+|spark.sql("create table hive_compatible_data_source_tbl_" + 
version_index + \\
+|  " using parquet as select 1 i")
+|
+|json_file = "${genDataDir("json_")}" + str(version_index)
+|spark.range(1, 2).selectExpr("cast(id as int) as 
i").write.json(json_file)
+|spark.sql("create table external_data_source_tbl_" + 
version_index + \\
+|  "(i int) using json options (path 
'{}')".format(json_file))
+|
+|parquet_file = "${genDataDir("parquet_")}" + str(version_index)
+|spark.range(1, 2).selectExpr("cast(id as int) as 
i").write.parquet(parquet_file)
+|spark.sql("create table 
hive_compatible_external_data_source_tbl_" + version_index + \\
+|  "(i int) using parquet options (path 
'{}')".format(parquet_file))
+|
+|json_file2 = "${genDataDir("json2_")}" + str(version_index)
+|spark.range(1, 2).selectExpr("cast(id as int) as 
i").write.json(json_file2)
+|spark.sql("create table external_table_without_schema_" + 
version_index + \\
   

[GitHub] spark issue #19129: [SPARK-13656][SQL] Delete spark.sql.parquet.cacheMetadat...

2017-09-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19129
  
Thank you for review, @gatorsmile , @HyukjinKwon , @maropu .
In this issue, I've learned how to track the unused stuff correctly. Thank 
you again.


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137682740
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary 
packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a 
spark folder with
+ * expected version under this local directory, e.g. 
`/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
+  private val tmpDataDir = Utils.createTempDir(namePrefix = "test-data")
+  private val sparkTestingDir = "/tmp/spark-test"
+  private val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+
+  override def afterAll(): Unit = {
+Utils.deleteRecursively(wareHousePath)
+super.afterAll()
+  }
+
+  private def downloadSpark(version: String): Unit = {
+import scala.sys.process._
+
+val url = 
s"https://d3kbcqa49mib13.cloudfront.net/spark-$version-bin-hadoop2.7.tgz"
+
+Seq("wget", url, "-q", "-P", sparkTestingDir).!
+
+val downloaded = new File(sparkTestingDir, 
s"spark-$version-bin-hadoop2.7.tgz").getCanonicalPath
+val targetDir = new File(sparkTestingDir, 
s"spark-$version").getCanonicalPath
+
+Seq("mkdir", targetDir).!
+
+Seq("tar", "-xzf", downloaded, "-C", targetDir, 
"--strip-components=1").!
+
+Seq("rm", downloaded).!
+  }
+
+  private def genDataDir(name: String): String = {
+new File(tmpDataDir, name).getCanonicalPath
+  }
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+
+val tempPyFile = File.createTempFile("test", ".py")
+Files.write(tempPyFile.toPath,
+  s"""
+|from pyspark.sql import SparkSession
+|
+|spark = SparkSession.builder.enableHiveSupport().getOrCreate()
+|version_index = spark.conf.get("spark.sql.test.version.index", 
None)
+|
+|spark.sql("create table data_source_tbl_{} using json as select 1 
i".format(version_index))
+|
+|spark.sql("create table hive_compatible_data_source_tbl_" + 
version_index + \\
+|  " using parquet as select 1 i")
+|
+|json_file = "${genDataDir("json_")}" + str(version_index)
+|spark.range(1, 2).selectExpr("cast(id as int) as 
i").write.json(json_file)
+|spark.sql("create table external_data_source_tbl_" + 
version_index + \\
+|  "(i int) using json options (path 
'{}')".format(json_file))
+|
+|parquet_file = "${genDataDir("parquet_")}" + str(version_index)
+|spark.range(1, 2).selectExpr("cast(id as int) as 
i").write.parquet(parquet_file)
+|spark.sql("create table 
hive_compatible_external_data_source_tbl_" + version_index + \\
+|  "(i int) using parquet options (path 
'{}')".format(parquet_file))
+|
+|json_file2 = "${genDataDir("json2_")}" + str(version_index)
+|spark.range(1, 2).selectExpr("cast(id as int) as 
i").write.json(json_file2)
+|spark.sql("create table external_table_without_schema_" + 
version_index + \\
   

[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137682115
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary 
packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a 
spark folder with
+ * expected version under this local directory, e.g. 
`/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
+  private val tmpDataDir = Utils.createTempDir(namePrefix = "test-data")
+  private val sparkTestingDir = "/tmp/spark-test"
+  private val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+
+  override def afterAll(): Unit = {
+Utils.deleteRecursively(wareHousePath)
--- End diff --

Also delete `tmpDataDir ` and `sparkTestingDir `?


---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19148
  
Less than 2 mins to finish the suite. It looks pretty good! 


---




[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18956
  
LGTM except two minor comments


---




[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18956#discussion_r137681045
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
 ---
@@ -64,6 +64,14 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] 
extends Logging {
   protected def batches: Seq[Batch]
 
   /**
+   * Defines a check function which checks for structural integrity of the 
plan after the execution
--- End diff --

`which` -> `that`


---




[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18956#discussion_r137680999
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
 ---
@@ -64,6 +64,14 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] 
extends Logging {
   protected def batches: Seq[Batch]
 
   /**
+   * Defines a check function which checks for structural integrity of the 
plan after the execution
+   * of each rule. For example, we can check whether a plan is still 
resolved after each rule in
+   * `Optimizer`, so we can catch rules that return invalid plans. The 
check function will returns
--- End diff --

`will returns` -> `returns`
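
For readers following the thread, a simplified sketch of the mechanism being documented (Python with illustrative names; the real `RuleExecutor` is Scala and also iterates each batch to a fixed point):

```python
def execute(plan, batches, check_integrity=lambda p: True):
    # After every rule application, verify the plan still passes the
    # structural-integrity check; fail fast, naming the offending rule.
    for batch in batches:
        for rule in batch:
            plan = rule(plan)
            if not check_integrity(plan):
                raise RuntimeError(
                    "Rule %s produced an invalid plan" % rule.__name__)
    return plan
```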


---




[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r137680545
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala ---
@@ -534,4 +534,176 @@ class InsertIntoHiveTableSuite extends QueryTest with 
TestHiveSingleton with Bef
   }
 }
   }
+
+  test("insert overwrite to dir from hive metastore table") {
+withTempDir { dir =>
+  val path = dir.toURI.getPath
+
+  sql(s"INSERT OVERWRITE LOCAL DIRECTORY '${path}' SELECT * FROM src 
where key < 10")
+
+  sql(
+s"""
+   |INSERT OVERWRITE LOCAL DIRECTORY '${path}'
+   |STORED AS orc
+   |SELECT * FROM src where key < 10
+ """.stripMargin)
+
+  // use orc data source to check the data of path is right.
+  withTempView("orc_source") {
+sql(
+  s"""
+ |CREATE TEMPORARY VIEW orc_source
+ |USING org.apache.spark.sql.hive.orc
+ |OPTIONS (
+ |  PATH '${dir.getCanonicalPath}'
+ |)
+   """.stripMargin)
+
+checkAnswer(
+  sql("select * from orc_source"),
+  sql("select * from src where key < 10"))
+  }
+}
+  }
+
+  test("insert overwrite to local dir from temp table") {
+withTempView("test_insert_table") {
+  spark.range(10).selectExpr("id", "id AS 
str").createOrReplaceTempView("test_insert_table")
+
+  withTempDir { dir =>
+val path = dir.toURI.getPath
+
+sql(
+  s"""
+ |INSERT OVERWRITE LOCAL DIRECTORY '${path}'
+ |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+ |SELECT * FROM test_insert_table
+   """.stripMargin)
+
+sql(
+  s"""
+ |INSERT OVERWRITE LOCAL DIRECTORY '${path}'
+ |STORED AS orc
+ |SELECT * FROM test_insert_table
+   """.stripMargin)
+
+// use orc data source to check the data of path is right.
+checkAnswer(
+  spark.read.orc(dir.getCanonicalPath),
+  sql("select * from test_insert_table"))
+  }
+}
+  }
+
+  test("insert overwrite to dir from temp table") {
+withTempView("test_insert_table") {
+  spark.range(10).selectExpr("id", "id AS 
str").createOrReplaceTempView("test_insert_table")
+
+  withTempDir { dir =>
+val pathUri = dir.toURI
+
+sql(
+  s"""
+ |INSERT OVERWRITE DIRECTORY '${pathUri}'
+ |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+ |SELECT * FROM test_insert_table
+   """.stripMargin)
+
+sql(
+  s"""
+ |INSERT OVERWRITE DIRECTORY '${pathUri}'
+ |STORED AS orc
+ |SELECT * FROM test_insert_table
+   """.stripMargin)
+
+// use orc data source to check the data of path is right.
+checkAnswer(
+  spark.read.orc(dir.getCanonicalPath),
+  sql("select * from test_insert_table"))
+  }
+}
+  }
+
+  test("multi insert overwrite to dir") {
+withTempView("test_insert_table") {
+  spark.range(10).selectExpr("id", "id AS 
str").createOrReplaceTempView("test_insert_table")
+
+  withTempDir { dir =>
+val pathUri = dir.toURI
+
+sql(
+  s"""
+ |FROM test_insert_table
+ |INSERT OVERWRITE DIRECTORY '${pathUri}'
+ |STORED AS orc
+ |SELECT id
+ |INSERT OVERWRITE DIRECTORY '${pathUri}'
--- End diff --

To test multi-insert, we need to use different paths and then verify whether 
both writes succeed.
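
For example, a hedged PySpark sketch of that suggestion (`spark` is assumed to be an existing Hive-enabled session; this is not the PR's test code):

```python
import tempfile

spark.range(10).selectExpr("id", "id AS str") \
    .createOrReplaceTempView("test_insert_table")

# Two distinct output directories, so each INSERT branch is verifiable on its own.
dir1, dir2 = tempfile.mkdtemp(), tempfile.mkdtemp()

spark.sql("""
    FROM test_insert_table
    INSERT OVERWRITE DIRECTORY '{0}' STORED AS orc SELECT id
    INSERT OVERWRITE DIRECTORY '{1}' STORED AS orc SELECT id
""".format(dir1, dir2))

# Verify both writes independently.
assert spark.read.orc(dir1).count() == 10
assert spark.read.orc(dir2).count() == 10
```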


---




[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r137680153
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ---
@@ -360,6 +360,31 @@ case class InsertIntoTable(
 }
 
 /**
+ * Insert query result into a directory.
+ *
+ * @param isLocal Indicates whether the specified directory is local 
directory
+ * @param storage Info about output file, row and what serialization format
+ * @param provider Specifies what data source to use; only used for data 
source file.
+ * @param child The query to be executed
+ * @param overwrite If true, the existing directory will be overwritten
+ *
+ * Note that this plan is unresolved and has to be replaced by the 
concrete implementations
+ * 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scaladuring 
analysis.
--- End diff --

Could you fix it?


---




[GitHub] spark pull request #19129: [SPARK-13656][SQL] Delete spark.sql.parquet.cache...

2017-09-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19129


---




[GitHub] spark issue #19129: [SPARK-13656][SQL] Delete spark.sql.parquet.cacheMetadat...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19129
  
Thanks! Merged to master.


---




[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18956#discussion_r137678987
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizerSICheckerSuite.scala
 ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.analysis.{EmptyFunctionRegistry, 
UnresolvedAttribute}
+import org.apache.spark.sql.catalyst.catalog.{InMemoryCatalog, 
SessionCatalog}
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.errors.TreeNodeException
+import org.apache.spark.sql.catalyst.expressions.{Alias, Literal}
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, 
OneRowRelation, Project}
+import org.apache.spark.sql.catalyst.rules._
+import org.apache.spark.sql.internal.SQLConf
+
+
+class OptimizerSICheckerkSuite extends PlanTest {
--- End diff --

Ok.


---




[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18956#discussion_r137679007
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
 ---
@@ -64,6 +64,14 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] 
extends Logging {
   protected def batches: Seq[Batch]
 
   /**
+   * Defines a check function which checks for structural integrity of the 
plan after the execution
+   * of each rule. For example, we can check whether a plan is still 
resolved after each rule in
+   * `Optimizer`, so we can catch rules that return invalid plans. The 
check function will returns
+   * `false` if the given plan doesn't pass the structural integrity check.
+   */
+  protected def planChecker(plan: TreeType): Boolean = true
--- End diff --

Looks good.


---




[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18956
  
**[Test build #81529 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81529/testReport)**
 for PR 18956 at commit 
[`d1db7cf`](https://github.com/apache/spark/commit/d1db7cf815d447b195c907fb159ed0a6770c537b).


---




[GitHub] spark pull request #19046: [SPARK-18769][yarn] Limit resource requests based...

2017-09-07 Thread vanzin
Github user vanzin closed the pull request at:

https://github.com/apache/spark/pull/19046


---




[GitHub] spark issue #19046: [SPARK-18769][yarn] Limit resource requests based on RM'...

2017-09-07 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19046
  
I'm going to close this; when I find some free time I might take a closer 
look at the issue described in Wilfred's message.


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137663570
  
--- Diff: sql/hive/pom.xml ---
@@ -177,6 +177,10 @@
   libfb303
 
 
+  org.apache.derby
+  derby
--- End diff --

I see. Thank you!


---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81524/
Test PASSed.


---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19148
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19148
  
**[Test build #81524 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81524/testReport)**
 for PR 19148 at commit 
[`08dcf22`](https://github.com/apache/spark/commit/08dcf2291a0b1ae4b0e8f29c7628ff04b1924029).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils `
  * `trait SparkSubmitTestUtils extends SparkFunSuite with Timeouts `


---




[GitHub] spark issue #17589: [SPARK-16544][SQL] Support for conversion from numeric c...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17589
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19129: [SPARK-13656][SQL] Delete spark.sql.parquet.cacheMetadat...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19129
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19129: [SPARK-13656][SQL] Delete spark.sql.parquet.cacheMetadat...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19129
  
**[Test build #81525 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81525/testReport)**
 for PR 19129 at commit 
[`8e3d8fe`](https://github.com/apache/spark/commit/8e3d8fe26c6bbf15e17a4b80ff8357fe870f2d46).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #17589: [SPARK-16544][SQL] Support for conversion from numeric c...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17589
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81527/
Test PASSed.


---




[GitHub] spark issue #19129: [SPARK-13656][SQL] Delete spark.sql.parquet.cacheMetadat...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19129
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81525/
Test PASSed.


---




[GitHub] spark issue #17589: [SPARK-16544][SQL] Support for conversion from numeric c...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17589
  
**[Test build #81527 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81527/testReport)**
 for PR 17589 at commit 
[`cbf8a22`](https://github.com/apache/spark/commit/cbf8a224e9cb5744fd340a4f835bdf07cfdf5543).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19103: [SPARK-21890] Credentials not being passed to add...

2017-09-07 Thread redsanket
Github user redsanket closed the pull request at:

https://github.com/apache/spark/pull/19103


---




[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18975
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81528/
Test FAILed.


---




[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18975
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18975
  
**[Test build #81528 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81528/testReport)**
 for PR 18975 at commit 
[`4a5ff29`](https://github.com/apache/spark/commit/4a5ff2912b15a00e7568893be0fa0b61618146c2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19157: [SPARK-20589][Core][Scheduler] Allow limiting task concu...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19157
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19157: [SPARK-20589][Core][Scheduler] Allow limiting task concu...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19157
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81526/
Test FAILed.


---




[GitHub] spark issue #19157: [SPARK-20589][Core][Scheduler] Allow limiting task concu...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19157
  
**[Test build #81526 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81526/testReport)**
 for PR 19157 at commit 
[`8b38300`](https://github.com/apache/spark/commit/8b3830004d69bd5f109fd9846f59583c23a910c7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---



