[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18266#discussion_r137720936 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -197,7 +197,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { * @since 1.4.0 */ def jdbc(url: String, table: String, properties: Properties): DataFrame = { -assertNoSpecifiedSchema("jdbc") +assertJdbcAPISpecifiedDataFrameSchema() --- End diff -- Users should be able to do it either way. If users specify the schema via both the `schema()` API and the `customSchema` option, we should throw an exception. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
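The rule described here — accept a schema from either the `schema()` API or the `customSchema` option, but fail fast when both are given — can be sketched outside Spark as plain Python (the helper name and return convention are hypothetical; only the `customSchema` key comes from this thread):

```python
def assert_no_specified_schema_for_jdbc(user_schema, options):
    """Reject conflicting schema input for a jdbc() read.

    Hypothetical stand-in for the validation discussed in the review:
    the schema may come from schema() (user_schema) or from the
    customSchema option, but specifying both is an error.
    """
    custom = options.get("customSchema")
    if user_schema is not None and custom is not None:
        raise ValueError(
            "Specify the schema either via schema() or via the "
            "customSchema option, not both")
    # Whichever one was given (or None if neither) wins.
    return user_schema if user_schema is not None else custom
```

The key point is that the check runs before any connection is opened, so the user gets a clear error instead of one source of truth silently shadowing the other.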
[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18266#discussion_r137720118 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -679,6 +679,16 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { } /** + * A convenient function for validate specified column types schema in jdbc API. + */ + private def assertJdbcAPISpecifiedDataFrameSchema(): Unit = { --- End diff -- `assertJdbcAPISpecifiedDataFrameSchema` -> `assertNoSpecifiedSchemaForJDBC`
[GitHub] spark issue #19152: [SPARK-21915][ML][PySpark] Model 1 and Model 2 ParamMaps...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19152 **[Test build #3914 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3914/testReport)** for PR 19152 at commit [`a2ccb8a`](https://github.com/apache/spark/commit/a2ccb8a83d13d39c95f0ac1cac1c74dca064).
[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18266#discussion_r137719100 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala --- @@ -185,6 +185,10 @@ object SQLDataSourceExample { connectionProperties.put("password", "password") val jdbcDF2 = spark.read .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties) +// Specifying dataframe column data types on read --- End diff -- > Specifying the custom data types of columns
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18875 **[Test build #81542 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81542/testReport)** for PR 18875 at commit [`1df28ec`](https://github.com/apache/spark/commit/1df28ec200fd46a001b0fea9597f8b9659ea94f4).
[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18266#discussion_r137718217 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala --- @@ -185,6 +185,10 @@ object SQLDataSourceExample { connectionProperties.put("password", "password") val jdbcDF2 = spark.read .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties) +// Specifying dataframe column data types on read +connectionProperties.put("customDataFrameColumnTypes", "id DECIMAL(38, 0), name STRING") --- End diff -- `customSchema `
[GitHub] spark issue #19145: [spark-21933][yarn] Spark Streaming request more executo...
Github user klion26 commented on the issue: https://github.com/apache/spark/pull/19145 @HyukjinKwon @vanzin @srowen @foxish @djvulee @squito Could you please help review this PR?
[GitHub] spark issue #19131: [MINOR][SQL]remove unuse import class
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19131 Merged build finished. Test FAILed.
[GitHub] spark issue #19131: [MINOR][SQL]remove unuse import class
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19131 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81541/ Test FAILed.
[GitHub] spark issue #19131: [MINOR][SQL]remove unuse import class
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19131 **[Test build #81541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81541/testReport)** for PR 19131 at commit [`648ed11`](https://github.com/apache/spark/commit/648ed1165e3913ac919e0dc02608887c9ee6d7c1). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18266#discussion_r137717807 --- Diff: examples/src/main/python/sql/datasource.py --- @@ -177,6 +177,16 @@ def jdbc_dataset_example(spark): .jdbc("jdbc:postgresql:dbserver", "schema.tablename", properties={"user": "username", "password": "password"}) +# Specifying dataframe column data types on read +jdbcDF3 = spark.read \ +.format("jdbc") \ +.option("url", "jdbc:postgresql:dbserver") \ +.option("dbtable", "schema.tablename") \ +.option("user", "username") \ +.option("password", "password") \ +.option("customDataFrameColumnTypes", "id DECIMAL(38, 0), name STRING") \ --- End diff -- `readTableColumnTypes`
[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18266#discussion_r137717728 --- Diff: docs/sql-programming-guide.md --- @@ -1334,7 +1334,14 @@ the following case-insensitive options: The database column data types to use instead of the defaults, when creating the table. Data type information should be specified in the same format as CREATE TABLE columns syntax (e.g: "name CHAR(64), comments VARCHAR(1024)"). The specified types should be valid spark sql data types. This option applies only to writing. - + + + +customDataFrameColumnTypes + + The DataFrame column data types to use instead of the defaults when reading data from jdbc API. (e.g: "id DECIMAL(38, 0), name STRING"). The specified types should be valid spark sql data types. This option applies only to reading. --- End diff -- This is not limited to DataFrame. > The customized column types to use for reading data from JDBC connectors. For example, "id DECIMAL(38, 0), name STRING". The specified types should be valid Spark SQL data types. This option applies only to reading.
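The option value in the suggested wording is a comma-separated list of `name TYPE` pairs, where a comma may also appear inside a type such as `DECIMAL(38, 0)`. A paren-depth-aware split shows why the format still parses unambiguously (an illustrative sketch only — Spark parses this kind of spec with its own SQL DDL parser, not code like this):

```python
def parse_column_types(spec):
    """Split a spec like "id DECIMAL(38, 0), name STRING" into
    (column, type) pairs.

    Commas inside parentheses (e.g. DECIMAL precision/scale) must not
    act as field separators, so track paren depth while scanning.
    """
    fields, depth, start = [], 0, 0
    for i, ch in enumerate(spec):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        elif ch == ',' and depth == 0:
            fields.append(spec[start:i].strip())
            start = i + 1
    fields.append(spec[start:].strip())
    # Each field is "name TYPE"; split on the first run of whitespace.
    return [tuple(f.split(None, 1)) for f in fields]
```

This is the same reason the docs can use one string for the whole schema instead of one option per column.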
[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18266#discussion_r137717379 --- Diff: docs/sql-programming-guide.md --- @@ -1334,7 +1334,14 @@ the following case-insensitive options: The database column data types to use instead of the defaults, when creating the table. Data type information should be specified in the same format as CREATE TABLE columns syntax (e.g: "name CHAR(64), comments VARCHAR(1024)"). The specified types should be valid spark sql data types. This option applies only to writing. - + + + +customDataFrameColumnTypes --- End diff -- `readTableColumnTypes`
[GitHub] spark issue #19131: [MINOR][SQL]remove unuse import class
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19131 **[Test build #81541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81541/testReport)** for PR 19131 at commit [`648ed11`](https://github.com/apache/spark/commit/648ed1165e3913ac919e0dc02608887c9ee6d7c1).
[GitHub] spark issue #19131: [MINOR][SQL]remove unuse import class
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19131 ok to test
[GitHub] spark issue #13067: [SPARK-4131] [SQL] Support INSERT OVERWRITE [LOCAL] DIRE...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13067 Since the PR https://github.com/apache/spark/pull/18975 will be merged soon, could you close this one? Thanks!
[GitHub] spark issue #19131: [MINOR][SQL]remove unuse import class
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19131 retest this please
[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19148
[GitHub] spark issue #19160: [SPARK-21934][CORE] Expose Shuffle Netty memory usage to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19160 **[Test build #81540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81540/testReport)** for PR 19160 at commit [`04a7ec9`](https://github.com/apache/spark/commit/04a7ec944b3273fbe9b9bdb6e217814452a1a12c).
[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19148 Could you send a PR to the 2.2 branch?
[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19148 Thanks! Merged to master.
[GitHub] spark issue #19159: [SPARK-21946][TEST]: fix flaky test: "alter table: renam...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19159 LGTM pending Jenkins.
[GitHub] spark pull request #19160: [SPARK-21934][CORE] Expose Shuffle Netty memory u...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/19160 [SPARK-21934][CORE] Expose Shuffle Netty memory usage to MetricsSystem ## What changes were proposed in this pull request? This is a followup of SPARK-9104 to expose the Netty memory usage to the MetricsSystem. Currently the shuffle Netty memory usage of `NettyBlockTransferService` is exposed; if the external shuffle service is used, the Netty memory usage of `ExternalShuffleClient` and `ExternalShuffleService` is exposed instead. The Netty memory usage of `YarnShuffleService` is not exposed yet, because `YarnShuffleService` doesn't have a `MetricsSystem` of its own and is better connected to Hadoop's MetricsSystem. ## How was this patch tested? Manually verified in a local cluster. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jerryshao/apache-spark SPARK-21934 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19160.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19160 commit 04a7ec944b3273fbe9b9bdb6e217814452a1a12c Author: jerryshao Date: 2017-09-07T13:25:39Z Expose Shuffle Netty memory usage to MetricsSystem
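The metrics pattern this PR relies on — gauges that are evaluated lazily each time the MetricsSystem polls, rather than values pushed at update time — can be sketched in a few lines of Python (the class and metric names here are illustrative, not Spark's actual API):

```python
class MetricRegistry:
    """Minimal sketch of a gauge registry: a gauge is a zero-argument
    callable, read fresh on every snapshot, so the reported value always
    reflects the current state of the instrumented component."""

    def __init__(self):
        self._gauges = {}

    def register_gauge(self, name, fn):
        self._gauges[name] = fn

    def snapshot(self):
        # Evaluate every gauge at poll time.
        return {name: fn() for name, fn in self._gauges.items()}


class FakeTransportService:
    """Stand-in for a transport service whose pooled memory we expose."""

    def __init__(self):
        self.used_direct_memory = 0


registry = MetricRegistry()
svc = FakeTransportService()
registry.register_gauge("shuffle.usedDirectMemory",
                        lambda: svc.used_direct_memory)

svc.used_direct_memory = 1024  # later changes are visible at the next poll
```

Because nothing is copied at registration time, the service never has to notify the metrics system when its memory usage changes.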
[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81535/ Test PASSed.
[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19148 Merged build finished. Test PASSed.
[GitHub] spark issue #19159: [SPARK-21946][TEST]: fix flaky test: "alter table: renam...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19159 **[Test build #81539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81539/testReport)** for PR 19159 at commit [`7a47891`](https://github.com/apache/spark/commit/7a478918710627f5d0df973f059b07d8cf17bd51).
[GitHub] spark issue #13067: [SPARK-4131] [SQL] Support INSERT OVERWRITE [LOCAL] DIRE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13067 Can one of the admins verify this patch?
[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19148 **[Test build #81535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81535/testReport)** for PR 19148 at commit [`62369e3`](https://github.com/apache/spark/commit/62369e3a07bc23d68068e809edf1c43de448740a). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class NettyMemoryMetrics implements MetricSet `
[GitHub] spark pull request #18995: [SPARK-21787][SQL] Support for pushing down filte...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/18995
[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18956
[GitHub] spark pull request #19159: [TEST]: fix flaky test: "alter table: rename cach...
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/19159 [TEST]: fix flaky test: "alter table: rename cached table" in InMemoryCatalogedDDLSuite ## What changes were proposed in this pull request? This PR fixes the flaky test `InMemoryCatalogedDDLSuite "alter table: rename cached table"`. Since this test validates a distributed DataFrame, the result should be checked with `checkAnswer`. The original version used the `df.collect().Seq` method, which does not guarantee the order of the elements in the result. ## How was this patch tested? Uses the existing test case. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kiszk/spark SPARK-21946 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19159.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19159 commit 7a478918710627f5d0df973f059b07d8cf17bd51 Author: Kazuaki Ishizaki Date: 2017-09-08T06:06:47Z use checkAnswer to validate results of DataFrame
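The fix hinges on the fact that rows collected from a distributed DataFrame arrive in no guaranteed order, so results must be compared as multisets rather than sequences. The idea behind Spark's `checkAnswer` test helper can be sketched as (a hypothetical Python stand-in, not Spark's actual implementation):

```python
def check_answer(actual_rows, expected_rows):
    """Order-insensitive comparison of query results.

    collect() on a distributed DataFrame gives no row-order guarantee,
    so compare sorted copies (i.e. as multisets) instead of comparing
    the raw sequences, which is what makes naive comparisons flaky.
    """
    assert sorted(actual_rows) == sorted(expected_rows), (
        "%r != %r (compared as multisets)" % (actual_rows, expected_rows))
```

A test that compares `collect()` output element-by-element can pass or fail depending on partition scheduling; sorting first removes that nondeterminism.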
[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18956 Thanks @rxin @gatorsmile
[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18956 Thanks! Merged to master.
[GitHub] spark pull request #19155: [SPARK-21949][TEST] Tables created in unit tests ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19155
[GitHub] spark issue #19155: [SPARK-21949][TEST] Tables created in unit tests should ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19155 Thanks! Merged to master.
[GitHub] spark issue #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs in Pytho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19147 **[Test build #81538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81538/testReport)** for PR 19147 at commit [`2f929d8`](https://github.com/apache/spark/commit/2f929d8e0ec01ca7070fc0969e5091dad4ce8350).
[GitHub] spark issue #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs in Pytho...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19147 Jenkins, retest this please.
[GitHub] spark pull request #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.test...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19158
[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19158 Thanks for reviewing! merging to master/2.2/2.1/2.0
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18875 **[Test build #81537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81537/testReport)** for PR 18875 at commit [`36ce961`](https://github.com/apache/spark/commit/36ce9614c078c9c0aca62a672948d8581b43e2ea).
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137710147 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,50 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +// `JackGenerator` can only be initialized with a `StructType` or a `MapType`. +// Once it is initialized with `StructType`, it can be used to write out a struct or an array of +// struct. Once it is initialized with `MapType`, it can be used to write out a map. An exception +// will be thrown if trying to write out a struct if it is initialized with a `MapType`, +// and vice verse. --- End diff -- ok. I'll modify it.
[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19158 LGTM
[GitHub] spark pull request #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs i...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19147#discussion_r137707828 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/VectorizedPythonRunner.scala --- @@ -0,0 +1,329 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.python + +import java.io.{BufferedInputStream, BufferedOutputStream, DataInputStream, DataOutputStream} +import java.net.Socket +import java.nio.charset.StandardCharsets + +import scala.collection.JavaConverters._ + +import org.apache.arrow.vector.VectorSchemaRoot +import org.apache.arrow.vector.stream.{ArrowStreamReader, ArrowStreamWriter} + +import org.apache.spark.{SparkEnv, SparkFiles, TaskContext} +import org.apache.spark.api.python.{ChainedPythonFunctions, PythonEvalType, PythonException, PythonRDD, SpecialLengths} +import org.apache.spark.internal.Logging +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.execution.arrow.{ArrowUtils, ArrowWriter} +import org.apache.spark.sql.execution.vectorized.{ArrowColumnVector, ColumnarBatch, ColumnVector} +import org.apache.spark.sql.types._ +import org.apache.spark.util.Utils + +/** + * Similar to `PythonRunner`, but exchange data with Python worker via columnar format. + */ +class VectorizedPythonRunner( +funcs: Seq[ChainedPythonFunctions], +batchSize: Int, +bufferSize: Int, +reuse_worker: Boolean, +argOffsets: Array[Array[Int]]) extends Logging { + + require(funcs.length == argOffsets.length, "argOffsets should have the same length as funcs") + + // All the Python functions should have the same exec, version and envvars. + private val envVars = funcs.head.funcs.head.envVars + private val pythonExec = funcs.head.funcs.head.pythonExec + private val pythonVer = funcs.head.funcs.head.pythonVer + + // TODO: support accumulator in multiple UDF + private val accumulator = funcs.head.funcs.head.accumulator + + // todo: return column batch? + def compute( --- End diff -- Yes, it is a lot of duplicated code from `PythonRunner` that could be refactored. I'm guessing you did not use the existing code because of the Arrow stream format? 
While I would love to start using that in Spark, I think it would be better to do this at a later time, when the required code can be refactored and the Arrow stream format can replace where we currently use the file format. Also, the good part about using the iterator-based file format is that each iteration allows Python to communicate back an error code and exit gracefully. In my own tests with the streaming format, if an error occurred after the stream had started, Spark could lock up in a waiting state. These are the reasons I did not use the streaming format in my implementation. Would this `VectorizedPythonRunner` be able to handle these types of errors?
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18875 We should add a test suite for `JacksonGenerator`.
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137706345 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,50 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +// `JackGenerator` can only be initialized with a `StructType` or a `MapType`. +// Once it is initialized with `StructType`, it can be used to write out a struct or an array of +// struct. Once it is initialized with `MapType`, it can be used to write out a map. An exception +// will be thrown if trying to write out a struct if it is initialized with a `MapType`, +// and vice verse. --- End diff -- For this kind of comment, we use the style like: /** * Code comments... * */
[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19148#discussion_r137706271 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import java.io.File +import java.nio.file.Files + +import org.apache.spark.TestUtils +import org.apache.spark.sql.{QueryTest, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.catalog.CatalogTableType +import org.apache.spark.sql.test.SQLTestUtils +import org.apache.spark.util.Utils + +/** + * Test HiveExternalCatalog backward compatibility. + * + * Note that, this test suite will automatically download spark binary packages of different + * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with + * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the + * downloading for this spark version. 
+ */ +class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils { + private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse") + private val tmpDataDir = Utils.createTempDir(namePrefix = "test-data") + private val sparkTestingDir = "/tmp/spark-test" + private val unusedJar = TestUtils.createJarWithClasses(Seq.empty) + + override def afterAll(): Unit = { +Utils.deleteRecursively(wareHousePath) +Utils.deleteRecursively(tmpDataDir) +super.afterAll() + } + + private def downloadSpark(version: String): Unit = { +import scala.sys.process._ + +val url = s"https://d3kbcqa49mib13.cloudfront.net/spark-$version-bin-hadoop2.7.tgz"; + +Seq("wget", url, "-q", "-P", sparkTestingDir).! + +val downloaded = new File(sparkTestingDir, s"spark-$version-bin-hadoop2.7.tgz").getCanonicalPath +val targetDir = new File(sparkTestingDir, s"spark-$version").getCanonicalPath + +Seq("mkdir", targetDir).! + +Seq("tar", "-xzf", downloaded, "-C", targetDir, "--strip-components=1").! + +Seq("rm", downloaded).! 
+ } + + private def genDataDir(name: String): String = { +new File(tmpDataDir, name).getCanonicalPath + } + + override def beforeAll(): Unit = { +super.beforeAll() + +val tempPyFile = File.createTempFile("test", ".py") +Files.write(tempPyFile.toPath, + s""" +|from pyspark.sql import SparkSession +| +|spark = SparkSession.builder.enableHiveSupport().getOrCreate() +|version_index = spark.conf.get("spark.sql.test.version.index", None) +| +|spark.sql("create table data_source_tbl_{} using json as select 1 i".format(version_index)) +| +|spark.sql("create table hive_compatible_data_source_tbl_" + version_index + \\ +| " using parquet as select 1 i") +| +|json_file = "${genDataDir("json_")}" + str(version_index) +|spark.range(1, 2).selectExpr("cast(id as int) as i").write.json(json_file) +|spark.sql("create table external_data_source_tbl_" + version_index + \\ +| "(i int) using json options (path '{}')".format(json_file)) +| +|parquet_file = "${genDataDir("parquet_")}" + str(version_index) +|spark.range(1, 2).selectExpr("cast(id as int) as i").write.parquet(parquet_file) +|spark.sql("create table hive_compatible_external_data_source_tbl_" + version_index + \\ +| "(i int) using parquet options (path '{}')".format(parquet_file)) +| +|json_file2 = "${genDataDir("json2_")}" + str(version_index) +|spark.range(1, 2).selectExpr("cast(id as int) as i").write.json(json_file2) +|spark.sql("create table external_tab
[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18266 **[Test build #81536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81536/testReport)** for PR 18266 at commit [`b38a1a8`](https://github.com/apache/spark/commit/b38a1a8b2d9ffee250b9e8637dc579f2a8f9182d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19148#discussion_r137704899 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import java.io.File +import java.nio.file.Files + +import org.apache.spark.TestUtils +import org.apache.spark.sql.{QueryTest, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.catalog.CatalogTableType +import org.apache.spark.sql.test.SQLTestUtils +import org.apache.spark.util.Utils + +/** + * Test HiveExternalCatalog backward compatibility. + * + * Note that, this test suite will automatically download spark binary packages of different + * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with + * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the + * downloading for this spark version. 
+ */ +class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils { + private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse") + private val tmpDataDir = Utils.createTempDir(namePrefix = "test-data") + private val sparkTestingDir = "/tmp/spark-test" + private val unusedJar = TestUtils.createJarWithClasses(Seq.empty) + + override def afterAll(): Unit = { +Utils.deleteRecursively(wareHousePath) +Utils.deleteRecursively(tmpDataDir) +super.afterAll() + } + + private def downloadSpark(version: String): Unit = { +import scala.sys.process._ + +val url = s"https://d3kbcqa49mib13.cloudfront.net/spark-$version-bin-hadoop2.7.tgz"; + +Seq("wget", url, "-q", "-P", sparkTestingDir).! + +val downloaded = new File(sparkTestingDir, s"spark-$version-bin-hadoop2.7.tgz").getCanonicalPath +val targetDir = new File(sparkTestingDir, s"spark-$version").getCanonicalPath + +Seq("mkdir", targetDir).! + +Seq("tar", "-xzf", downloaded, "-C", targetDir, "--strip-components=1").! + +Seq("rm", downloaded).! 
+ } + + private def genDataDir(name: String): String = { +new File(tmpDataDir, name).getCanonicalPath + } + + override def beforeAll(): Unit = { +super.beforeAll() + +val tempPyFile = File.createTempFile("test", ".py") +Files.write(tempPyFile.toPath, + s""" +|from pyspark.sql import SparkSession +| +|spark = SparkSession.builder.enableHiveSupport().getOrCreate() +|version_index = spark.conf.get("spark.sql.test.version.index", None) +| +|spark.sql("create table data_source_tbl_{} using json as select 1 i".format(version_index)) +| +|spark.sql("create table hive_compatible_data_source_tbl_" + version_index + \\ +| " using parquet as select 1 i") +| +|json_file = "${genDataDir("json_")}" + str(version_index) +|spark.range(1, 2).selectExpr("cast(id as int) as i").write.json(json_file) +|spark.sql("create table external_data_source_tbl_" + version_index + \\ +| "(i int) using json options (path '{}')".format(json_file)) +| +|parquet_file = "${genDataDir("parquet_")}" + str(version_index) +|spark.range(1, 2).selectExpr("cast(id as int) as i").write.parquet(parquet_file) +|spark.sql("create table hive_compatible_external_data_source_tbl_" + version_index + \\ +| "(i int) using parquet options (path '{}')".format(parquet_file)) +| +|json_file2 = "${genDataDir("json2_")}" + str(version_index) +|spark.range(1, 2).selectExpr("cast(id as int) as i").write.json(json_file2) +|spark.sql("create table external_table_
[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19148#discussion_r137704429 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import java.io.File +import java.nio.file.Files + +import org.apache.spark.TestUtils +import org.apache.spark.sql.{QueryTest, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.catalog.CatalogTableType +import org.apache.spark.sql.test.SQLTestUtils +import org.apache.spark.util.Utils + +/** + * Test HiveExternalCatalog backward compatibility. + * + * Note that, this test suite will automatically download spark binary packages of different + * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with + * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the + * downloading for this spark version. 
+ */ +class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils { + private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse") + private val tmpDataDir = Utils.createTempDir(namePrefix = "test-data") + private val sparkTestingDir = "/tmp/spark-test" + private val unusedJar = TestUtils.createJarWithClasses(Seq.empty) + + override def afterAll(): Unit = { +Utils.deleteRecursively(wareHousePath) +Utils.deleteRecursively(tmpDataDir) +super.afterAll() + } + + private def downloadSpark(version: String): Unit = { +import scala.sys.process._ + +val url = s"https://d3kbcqa49mib13.cloudfront.net/spark-$version-bin-hadoop2.7.tgz"; + +Seq("wget", url, "-q", "-P", sparkTestingDir).! + +val downloaded = new File(sparkTestingDir, s"spark-$version-bin-hadoop2.7.tgz").getCanonicalPath +val targetDir = new File(sparkTestingDir, s"spark-$version").getCanonicalPath + +Seq("mkdir", targetDir).! + +Seq("tar", "-xzf", downloaded, "-C", targetDir, "--strip-components=1").! + +Seq("rm", downloaded).! + } + + private def genDataDir(name: String): String = { +new File(tmpDataDir, name).getCanonicalPath + } + + override def beforeAll(): Unit = { +super.beforeAll() + +val tempPyFile = File.createTempFile("test", ".py") +Files.write(tempPyFile.toPath, + s""" +|from pyspark.sql import SparkSession +| +|spark = SparkSession.builder.enableHiveSupport().getOrCreate() +|version_index = spark.conf.get("spark.sql.test.version.index", None) +| +|spark.sql("create table data_source_tbl_{} using json as select 1 i".format(version_index)) --- End diff -- Instead of only using lowercase column name, should we use mix-case Hive schema for those tables? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
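The reviewer's mixed-case suggestion could be illustrated with a variant of one generated statement — the table and column names below are hypothetical illustrations, not the PR's actual change:

```python
# Hypothetical mixed-case variant of one generated DDL statement (illustration
# only; the PR's generator uses lowercase names such as data_source_tbl_{}).
version_index = 0  # stand-in for spark.conf.get("spark.sql.test.version.index")
ddl = ("create table Data_Source_Tbl_{} using json "
       "as select 1 CamelCaseCol".format(version_index))
print(ddl)
```

Feeding mixed-case identifiers through the generator would exercise Hive's case-insensitive schema handling in the compatibility test, which is what the comment is asking about.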
[GitHub] spark issue #19155: [SPARK-21949][TEST] Tables created in unit tests should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19155 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81533/ Test PASSed.
[GitHub] spark issue #19155: [SPARK-21949][TEST] Tables created in unit tests should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19155 Merged build finished. Test PASSed.
[GitHub] spark issue #19155: [SPARK-21949][TEST] Tables created in unit tests should ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19155 **[Test build #81533 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81533/testReport)** for PR 19155 at commit [`1d38337`](https://github.com/apache/spark/commit/1d38337b22ea8926aeb1db0591285fbb34f902cc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81532/ Test PASSed.
[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19148 Merged build finished. Test PASSed.
[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19148 **[Test build #81532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81532/testReport)** for PR 19148 at commit [`00cdd0a`](https://github.com/apache/spark/commit/00cdd0a63bdd4f531eb06de8d9651e934f2bb448). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19148#discussion_r137703092 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import java.io.File +import java.nio.file.Files + +import org.apache.spark.TestUtils +import org.apache.spark.sql.{QueryTest, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.catalog.CatalogTableType +import org.apache.spark.sql.test.SQLTestUtils +import org.apache.spark.util.Utils + +/** + * Test HiveExternalCatalog backward compatibility. + * + * Note that, this test suite will automatically download spark binary packages of different + * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with + * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the + * downloading for this spark version. + */ +class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils { --- End diff -- Ok. After a build clean it works now. 
[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19158 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81534/ Test PASSed.
[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19158 Merged build finished. Test PASSed.
[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19158 **[Test build #81534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81534/testReport)** for PR 19158 at commit [`134bc26`](https://github.com/apache/spark/commit/134bc267a5ef01d9dea3d08cc255facdd8dfc0c8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18956 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81531/ Test PASSed.
[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18956 Merged build finished. Test PASSed.
[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18956 **[Test build #81531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81531/testReport)** for PR 18956 at commit [`ecdfb7d`](https://github.com/apache/spark/commit/ecdfb7db34d0d01e357bff0d32b62137ef0ae735). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19148#discussion_r137700913 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import java.io.File +import java.nio.file.Files + +import org.apache.spark.TestUtils +import org.apache.spark.sql.{QueryTest, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.catalog.CatalogTableType +import org.apache.spark.sql.test.SQLTestUtils +import org.apache.spark.util.Utils + +/** + * Test HiveExternalCatalog backward compatibility. + * + * Note that, this test suite will automatically download spark binary packages of different + * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with + * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the + * downloading for this spark version. + */ +class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils { --- End diff -- Let me do build clean and try again. 
[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19148 **[Test build #81535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81535/testReport)** for PR 19148 at commit [`62369e3`](https://github.com/apache/spark/commit/62369e3a07bc23d68068e809edf1c43de448740a).
[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19148#discussion_r137700499 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import java.io.File +import java.nio.file.Files + +import org.apache.spark.TestUtils +import org.apache.spark.sql.{QueryTest, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.catalog.CatalogTableType +import org.apache.spark.sql.test.SQLTestUtils +import org.apache.spark.util.Utils + +/** + * Test HiveExternalCatalog backward compatibility. + * + * Note that, this test suite will automatically download spark binary packages of different + * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with + * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the + * downloading for this spark version. + */ +class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils { --- End diff -- Did you try a clean clone? 
I added the derby dependency to make the test work on jenkins...
[GitHub] spark issue #19107: [SPARK-21799][ML] Fix `KMeans` performance regression ca...
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19107 Sorry for the delay, this looks good to me -- thanks @WeichenXu123!
[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19148#discussion_r137699853 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import java.io.File +import java.nio.file.Files + +import org.apache.spark.TestUtils +import org.apache.spark.sql.{QueryTest, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.catalog.CatalogTableType +import org.apache.spark.sql.test.SQLTestUtils +import org.apache.spark.util.Utils + +/** + * Test HiveExternalCatalog backward compatibility. + * + * Note that, this test suite will automatically download spark binary packages of different + * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with + * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the + * downloading for this spark version. + */ +class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils { --- End diff -- After removing the added derby dependency, this test can work. 
[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19148#discussion_r137699802 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/SparkSubmitTestUtils.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import java.io.File +import java.sql.Timestamp +import java.util.Date + +import scala.collection.mutable.ArrayBuffer + +import org.scalatest.concurrent.Timeouts +import org.scalatest.exceptions.TestFailedDueToTimeoutException +import org.scalatest.time.SpanSugar._ + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.test.ProcessTestUtils.ProcessOutputCapturer +import org.apache.spark.util.Utils + +trait SparkSubmitTestUtils extends SparkFunSuite with Timeouts { --- End diff -- nit. Let's use `TimeLimits` instead of `Timeouts`. `Timeouts` is deprecated now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19158 **[Test build #81534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81534/testReport)** for PR 19158 at commit [`134bc26`](https://github.com/apache/spark/commit/134bc267a5ef01d9dea3d08cc255facdd8dfc0c8).
[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19148#discussion_r137699720 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import java.io.File +import java.nio.file.Files + +import org.apache.spark.TestUtils +import org.apache.spark.sql.{QueryTest, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.catalog.CatalogTableType +import org.apache.spark.sql.test.SQLTestUtils +import org.apache.spark.util.Utils + +/** + * Test HiveExternalCatalog backward compatibility. + * + * Note that, this test suite will automatically download spark binary packages of different + * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with + * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the + * downloading for this spark version. 
+ */ +class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils { --- End diff -- can you print `org.apache.derby.tools.sysinfo.getVersionString` in `IsolatedClientLoader.createClient` to see what your actual derby version is?
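The suggested diagnostic can be sketched as a one-liner — its placement inside `IsolatedClientLoader.createClient` is an assumption, since the method body is not shown in the thread:

```scala
// Hypothetical diagnostic inside IsolatedClientLoader.createClient (sketch only):
// log which Derby version the isolated classloader actually resolved, since a
// mismatch with the version that created the metastore triggers Derby's
// "incompatible format" failure.
logInfo("Derby version: " + org.apache.derby.tools.sysinfo.getVersionString)
```

Comparing this value against the version reported in the error message (10.12 below) would confirm whether two different Derby jars are on the classpath.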
[GitHub] spark issue #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs in Pytho...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19147 The test failure above should be fixed by #19158.
[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19148#discussion_r137699367 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import java.io.File +import java.nio.file.Files + +import org.apache.spark.TestUtils +import org.apache.spark.sql.{QueryTest, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.catalog.CatalogTableType +import org.apache.spark.sql.test.SQLTestUtils +import org.apache.spark.util.Utils + +/** + * Test HiveExternalCatalog backward compatibility. + * + * Note that, this test suite will automatically download spark binary packages of different + * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with + * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the + * downloading for this spark version. 
+ */ +class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils { --- End diff -- I ran this test locally and encountered a failure like: 2017-09-07 19:28:07.595 - stderr> Caused by: java.sql.SQLException: Database at /root/repos/spark-1/target/tmp/warehouse-66dad501-c743-4ac3-83cc-51451c6d697a/metastore_db has an incompatible format with the current version of the software. The database was created by or upgraded by version 10.12.
[GitHub] spark pull request #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.test...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/19158 [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTests2 should stop SparkContext. ## What changes were proposed in this pull request? `pyspark.sql.tests.SQLTests2` doesn't stop the newly created SparkContext in the test, which might affect the following tests. This PR makes `pyspark.sql.tests.SQLTests2` stop `SparkContext`. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ueshin/apache-spark issues/SPARK-21950 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19158.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19158 commit 134bc267a5ef01d9dea3d08cc255facdd8dfc0c8 Author: Takuya UESHIN Date: 2017-09-08T02:34:41Z Make pyspark.sql.tests.SQLTests2 stop SparkContext.
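The pattern the PR describes — a test that creates its own SparkContext must also stop it so no live context leaks into later suites — can be sketched as below. `FakeSparkContext` is a hypothetical stand-in used only to keep the sketch self-contained; with the real pyspark, the stop would live in `tearDown`/`tearDownClass` of `SQLTests2`:

```python
class FakeSparkContext:
    """Stand-in for pyspark.SparkContext; only the stop/cleanup lifecycle matters."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


def run_with_own_context(test_body):
    """Run a test body with a freshly created context, always stopping it.

    Mirrors the fix: the suite that creates a context is responsible for
    stopping it, even when the test body raises.
    """
    sc = FakeSparkContext()
    try:
        test_body(sc)
    finally:
        sc.stop()  # guarantees the context cannot leak into the next suite
    return sc
```

Without the `finally`, a failing test would leave the context running, which is exactly how one test can "affect the following tests."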
[GitHub] spark pull request #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r137699153 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/StatisticsSupport.java --- @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader.upward; + +/** + * A mix in interface for `DataSourceV2Reader`. Users can implement this interface to report + * statistics to Spark. + */ +public interface StatisticsSupport { --- End diff -- I'd like to put column stats in a separate interface, because we already separate basic stats and column stats in `ANALYZE TABLE`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r137698996 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources.v2 + +import org.apache.spark.sql.catalyst.expressions.AttributeReference +import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics} +import org.apache.spark.sql.sources.v2.reader.DataSourceV2Reader +import org.apache.spark.sql.sources.v2.reader.upward.StatisticsSupport + +case class DataSourceV2Relation( +output: Seq[AttributeReference], +reader: DataSourceV2Reader) extends LeafNode { + + override def computeStats(): Statistics = reader match { +case r: StatisticsSupport => Statistics(sizeInBytes = r.getStatistics.sizeInBytes()) +case _ => Statistics(sizeInBytes = conf.defaultSizeInBytes) + } +} + +object DataSourceV2Relation { + def apply(reader: DataSourceV2Reader): DataSourceV2Relation = { +new DataSourceV2Relation(reader.readSchema().toAttributes, reader) --- End diff -- In data source V2, we will delegate partition pruning to the data source, although we need to do some refactoring to make it happen. > I was just looking into how the data source should provide partition data, or at least fields that are the same for all rows in a `ReadTask`. It would be nice to have a way to pass those up instead of materializing them in each `UnsafeRow`. This can be achieved by the columnar reader. Think about a data source having a data column `i` and a partition column `j`, the returned columnar batch has 2 column vectors for `i` and `j`. Column vector `i` is a normal one that contains all the values of column `i` within this batch, column vector `j` is a constant vector that only contains a single value. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
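The constant-vector idea in the comment above can be illustrated without Spark's actual classes (all names below are invented for the sketch): a columnar batch exposes a normal vector for the data column `i` and a constant vector for the partition column `j`, so the shared partition value is stored once rather than materialized into every row.

```python
class Vector:
    """Ordinary column vector: one stored value per row."""
    def __init__(self, values):
        self._values = list(values)

    def get(self, row):
        return self._values[row]

class ConstantVector:
    """Constant column vector: one stored value shared by all rows."""
    def __init__(self, value):
        self._value = value

    def get(self, row):
        return self._value

class ColumnarBatch:
    """A batch of rows exposed column by column."""
    def __init__(self, num_rows, columns):
        self.num_rows = num_rows
        self.columns = columns  # column name -> vector

    def row(self, r):
        # Rows are assembled on demand; the partition value is never
        # duplicated in storage, only read from the constant vector.
        return {name: vec.get(r) for name, vec in self.columns.items()}

batch = ColumnarBatch(3, {
    "i": Vector([10, 20, 30]),          # data column: per-row values
    "j": ConstantVector("2017-09-08"),  # partition column: one value
})
rows = [batch.row(r) for r in range(batch.num_rows)]
```

Every assembled row sees the same `j`, but the batch holds only a single copy of it, which is the memory saving the comment describes.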
[GitHub] spark issue #19107: [SPARK-21799][ML] Fix `KMeans` performance regression ca...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19107 cc @smurching Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/19132 @vanzin @zsxwing could you help review this? Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18956 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18956 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81529/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19155: [SPARK-21949][TEST] Tables created in unit tests should ...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/19155 @dongjoon-hyun thanks, I have created a JIRA issue. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18956 **[Test build #81529 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81529/testReport)** for PR 18956 at commit [`d1db7cf`](https://github.com/apache/spark/commit/d1db7cf815d447b195c907fb159ed0a6770c537b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19155: [MINOR][TEST] Tables created in unit tests should be dro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19155 **[Test build #81533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81533/testReport)** for PR 19155 at commit [`1d38337`](https://github.com/apache/spark/commit/1d38337b22ea8926aeb1db0591285fbb34f902cc). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19157: [SPARK-20589][Core][Scheduler] Allow limiting task concu...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19157 @dhruve, FYI, AppVeyor CI runs the SparkR tests on Windows only when there are changes in R-related code: https://github.com/apache/spark/blob/75a6d05853fea13f88e3c941b1959b24e4640824/appveyor.yml#L29-L34 The thing is, when a `git merge` is performed (not a `rebase`), the resulting merge commit, e.g. https://github.com/apache/spark/commit/8b3830004d69bd5f109fd9846f59583c23a910c7, usually includes some changes in R, and then the CI is triggered, which is actually quite moderate. So, I think we should generally rebase when there are conflicts. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19144: [UI][Streaming]Modify the title, 'Records' instead of 'I...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/19144 @zsxwing Could you help review the code? Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19150: [SPARK-21939][TEST] Use TimeLimits instead of Tim...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19150 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19150: [SPARK-21939][TEST] Use TimeLimits instead of Timeouts
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19150 Thank you for review and merging, @jerryshao ! Also, thank you for review and approving, @HyukjinKwon and @srowen . --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19150: [SPARK-21939][TEST] Use TimeLimits instead of Timeouts
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19150 Merging to master, thanks @dongjoon-hyun . --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict between ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19149 Except for that, the isolation of `InferFiltersFromConstraints` looks good to me. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict between ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19149 Hi, @gatorsmile . According to the PR description, it's about `PruneFilters`. Do we need a test case, given that SPARK-21652 is about `ConstantPropagation`, not `PruneFilters`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81530/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #81530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81530/testReport)** for PR 18029 at commit [`cef5cde`](https://github.com/apache/spark/commit/cef5cdece2bd2a7c95e19493c511d602c1b46461). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class KinesisInitialPosition ` * `sealed trait InitialPosition ` * `case class AtTimestamp(timestamp: Date) extends InitialPosition ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19148 **[Test build #81532 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81532/testReport)** for PR 19148 at commit [`00cdd0a`](https://github.com/apache/spark/commit/00cdd0a63bdd4f531eb06de8d9651e934f2bb448). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19148#discussion_r137686311 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala --- @@ -0,0 +1,193 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import java.io.File +import java.nio.file.Files + +import org.apache.spark.TestUtils +import org.apache.spark.sql.{QueryTest, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.catalog.CatalogTableType +import org.apache.spark.sql.test.SQLTestUtils +import org.apache.spark.util.Utils + +/** + * Test HiveExternalCatalog backward compatibility. + * + * Note that, this test suite will automatically download spark binary packages of different + * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with + * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the + * downloading for this spark version. 
+ */ +class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils { + private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse") + private val tmpDataDir = Utils.createTempDir(namePrefix = "test-data") + private val sparkTestingDir = "/tmp/spark-test" + private val unusedJar = TestUtils.createJarWithClasses(Seq.empty) + + override def afterAll(): Unit = { +Utils.deleteRecursively(wareHousePath) --- End diff -- I wanna keep the `sparkTestingDir`, so we don't need to download spark again if this jenkins machine has already run this suite before. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18956 **[Test build #81531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81531/testReport)** for PR 18956 at commit [`ecdfb7d`](https://github.com/apache/spark/commit/ecdfb7db34d0d01e357bff0d32b62137ef0ae735). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17435#discussion_r137685731 --- Diff: python/pyspark/sql/types.py --- @@ -438,6 +438,11 @@ def toInternal(self, obj): def fromInternal(self, obj): return self.dataType.fromInternal(obj) +def typeName(self): +raise TypeError( +"StructField does not have typename. \ --- End diff -- Little nit: looks like a typo, typename -> typeName. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17435#discussion_r137685629 --- Diff: python/pyspark/sql/types.py --- @@ -438,6 +438,11 @@ def toInternal(self, obj): def fromInternal(self, obj): return self.dataType.fromInternal(obj) +def typeName(self): +raise TypeError( +"StructField does not have typename. \ +You can use self.dataType.simpleString() instead.") --- End diff -- I'd remove `self` here and just say something like ` use typeName() on its type explicitly ...`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #81530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81530/testReport)** for PR 18029 at commit [`cef5cde`](https://github.com/apache/spark/commit/cef5cdece2bd2a7c95e19493c511d602c1b46461). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18029: [SPARK-20168] [DStream] Add changes to use kinesi...
Github user yssharma commented on a diff in the pull request: https://github.com/apache/spark/pull/18029#discussion_r137684968 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/InitialPosition.scala --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.streaming.kinesis + +import java.util.Date + +import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream + +/** + * Trait for Kinesis's InitialPositionInStream. + * This will be overridden by more specific types. + */ +sealed trait InitialPosition { + val initialPositionInStream: InitialPositionInStream +} + +/** + * Case object for Kinesis's InitialPositionInStream.LATEST. + */ +case object Latest extends InitialPosition { + val instance: InitialPosition = this + override val initialPositionInStream: InitialPositionInStream += InitialPositionInStream.LATEST +} + +/** + * Case object for Kinesis's InitialPositionInStream.TRIM_HORIZON. 
+ */ +case object TrimHorizon extends InitialPosition { + val instance: InitialPosition = this + override val initialPositionInStream: InitialPositionInStream += InitialPositionInStream.TRIM_HORIZON +} + +/** + * Case object for Kinesis's InitialPositionInStream.AT_TIMESTAMP. + */ +case class AtTimestamp(timestamp: Date) extends InitialPosition { + val instance: InitialPosition = this + override val initialPositionInStream: InitialPositionInStream += InitialPositionInStream.AT_TIMESTAMP +} + +/** + * Companion object for InitialPosition that returns + * appropriate version of InitialPositionInStream. + */ +object InitialPosition { --- End diff -- I've implemented the functions with this Capital naming, but still feel a bit salty about this :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17435#discussion_r137684263 --- Diff: python/pyspark/sql/types.py --- @@ -438,6 +438,11 @@ def toInternal(self, obj): def fromInternal(self, obj): return self.dataType.fromInternal(obj) +def typeName(self): +raise TypeError( --- End diff -- Could we do something like ... ```python raise TypeError( "..." "...") ``` if it doesn't bother you much? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
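The style being asked for relies on Python's implicit concatenation of adjacent string literals, which avoids the trailing-backslash continuation used in the diff (and the stray indentation whitespace it embeds in the message). A minimal sketch; the class and wording below are placeholders, not the final patch:

```python
class StructFieldLike:
    """Hypothetical stand-in for StructField, to show the message style only."""

    def typeName(self):
        # Adjacent string literals are joined at compile time, so the
        # message contains no backslash and no continuation whitespace.
        raise TypeError(
            "StructField does not have typeName. "
            "Use typeName() on its type explicitly instead.")

try:
    StructFieldLike().typeName()
    message = None
except TypeError as exc:
    message = str(exc)
```

With the backslash continuation from the original diff, the raised message would instead contain the literal backslash and the source indentation of the following line; adjacent literals keep it a single clean sentence.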