[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97477/ Test PASSed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 **[Test build #97477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97477/testReport)** for PR 22608 at commit [`5d270f1`](https://github.com/apache/spark/commit/5d270f17dccbb2eac6d3c2ab8c12987e3d992086). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20433 @maropu Thanks! This is a great step toward making our Spark SQL parser fully compatible with ANSI SQL. Please continue the effort! cc @cloud-fan
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r225784123 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -335,6 +335,12 @@ object SQLConf { .booleanConf .createWithDefault(true) + val ANSI_SQL_PARSER = +buildConf("spark.sql.parser.ansi.enabled") + .doc("When true, tries to conform to ANSI SQL syntax.") + .booleanConf + .createWithDefault(false) --- End diff -- Since the next release is 3.0, we will turn this on by default.
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r225783980 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -335,6 +335,12 @@ object SQLConf { .booleanConf .createWithDefault(true) + val ANSI_SQL_PARSER = --- End diff -- The legacy flag will be removed in the 3.0 release.
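The discussion above is about gating the new INTERVAL-optional syntax behind the `spark.sql.parser.ansi.enabled` flag, off by default until 3.0. A minimal Python sketch of that pattern (not Spark's actual SQLConf or parser code; `ConfEntry`, `build_bool_conf`, and `parse_interval` are illustrative names) shows how a boolean conf with a default can gate a parser rule:

```python
# Sketch only: mimics buildConf(...).booleanConf.createWithDefault(false)
# and a parser rule that requires the INTERVAL keyword unless ANSI mode is on.
class ConfEntry:
    def __init__(self, key, default):
        self.key = key
        self.default = default

def build_bool_conf(key, default=False):
    return ConfEntry(key, default)

ANSI_SQL_PARSER = build_bool_conf("spark.sql.parser.ansi.enabled", default=False)

def parse_interval(tokens, conf):
    """Toy rule: with the flag off, the leading INTERVAL keyword is required;
    with it on, the keyword is optional. Returns the remaining tokens."""
    ansi = conf.get(ANSI_SQL_PARSER.key, ANSI_SQL_PARSER.default)
    if tokens and tokens[0].upper() == "INTERVAL":
        return tokens[1:]          # keyword consumed in both modes
    if ansi:
        return tokens              # keyword optional under ANSI mode
    raise SyntaxError("expected INTERVAL keyword")

print(parse_interval(["INTERVAL", "'1'", "DAY"], {}))
print(parse_interval(["'1'", "DAY"], {"spark.sql.parser.ansi.enabled": True}))
```

With the default (`false`), only the legacy spelling parses; flipping the conf makes both spellings accepted, which matches the migration path gatorsmile describes.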
[GitHub] spark pull request #22746: [SPARK-24499][SQL][DOC] Split the page of sql-pro...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22746#discussion_r225783658 --- Diff: docs/sql-reference.md --- @@ -0,0 +1,641 @@ +--- +layout: global +title: Reference +displayTitle: Reference +--- + +* Table of contents +{:toc} + +## Data Types + +Spark SQL and DataFrames support the following data types: + +* Numeric types +- `ByteType`: Represents 1-byte signed integer numbers. --- End diff -- nit: use 2-space indent.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test PASSed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4052/ Test PASSed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97480/testReport)** for PR 22749 at commit [`25a6162`](https://github.com/apache/spark/commit/25a616286075ca4f0a7d528095b387172b05c6c3).
[GitHub] spark issue #22219: [SPARK-25224][SQL] Improvement of Spark SQL ThriftServer...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22219 cc @srinathshankar @yuchenhuo
[GitHub] spark pull request #22746: [SPARK-24499][SQL][DOC] Split the page of sql-pro...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22746#discussion_r225780740 --- Diff: docs/sql-getting-started.md --- @@ -0,0 +1,369 @@ +--- +layout: global +title: Getting Started +displayTitle: Getting Started +--- + +* Table of contents +{:toc} + +## Starting Point: SparkSession + + + + +The entry point into all functionality in Spark is the [`SparkSession`](api/scala/index.html#org.apache.spark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder()`: + +{% include_example init_session scala/org/apache/spark/examples/sql/SparkSQLExample.scala %} + + + + +The entry point into all functionality in Spark is the [`SparkSession`](api/java/index.html#org.apache.spark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder()`: + +{% include_example init_session java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %} + + + + +The entry point into all functionality in Spark is the [`SparkSession`](api/python/pyspark.sql.html#pyspark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder`: + +{% include_example init_session python/sql/basic.py %} + + + + +The entry point into all functionality in Spark is the [`SparkSession`](api/R/sparkR.session.html) class. To initialize a basic `SparkSession`, just call `sparkR.session()`: + +{% include_example init_session r/RSparkSQLExample.R %} + +Note that when invoked for the first time, `sparkR.session()` initializes a global `SparkSession` singleton instance, and always returns a reference to this instance for successive invocations. In this way, users only need to initialize the `SparkSession` once, then SparkR functions like `read.df` will be able to access this global instance implicitly, and users don't need to pass the `SparkSession` instance around. 
+ + + +`SparkSession` in Spark 2.0 provides builtin support for Hive features including the ability to +write queries using HiveQL, access to Hive UDFs, and the ability to read data from Hive tables. +To use these features, you do not need to have an existing Hive setup. + +## Creating DataFrames + + + +With a `SparkSession`, applications can create DataFrames from an [existing `RDD`](#interoperating-with-rdds), +from a Hive table, or from [Spark data sources](#data-sources). --- End diff -- The link `[Spark data sources](#data-sources)` does not work after this change. Could you fix all the similar cases? Thanks!
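gatorsmile's point is that splitting one page into several breaks intra-page anchors whose target heading now lives in a different file. A small, hedged sketch of the kind of check that catches this (the helper names and the simplified kramdown ID rule are illustrative, not part of the Spark docs build):

```python
import re

def heading_anchors(markdown):
    # Rough approximation of kramdown's auto-generated header IDs:
    # lowercase, punctuation dropped, spaces become hyphens.
    anchors = set()
    for line in markdown.splitlines():
        m = re.match(r"#+\s+(.*)", line)
        if m:
            text = re.sub(r"[^\w\- ]", "", m.group(1)).strip().lower()
            anchors.add(text.replace(" ", "-"))
    return anchors

def broken_links(markdown):
    # Every [text](#anchor) must have a matching heading in the SAME file.
    anchors = heading_anchors(markdown)
    return [a for a in re.findall(r"\]\(#([^)]+)\)", markdown) if a not in anchors]

doc = "## Data Sources\nSee [Spark data sources](#data-sources) and [RDDs](#interoperating-with-rdds)."
print(broken_links(doc))  # -> ['interoperating-with-rdds']
```

After a page split, links like `(#data-sources)` would need to become cross-page links such as `sql-data-sources.html#...`, which is exactly the fix being requested.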
[GitHub] spark pull request #22694: [SQL][CATALYST][MINOR] update some error comments
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22694
[GitHub] spark issue #22694: [SQL][CATALYST][MINOR] update some error comments
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22694 Merged to master and branch-2.4.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22503 @justinuang, okay. Mind rebasing this please?
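For context on what SPARK-25493 is about: the CSV reader should recognize the record separator (`\r\n`, `\r`, or `\n`) automatically instead of assuming one. The PR itself configures line-separator detection in Spark's univocity-based CSV datasource; as a hedged illustration of the same idea using only Python's standard library, opening with `newline=""` lets the `csv` module handle all three endings transparently:

```python
# Illustration of CRLF auto-detection, NOT the Spark/univocity change itself.
import csv
import io

data_crlf = "a,b\r\n1,2\r\n"   # Windows-style line endings
data_lf = "a,b\n1,2\n"         # Unix-style line endings

def parse(text):
    # newline="" passes line endings through untranslated so the csv
    # module can detect and handle \r\n, \r, and \n itself.
    return list(csv.reader(io.StringIO(text, newline="")))

print(parse(data_crlf) == parse(data_lf))  # True: both parse to the same rows
```

The point is that the parser, not the caller, decides what a line ending is, which is what the PR title's "auto-detection" refers to.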
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22263 Merged build finished. Test PASSed.
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22263 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97476/ Test PASSed.
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22263 **[Test build #97476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97476/testReport)** for PR 22263 at commit [`5e088b8`](https://github.com/apache/spark/commit/5e088b86822dd6b1bf4c3bb085fde3c96af03658). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22295 @huaxingao, thanks for addressing comments. Would you mind rebasing it and resolving the conflicts?
[GitHub] spark issue #22752: [SPARK-24787][CORE] Revert hsync in EventLoggingListener...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22752 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97474/ Test PASSed.
[GitHub] spark issue #22752: [SPARK-24787][CORE] Revert hsync in EventLoggingListener...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22752 Merged build finished. Test PASSed.
[GitHub] spark issue #22752: [SPARK-24787][CORE] Revert hsync in EventLoggingListener...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22752 **[Test build #97474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97474/testReport)** for PR 22752 at commit [`a3f53c4`](https://github.com/apache/spark/commit/a3f53c41879e28d71d4dbd79d80a51e50d82ecee). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22482 Merged build finished. Test PASSed.
[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22482 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97475/ Test PASSed.
[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22482 **[Test build #97475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97475/testReport)** for PR 22482 at commit [`5c74609`](https://github.com/apache/spark/commit/5c746090a8d5560f043754383656d54653a315dc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22729: [SPARK-25737][CORE] Remove JavaSparkContextVarargsWorkar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22729 **[Test build #4380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4380/testReport)** for PR 22729 at commit [`0860d27`](https://github.com/apache/spark/commit/0860d27a205d3dd3d94e6bbe2c9db49b7e432ef4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test FAILed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97479/testReport)** for PR 22749 at commit [`6a6fa45`](https://github.com/apache/spark/commit/6a6fa454e22728cc2ad8e5515cd587fe0be84b26). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97479/ Test FAILed.
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225769471 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java --- @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package test.org.apache.spark.sql; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collection; +import java.util.Iterator; +import java.util.List; + +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Encoder; +import org.apache.spark.sql.Encoders; +import org.apache.spark.sql.test.TestSparkSession; +import org.apache.spark.sql.types.ArrayType; +import org.apache.spark.sql.types.DataType; +import org.apache.spark.sql.types.DataTypes; +import org.apache.spark.sql.types.Metadata; +import org.apache.spark.sql.types.StructField; +import org.apache.spark.sql.types.StructType; + +public class JavaBeanWithArraySuite { + +private static final List<Record> RECORDS = new ArrayList<>(); + +static { +RECORDS.add(new Record(1, +Arrays.asList(new Interval(111, 211), new Interval(121, 221)), +Arrays.asList(11, 21, 31, 41) +)); +RECORDS.add(new Record(2, +Arrays.asList(new Interval(112, 212), new Interval(122, 222)), +Arrays.asList(12, 22, 32, 42) +)); +RECORDS.add(new Record(3, +Arrays.asList(new Interval(113, 213), new Interval(123, 223)), +Arrays.asList(13, 23, 33, 43) +)); +} + +private TestSparkSession spark; + +@Before +public void setUp() { +spark = new TestSparkSession(); +} + +@After +public void tearDown() { +spark.stop(); +spark = null; +} + +@Test +public void testBeanWithArrayFieldsDeserialization() { + +StructType schema = createSchema(); +Encoder<Record> encoder = Encoders.bean(Record.class); + +Dataset<Record> dataset = spark +.read() +.format("json") +.schema(schema) +.load("src/test/resources/test-data/with-array-fields") +.as(encoder); + +List<Record> records = dataset.collectAsList(); + +Assert.assertTrue(Util.equals(records, RECORDS)); +} + +private static StructType createSchema() { +StructField[] intervalFields = { +new StructField("startTime", DataTypes.LongType, true, Metadata.empty()), +new StructField("endTime", DataTypes.LongType, true, Metadata.empty()) +}; +DataType intervalType = new StructType(intervalFields); + +DataType intervalsType = new ArrayType(intervalType, true); + +DataType valuesType = new ArrayType(DataTypes.IntegerType, true); + +StructField[] fields = { +new StructField("id", DataTypes.IntegerType, true, Metadata.empty()), +new StructField("intervals", intervalsType, true, Metadata.empty()), +new StructField("values", valuesType, true, Metadata.empty()) +}; +return new StructType(fields); +} + +public static class Record { + +private int id; +private List<Interval> intervals; +private List<Integer> values; + +public Record() { } + +Record(int id, List<Interval> intervals, List<Integer> values) { +this.id = id; +this.intervals = intervals; +this.values = values; +} + +public int getId() { +return id; +} + +public void setId(int id) {
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225768857 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java --- @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package test.org.apache.spark.sql; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collection; +import java.util.Iterator; +import java.util.List; + +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Encoder; +import org.apache.spark.sql.Encoders; +import org.apache.spark.sql.test.TestSparkSession; +import org.apache.spark.sql.types.ArrayType; +import org.apache.spark.sql.types.DataType; +import org.apache.spark.sql.types.DataTypes; +import org.apache.spark.sql.types.Metadata; +import org.apache.spark.sql.types.StructField; +import org.apache.spark.sql.types.StructType; + +public class JavaBeanWithArraySuite { + +private static final List<Record> RECORDS = new ArrayList<>(); + +static { +RECORDS.add(new Record(1, +Arrays.asList(new Interval(111, 211), new Interval(121, 221)), +Arrays.asList(11, 21, 31, 41) +)); +RECORDS.add(new Record(2, +Arrays.asList(new Interval(112, 212), new Interval(122, 222)), +Arrays.asList(12, 22, 32, 42) +)); +RECORDS.add(new Record(3, +Arrays.asList(new Interval(113, 213), new Interval(123, 223)), +Arrays.asList(13, 23, 33, 43) +)); +} + +private TestSparkSession spark; + +@Before +public void setUp() { +spark = new TestSparkSession(); +} + +@After +public void tearDown() { +spark.stop(); +spark = null; +} + +@Test +public void testBeanWithArrayFieldsDeserialization() { + +StructType schema = createSchema(); +Encoder<Record> encoder = Encoders.bean(Record.class); + +Dataset<Record> dataset = spark +.read() +.format("json") +.schema(schema) +.load("src/test/resources/test-data/with-array-fields") +.as(encoder); + +List<Record> records = dataset.collectAsList(); + +Assert.assertTrue(Util.equals(records, RECORDS)); +} + +private static StructType createSchema() { +StructField[] intervalFields = { +new StructField("startTime", DataTypes.LongType, true, Metadata.empty()), +new StructField("endTime", DataTypes.LongType, true, Metadata.empty()) +}; +DataType intervalType = new StructType(intervalFields); + +DataType intervalsType = new ArrayType(intervalType, true); + +DataType valuesType = new ArrayType(DataTypes.IntegerType, true); + +StructField[] fields = { +new StructField("id", DataTypes.IntegerType, true, Metadata.empty()), +new StructField("intervals", intervalsType, true, Metadata.empty()), +new StructField("values", valuesType, true, Metadata.empty()) +}; +return new StructType(fields); +} + +public static class Record { + +private int id; +private List<Interval> intervals; +private List<Integer> values; --- End diff -- Will this list of ints affect the test? If not, maybe we can remove it to simplify the test.
[GitHub] spark pull request #22745: [SPARK-21402][SQL][FOLLOW-UP] Fix java map of str...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22745#discussion_r225768707 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithMapSuite.java --- @@ -0,0 +1,257 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package test.org.apache.spark.sql; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collection; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; +import java.util.Map; + +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Encoder; +import org.apache.spark.sql.Encoders; +import org.apache.spark.sql.test.TestSparkSession; +import org.apache.spark.sql.types.DataType; +import org.apache.spark.sql.types.DataTypes; +import org.apache.spark.sql.types.MapType; +import org.apache.spark.sql.types.Metadata; +import org.apache.spark.sql.types.StructField; +import org.apache.spark.sql.types.StructType; + +public class JavaBeanWithMapSuite { + +private static final List<Record> RECORDS = new ArrayList<>(); + +static { +RECORDS.add(new Record(1, +toMap( +Arrays.asList("a", "b"), +Arrays.asList(new Interval(111, 211), new Interval(121, 221)) +), +toMap(Arrays.asList("a", "b", "c"), Arrays.asList(11, 21, 31)) +)); +RECORDS.add(new Record(2, +toMap( +Arrays.asList("a", "b"), +Arrays.asList(new Interval(112, 212), new Interval(122, 222)) +), +toMap(Arrays.asList("a", "b", "c"), Arrays.asList(12, 22, 32)) +)); +RECORDS.add(new Record(3, +toMap( +Arrays.asList("a", "b"), +Arrays.asList(new Interval(113, 213), new Interval(123, 223)) +), +toMap(Arrays.asList("a", "b", "c"), Arrays.asList(13, 23, 33)) +)); +} + +private static <K, V> Map<K, V> toMap(Collection<K> keys, Collection<V> values) { +Map<K, V> map = new HashMap<>(); +Iterator<K> keyI = keys.iterator(); +Iterator<V> valueI = values.iterator(); +while (keyI.hasNext() && valueI.hasNext()) { +map.put(keyI.next(), valueI.next()); +} +return map; +} + +private TestSparkSession spark; + +@Before +public void setUp() { +spark = new TestSparkSession(); +} + +@After +public void tearDown() { +spark.stop(); +spark = null; +} + +@Test +public void testBeanWithMapFieldsDeserialization() { + +StructType schema = createSchema(); +Encoder<Record> encoder = Encoders.bean(Record.class); + +Dataset<Record> dataset = spark +.read() +.format("json") +.schema(schema) +.load("src/test/resources/test-data/with-map-fields") +.as(encoder); + +List<Record> records = dataset.collectAsList(); + +Assert.assertTrue(Util.equals(records, RECORDS)); +} + +private static StructType createSchema() { +StructField[] intervalFields = { +new StructField("startTime", DataTypes.LongType, true, Metadata.empty()), +new StructField("endTime", DataTypes.LongType, true, Metadata.empty()) +}; +DataType intervalType = new StructType(intervalFields); + +DataType intervalsType = new MapType(DataTypes.StringType, intervalType, true); + +DataType valuesType = new MapType(DataTypes.StringType,
[GitHub] spark pull request #22724: [SPARK-25734][SQL] Literal should have a value co...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22724
[GitHub] spark issue #22724: [SPARK-25734][SQL] Literal should have a value correspon...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22724 thanks, merging to master!
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225767103 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java --- @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package test.org.apache.spark.sql; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collection; +import java.util.Iterator; +import java.util.List; + +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Encoder; +import org.apache.spark.sql.Encoders; +import org.apache.spark.sql.test.TestSparkSession; +import org.apache.spark.sql.types.ArrayType; +import org.apache.spark.sql.types.DataType; +import org.apache.spark.sql.types.DataTypes; +import org.apache.spark.sql.types.Metadata; +import org.apache.spark.sql.types.StructField; +import org.apache.spark.sql.types.StructType; + +public class JavaBeanWithArraySuite { + +private static final List<Record> RECORDS = new ArrayList<>(); + +static { +RECORDS.add(new Record(1, +Arrays.asList(new Interval(111, 211), new Interval(121, 221)), +Arrays.asList(11, 21, 31, 41) +)); +RECORDS.add(new Record(2, +Arrays.asList(new Interval(112, 212), new Interval(122, 222)), +Arrays.asList(12, 22, 32, 42) +)); +RECORDS.add(new Record(3, +Arrays.asList(new Interval(113, 213), new Interval(123, 223)), +Arrays.asList(13, 23, 33, 43) +)); +} + +private TestSparkSession spark; + +@Before +public void setUp() { +spark = new TestSparkSession(); +} + +@After +public void tearDown() { +spark.stop(); +spark = null; +} + +@Test +public void testBeanWithArrayFieldsDeserialization() { + +StructType schema = createSchema(); +Encoder<Record> encoder = Encoders.bean(Record.class); + +Dataset<Record> dataset = spark +.read() +.format("json") +.schema(schema) +.load("src/test/resources/test-data/with-array-fields") +.as(encoder); + +List<Record> records = dataset.collectAsList(); + +Assert.assertTrue(Util.equals(records, RECORDS)); +} + +private static StructType createSchema() { +StructField[] intervalFields = { +new StructField("startTime", DataTypes.LongType, true, Metadata.empty()), +new StructField("endTime", DataTypes.LongType, true, Metadata.empty()) +}; +DataType intervalType = new StructType(intervalFields); + +DataType intervalsType = new ArrayType(intervalType, true); + +DataType valuesType = new ArrayType(DataTypes.IntegerType, true); + +StructField[] fields = { +new StructField("id", DataTypes.IntegerType, true, Metadata.empty()), +new StructField("intervals", intervalsType, true, Metadata.empty()), +new StructField("values", valuesType, true, Metadata.empty()) +}; +return new StructType(fields); +} + +public static class Record { + +private int id; +private List<Interval> intervals; +private List<Integer> values; + +public Record() { } + +Record(int id, List<Interval> intervals, List<Integer> values) { +this.id = id; +this.intervals = intervals; +this.values = values; +} + +public int getId() { +return id; +} + +public void setId(int id)
[GitHub] spark issue #22745: [SPARK-21402][SQL][FOLLOW-UP] Fix java map of structs de...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22745 It's a different issue; I think it's worth a new ticket.
[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22732#discussion_r225764876
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
@@ -81,11 +81,11 @@ case class UserDefinedFunction protected[sql] (
       f,
       dataType,
       exprs.map(_.expr),
+      nullableTypes.map(_.map(!_)).getOrElse(exprs.map(_ => false)),
--- End diff --
Hm, but we can't use getParameterTypes anymore. It won't work in Scala 2.12. Where the nullability info is definitely not available, be conservative and assume it all needs null handling?
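The fallback being discussed can be sketched in isolation (the helper name and arity here are illustrative, not Spark's actual internals): `nullableTypes` records, when known, whether each UDF input type is nullable, and `ScalaUDF` wants the inverse ("is this input null-safe?"). When the info is unavailable, the line in the diff falls back to `false` for every input, i.e. the conservative choice of assuming every input still needs null handling.

```scala
// Hedged sketch of the nullability fallback under review.
def inputsNullSafe(nullableTypes: Option[Seq[Boolean]], arity: Int): Seq[Boolean] =
  nullableTypes
    .map(_.map(!_))                    // known: a non-nullable input is null-safe
    .getOrElse(Seq.fill(arity)(false)) // unknown: conservatively not null-safe

val known = inputsNullSafe(Some(Seq(true, false)), 2)  // Seq(false, true)
val unknown = inputsNullSafe(None, 3)                  // Seq(false, false, false)
```

With the info missing, every input is flagged as needing null handling, which is exactly the conservative behavior srowen asks about.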
[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22746 @gatorsmile Sorry for the delay on this; please have a look when you have time.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Merged build finished. Test PASSed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test PASSed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4051/ Test PASSed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4050/
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4050/ Test PASSed.
[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/22732#discussion_r225762708
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
@@ -81,11 +81,11 @@ case class UserDefinedFunction protected[sql] (
       f,
       dataType,
       exprs.map(_.expr),
+      nullableTypes.map(_.map(!_)).getOrElse(exprs.map(_ => false)),
--- End diff --
In addition to what I just pointed out, which is when we did try to get `inputSchemas` through `ScalaReflection.schemaFor` and got an exception for unrecognized types, there's another case where we could get an unspecified `nullableTypes`, and that is when `UserDefinedFunction` is instantiated by calling the constructor rather than the `create` method. Then I assume it's created by an earlier version, and we should use the old logic, i.e., `ScalaReflection.getParameterTypes` (https://github.com/apache/spark/pull/22259/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2L2153), to get the correct information for `nullableTypes`. Is that right, @cloud-fan @srowen ?
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97479/testReport)** for PR 22749 at commit [`6a6fa45`](https://github.com/apache/spark/commit/6a6fa454e22728cc2ad8e5515cd587fe0be84b26).
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4050/
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225762148
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --
Oh, I see, moving to the default constructor was not a good idea. How about the first suggestion?
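The idea in the quoted doc comment can be sketched with simplified stand-ins (a plain `Map` for `SparkConf`, a stub class for `SparkSessionExtensions`, and a hypothetical config key name), so this is the shape of the mechanism rather than Spark's real implementation: read the configurator class name from the conf and, if present, apply it to the extensions instance.

```scala
// Hedged sketch: stand-in for SparkSessionExtensions that just records
// which configurator names were applied to it.
class Extensions {
  var applied: List[String] = Nil
}

def applyExtensionsFromConf(conf: Map[String, String], extensions: Extensions): Extensions = {
  // In Spark the value names a class that is instantiated via reflection and
  // invoked on `extensions`; here we only record the configured name.
  conf.get("spark.sql.extensions").foreach { configurator =>
    extensions.applied = configurator :: extensions.applied
  }
  extensions
}

val ext = applyExtensionsFromConf(
  Map("spark.sql.extensions" -> "com.example.MyExtensions"), new Extensions)
```

When no configurator key is set, the extensions object passes through untouched, which is why the placement of this helper (companion object vs. constructor) is the only real question in the thread.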
[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r225762035
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     }
   }
+  test("SQL interface support storageLevel(DISK_ONLY)") {
--- End diff --
How about this:
```scala
Seq("LAZY", "").foreach { isLazy =>
  Seq(true, false).foreach { withInvalidOptions =>
    Seq(true, false).foreach { withCacheTempView =>
      Map("DISK_ONLY" -> Disk, "MEMORY_ONLY" -> Memory).foreach {
        case (storageLevel, dataReadMethod) =>
          val testName = s"SQL interface support option: storageLevel: $storageLevel, " +
            s"isLazy: ${isLazy.equals("LAZY")}, " +
            s"withInvalidOptions: $withInvalidOptions, withCacheTempView: $withCacheTempView"
          val cacheOption = if (withInvalidOptions) {
            s"OPTIONS('storageLevel' '$storageLevel', 'a' '1', 'b' '2')"
          } else {
            s"OPTIONS('storageLevel' '$storageLevel')"
          }
          test(testName) {
            if (withCacheTempView) {
              withTempView("testSelect") {
                sql(s"CACHE $isLazy TABLE testSelect $cacheOption SELECT * FROM testData")
                assertCached(spark.table("testSelect"))
                val rddId = rddIdOf("testSelect")
                if (isLazy.equals("LAZY")) {
                  sql("SELECT COUNT(*) FROM testSelect").collect()
                }
                assert(isExpectStorageLevel(rddId, dataReadMethod))
              }
            } else {
              sql(s"CACHE $isLazy TABLE testData $cacheOption")
              assertCached(spark.table("testData"))
              val rddId = rddIdOf("testData")
              if (isLazy.equals("LAZY")) {
                sql("SELECT COUNT(*) FROM testData").collect()
              }
              assert(isExpectStorageLevel(rddId, dataReadMethod))
            }
          }
      }
    }
  }
}
```
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21588 @rxin and @gatorsmile, WDYT? I have already had to argue about Hadoop 3 support here and there (for instance, see [SPARK-18112](https://issues.apache.org/jira/browse/SPARK-18112) and [SPARK-18673](https://issues.apache.org/jira/browse/SPARK-18673)) and explain what's going on. Ideally, it looks like we should go ahead with option 2 (https://github.com/apache/spark/pull/21588#issuecomment-429272279), if I am not mistaken. If there are more concerns we should address before going ahead, I am definitely willing to help investigate as well.
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/22612 Jenkins retest this please.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 **[Test build #97478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97478/testReport)** for PR 22608 at commit [`4c9b886`](https://github.com/apache/spark/commit/4c9b886c1f23bbdd3d8e1ec7df25f03e45892d88).
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Merged build finished. Test FAILed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4049/
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4049/ Test FAILed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4049/
[GitHub] spark pull request #22707: [SPARK-25717][SQL] Insert overwrite a recreated e...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22707#discussion_r225759293
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -227,18 +227,22 @@ case class InsertIntoHiveTable(
       // Newer Hive largely improves insert overwrite performance. As Spark uses older Hive
       // version and we may not want to catch up new Hive version every time. We delete the
       // Hive partition first and then load data file into the Hive partition.
-      if (oldPart.nonEmpty && overwrite) {
-        oldPart.get.storage.locationUri.foreach { uri =>
-          val partitionPath = new Path(uri)
-          val fs = partitionPath.getFileSystem(hadoopConf)
-          if (fs.exists(partitionPath)) {
-            if (!fs.delete(partitionPath, true)) {
-              throw new RuntimeException(
-                "Cannot remove partition directory '" + partitionPath.toString)
-            }
-            // Don't let Hive do overwrite operation since it is slower.
-            doHiveOverwrite = false
+      if (overwrite) {
+        val oldPartitionPath = oldPart.flatMap(_.storage.locationUri.map(new Path(_)))
+          .getOrElse {
+            ExternalCatalogUtils.generatePartitionPath(
+              partitionSpec,
+              partitionColumnNames,
+              HiveClientImpl.toHiveTable(table).getDataLocation)
--- End diff --
Looks correct, as I saw we assign `CatalogTable.storage.locationUri` to HiveTable's data location.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 **[Test build #97477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97477/testReport)** for PR 22608 at commit [`5d270f1`](https://github.com/apache/spark/commit/5d270f17dccbb2eac6d3c2ab8c12987e3d992086).
[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22379
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21588 Thanks @HyukjinKwon. Upgrading Hive to 2.3.2 can fix [SPARK-12014](https://issues.apache.org/jira/browse/SPARK-12014), [SPARK-18673](https://issues.apache.org/jira/browse/SPARK-18673), [SPARK-24766](https://issues.apache.org/jira/browse/SPARK-24766) and [SPARK-25193](https://issues.apache.org/jira/browse/SPARK-25193). It can also improve the performance of [SPARK-18107](https://issues.apache.org/jira/browse/SPARK-18107). It doesn't seem to break backward compatibility; I have verified it in our production environment (Hive 1.2.1).
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22666 Woah .. let me resolve the conflicts tonight.
[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22379 Thanks all!
[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22379 Merged to master.
[GitHub] spark pull request #22707: [SPARK-25717][SQL] Insert overwrite a recreated e...
Github user fjh100456 commented on a diff in the pull request: https://github.com/apache/spark/pull/22707#discussion_r225756219
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -227,18 +227,22 @@ case class InsertIntoHiveTable(
       // Newer Hive largely improves insert overwrite performance. As Spark uses older Hive
       // version and we may not want to catch up new Hive version every time. We delete the
       // Hive partition first and then load data file into the Hive partition.
-      if (oldPart.nonEmpty && overwrite) {
-        oldPart.get.storage.locationUri.foreach { uri =>
-          val partitionPath = new Path(uri)
-          val fs = partitionPath.getFileSystem(hadoopConf)
-          if (fs.exists(partitionPath)) {
-            if (!fs.delete(partitionPath, true)) {
-              throw new RuntimeException(
-                "Cannot remove partition directory '" + partitionPath.toString)
-            }
-            // Don't let Hive do overwrite operation since it is slower.
-            doHiveOverwrite = false
+      if (overwrite) {
+        val oldPartitionPath = oldPart.flatMap(_.storage.locationUri.map(new Path(_)))
+          .getOrElse {
+            ExternalCatalogUtils.generatePartitionPath(
+              partitionSpec,
+              partitionColumnNames,
+              HiveClientImpl.toHiveTable(table).getDataLocation)
--- End diff --
> `HiveClientImpl.toHiveTable(table).getDataLocation` -> `new Path(table.location)`?

Yes, they get the same value. I'll change it, thank you very much.
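The `getOrElse` fallback being reviewed can be illustrated in isolation (plain strings stand in for Hadoop `Path`, and a toy helper stands in for `ExternalCatalogUtils.generatePartitionPath`, so this is a sketch of the pattern rather than the real implementation): when the old partition still exists and has a location, use it; otherwise derive the expected path from the table location and the partition spec, which is what makes overwriting a recreated external partition work.

```scala
// Hedged sketch of the partition-path fallback in the diff above.
case class Storage(locationUri: Option[String])
case class Partition(storage: Storage)

// Toy stand-in for ExternalCatalogUtils.generatePartitionPath:
// table location plus "col=value" segments in partition-column order.
def generatePartitionPath(spec: Map[String, String],
                          partitionColumns: Seq[String],
                          tableLocation: String): String =
  partitionColumns.map(c => s"$c=${spec(c)}").mkString(s"$tableLocation/", "/", "")

def partitionPath(oldPart: Option[Partition],
                  spec: Map[String, String],
                  partitionColumns: Seq[String],
                  tableLocation: String): String =
  oldPart.flatMap(_.storage.locationUri)
    .getOrElse(generatePartitionPath(spec, partitionColumns, tableLocation))

val existing = partitionPath(
  Some(Partition(Storage(Some("/warehouse/t/p=1")))),
  Map("p" -> "1"), Seq("p"), "/warehouse/t")       // uses the recorded location
val recreated = partitionPath(
  None, Map("p" -> "1"), Seq("p"), "/warehouse/t") // derives the default path
```

Either branch yields the directory to delete before loading, so the old `oldPart.nonEmpty` guard is no longer needed.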
[GitHub] spark issue #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh script
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/22748 There seems to be overlapping logic between this PR and https://github.com/apache/spark/pull/22681
[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21990 Merged build finished. Test PASSed.
[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21990 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97472/ Test PASSed.
[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21990 **[Test build #97472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97472/testReport)** for PR 21990 at commit [`d9b2a55`](https://github.com/apache/spark/commit/d9b2a55275b74c406d9f9c435bf1b53a6ef4b35a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22745: [SPARK-21402][SQL][FOLLOW-UP] Fix java map of structs de...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22745 Is this a separate PR because this part is pretty separable and you think it could be considered separately? If it's all part of one logical change that should go in together or not at all, it can stay in the original PR.
[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/22598#discussion_r225752604
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/TokenUtil.scala ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.text.SimpleDateFormat
+import java.util.Properties
+
+import org.apache.hadoop.io.Text
+import org.apache.hadoop.security.token.{Token, TokenIdentifier}
+import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier
+import org.apache.kafka.clients.CommonClientConfigs
+import org.apache.kafka.clients.admin.{AdminClient, CreateDelegationTokenOptions}
+import org.apache.kafka.common.config.SaslConfigs
+import org.apache.kafka.common.security.token.delegation.DelegationToken
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+
+private[kafka010] object TokenUtil extends Logging {
+  private[kafka010] val TOKEN_KIND = new Text("KAFKA_DELEGATION_TOKEN")
+  private[kafka010] val TOKEN_SERVICE = new Text("kafka.server.delegation.token")
+
+  private[kafka010] class KafkaDelegationTokenIdentifier extends AbstractDelegationTokenIdentifier {
+    override def getKind: Text = TOKEN_KIND;
+  }
+
+  private def printToken(token: DelegationToken): Unit = {
+    if (log.isDebugEnabled) {
+      val dateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm")
+      logDebug("%-15s %-30s %-15s %-25s %-15s %-15s %-15s".format(
+        "TOKENID", "HMAC", "OWNER", "RENEWERS", "ISSUEDATE", "EXPIRYDATE", "MAXDATE"))
+      val tokenInfo = token.tokenInfo
+      logDebug("%-15s [hidden] %-15s %-25s %-15s %-15s %-15s".format(
+        tokenInfo.tokenId,
+        tokenInfo.owner,
+        tokenInfo.renewersAsString,
+        dateFormat.format(tokenInfo.issueTimestamp),
+        dateFormat.format(tokenInfo.expiryTimestamp),
+        dateFormat.format(tokenInfo.maxTimestamp)))
+    }
+  }
+
+  private[kafka010] def createAdminClientProperties(sparkConf: SparkConf): Properties = {
+    val adminClientProperties = new Properties
+
+    val bootstrapServers = sparkConf.get(KAFKA_BOOTSTRAP_SERVERS)
+    require(bootstrapServers.nonEmpty, s"Tried to obtain kafka delegation token but bootstrap " +
+      "servers not configured.")
+    adminClientProperties.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers.get)
+
+    val protocol = sparkConf.get(KAFKA_SECURITY_PROTOCOL)
+    adminClientProperties.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, protocol)
+    if (protocol.endsWith("SSL")) {
+      logInfo("SSL protocol detected.")
+      sparkConf.get(KAFKA_TRUSTSTORE_LOCATION).foreach { truststoreLocation =>
+        adminClientProperties.put("ssl.truststore.location", truststoreLocation)
+      }
+      sparkConf.get(KAFKA_TRUSTSTORE_PASSWORD).foreach { truststorePassword =>
+        adminClientProperties.put("ssl.truststore.password", truststorePassword)
+      }
+    } else {
+      logWarning("Obtaining kafka delegation token through plain communication channel. Please " +
+        "consider the security impact.")
+    }
+
+    // There are multiple possibilities to log in:
+    // - Keytab is provided -> try to log in with kerberos module using kafka's dynamic JAAS
+    //   configuration.
+    // - Keytab not provided -> try to log in with JVM global security configuration
+    //   which can be configured for example with 'java.security.auth.login.config'.
+    //   For this no additional parameter needed.
+    KafkaSecurityHelper.getKeytabJaasParams(sparkConf).foreach { jaasParams =>
+      logInfo("Keytab detected, using it for login.")
+      adminClientProperties.put(SaslConfigs.SASL_MECHANISM, SaslConfigs.GSSAPI_MECHANISM)
[GitHub] spark issue #22725: [SPARK-24610][[CORE][FOLLOW-UP]fix reading small files v...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/22725 @tgravescs OK, I will do it, thanks.
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225752208
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.test.TestSparkSession;
+import org.apache.spark.sql.types.ArrayType;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
--- End diff --
If we remove `createSchema`, we can remove Line 35 ~ 40, too.
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225751969
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java --- (quoted diff identical to the file shown above; the review comment itself was truncated in the archive)
[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22655 Thanks @viirya !
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225751513 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java --- @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package test.org.apache.spark.sql; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collection; +import java.util.Iterator; +import java.util.List; + +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import org.apache.spark.sql.Dataset; +import org.apache.spark.sql.Encoder; +import org.apache.spark.sql.Encoders; +import org.apache.spark.sql.test.TestSparkSession; +import org.apache.spark.sql.types.ArrayType; +import org.apache.spark.sql.types.DataType; +import org.apache.spark.sql.types.DataTypes; +import org.apache.spark.sql.types.Metadata; +import org.apache.spark.sql.types.StructField; +import org.apache.spark.sql.types.StructType; + +public class JavaBeanWithArraySuite { + +private static final List<Record> RECORDS = new ArrayList<>(); + +static { +RECORDS.add(new Record(1, +Arrays.asList(new Interval(111, 211), new Interval(121, 221)), +Arrays.asList(11, 21, 31, 41) +)); +RECORDS.add(new Record(2, +Arrays.asList(new Interval(112, 212), new Interval(122, 222)), +Arrays.asList(12, 22, 32, 42) +)); +RECORDS.add(new Record(3, +Arrays.asList(new Interval(113, 213), new Interval(123, 223)), +Arrays.asList(13, 23, 33, 43) +)); +} + +private TestSparkSession spark; + +@Before +public void setUp() { +spark = new TestSparkSession(); +} + +@After +public void tearDown() { +spark.stop(); +spark = null; +} + +@Test +public void testBeanWithArrayFieldsDeserialization() { + +StructType schema = createSchema(); +Encoder<Record> encoder = Encoders.bean(Record.class); + +Dataset<Record> dataset = spark +.read() +.format("json") +.schema(schema) +.load("src/test/resources/test-data/with-array-fields") +.as(encoder); + +List<Record> records = dataset.collectAsList(); + +Assert.assertTrue(Util.equals(records, RECORDS)); +} + +private static StructType createSchema() { +StructField[] intervalFields = { +new StructField("startTime", DataTypes.LongType, true, Metadata.empty()), +new StructField("endTime", DataTypes.LongType, 
true, Metadata.empty()) +}; +DataType intervalType = new StructType(intervalFields); + +DataType intervalsType = new ArrayType(intervalType, true); + +DataType valuesType = new ArrayType(DataTypes.IntegerType, true); + +StructField[] fields = { +new StructField("id", DataTypes.IntegerType, true, Metadata.empty()), +new StructField("intervals", intervalsType, true, Metadata.empty()), +new StructField("values", valuesType, true, Metadata.empty()) +}; +return new StructType(fields); +} + +public static class Record { + +private int id; +private List<Interval> intervals; +private List<Integer> values; + +public Record() { } + +Record(int id, List<Interval> intervals, List<Integer> values) { +this.id = id; +this.intervals = intervals; +this.values = values; +} + +public int getId() { +return id; +} + +public void setId(int
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225751459 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java ---
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22263 **[Test build #97476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97476/testReport)** for PR 22263 at commit [`5e088b8`](https://github.com/apache/spark/commit/5e088b86822dd6b1bf4c3bb085fde3c96af03658).
[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22482 **[Test build #97475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97475/testReport)** for PR 22482 at commit [`5c74609`](https://github.com/apache/spark/commit/5c746090a8d5560f043754383656d54653a315dc).
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22263 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4048/ Test PASSed.
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22263 Merged build finished. Test PASSed.
[GitHub] spark issue #22729: [SPARK-25737][CORE] Remove JavaSparkContextVarargsWorkar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22729 **[Test build #4380 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4380/testReport)** for PR 22729 at commit [`0860d27`](https://github.com/apache/spark/commit/0860d27a205d3dd3d94e6bbe2c9db49b7e432ef4).
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225749733 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java --- +Dataset<Record> dataset = spark +.read() +.format("json") +.schema(schema) --- End diff -- @vofque Please note the `startTime` and `endTime`. It should be case-sensitive.
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225749174 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java --- +Dataset<Record> dataset = spark +.read() +.format("json") +.schema(schema) --- End diff -- I'm wondering if we can use the latest and neat approach in this PR. Then, we can remove `createSchema()` here. ```scala - .schema(schema) + .schema("id int, intervals array<struct<startTime: long, endTime: long>>, values array<int>") ```
[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22288 Merged build finished. Test PASSed.
[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22288 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97469/ Test PASSed.
[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22288 **[Test build #97469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97469/testReport)** for PR 22288 at commit [`2c5a753`](https://github.com/apache/spark/commit/2c5a75354d36d08199b9805a7513a4ec4a546a27). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22752: [SPARK-24787][CORE] Revert hsync in EventLoggingListener...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22752 **[Test build #97474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97474/testReport)** for PR 22752 at commit [`a3f53c4`](https://github.com/apache/spark/commit/a3f53c41879e28d71d4dbd79d80a51e50d82ecee).
[GitHub] spark issue #22752: [SPARK-24787][CORE] Revert hsync in EventLoggingListener...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22752 ok to test
[GitHub] spark issue #22752: [SPARK-24787][CORE] Revert hsync in EventLoggingListener...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22752 add to whitelist
[GitHub] spark issue #22752: [SPARK-24787][CORE] Revert hsync in EventLoggingListener...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22752 Can one of the admins verify this patch?
[GitHub] spark pull request #22752: [SPARK-24787][CORE] Revert hsync in EventLoggingL...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/22752 [SPARK-24787][CORE] Revert hsync in EventLoggingListener and make FsHistoryProvider to read lastBlockBeingWritten data for logs ## What changes were proposed in this pull request? `hsync` was added as part of SPARK-19531 to get the latest data into the History Server UI, but it causes performance overhead and leads to many history log events being dropped. `hsync` uses `FileChannel.force` to sync the data to disk across the whole data pipeline; it is a costly operation that burdens the application and drops events. Getting the latest data into the History Server can be done differently, with no impact on the application while it writes events: the API `DFSInputStream.getFileLength()` returns the file length including `lastBlockBeingWrittenLength` (unlike `FileStatus.getLen()`). When the file-status length and the previously cached length are equal, this API can be used to verify whether any new data has been written; if the data length has grown, the History Server can update the in-progress history log. I also made this change configurable with a default value of false; it can be enabled for the History Server if users want to see the updated data in the UI. ## How was this patch tested? Added a new test and verified manually: with the added conf `spark.history.fs.inProgressAbsoluteLengthCheck.enabled=true`, the History Server reads the logs including the last block data being written and updates the Web UI with the latest data.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-24787 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22752.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22752 commit a3f53c41879e28d71d4dbd79d80a51e50d82ecee Author: Devaraj K Date: 2018-10-16T23:50:20Z [SPARK-24787][CORE] Revert hsync in EventLoggingListener and make FsHistoryProvider to read lastBlockBeingWritten data for logs
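The length check the description proposes can be sketched as pure decision logic. This is a hypothetical helper, not the PR's actual FsHistoryProvider code; it only assumes the comparison works as described, with `FileStatus.getLen()` counting closed blocks and `DFSInputStream.getFileLength()` also counting the block being written:

```java
public class InProgressLogCheck {
    /**
     * Decide whether an in-progress event log should be re-parsed.
     *
     * @param statusLen   length from FileStatus.getLen() (closed blocks only)
     * @param cachedLen   length the history provider recorded on its last scan
     * @param absoluteLen length from DFSInputStream.getFileLength(), which also
     *                    includes lastBlockBeingWrittenLength
     */
    public static boolean shouldReparse(long statusLen, long cachedLen, long absoluteLen) {
        if (statusLen != cachedLen) {
            // A block was completed since the last scan: the cheap FileStatus
            // check already shows new data, no absolute-length query needed.
            return true;
        }
        // FileStatus length is unchanged; only the absolute length can reveal
        // data sitting in the block that is currently being written.
        return absoluteLen > cachedLen;
    }
}
```

The point of the ordering is that the expensive in-progress query only matters when the cheap metadata check is inconclusive.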
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225740504 --- Diff: sql/core/src/test/resources/test-data/with-array-fields --- @@ -0,0 +1,3 @@ +{ "id": 1, "intervals": [{ "startTime": 111, "endTime": 211 }, { "startTime": 121, "endTime": 221 }], "values": [11, 21, 31, 41]} --- End diff -- Could you rename this to `with-array-fields.json`?
[GitHub] spark issue #22624: [SPARK-23781][CORE] Add base class for token renewal fun...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22624 There's stuff that I need to fix for the recent changes in the kubernetes code; also I'm going to do the work I meant to do for SPARK-25693 here, since it requires as much testing and isn't that much more code. So hang on a bit.
[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22732#discussion_r225735714 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -81,11 +81,11 @@ case class UserDefinedFunction protected[sql] ( f, dataType, exprs.map(_.expr), + nullableTypes.map(_.map(!_)).getOrElse(exprs.map(_ => false)), --- End diff -- Yes that's right. There are a number of UDFs in MLlib, etc that have inputs of type "Any", which isn't great, but I wanted to work around rather than change them for now.
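The one-liner under discussion computes the null-safety flags for each UDF input: negate every nullability flag when that information exists, otherwise fall back to all-false (not null-safe), which is the conservative choice for the `Any`-typed UDFs srowen mentions. A Java `Optional` analogue of the Scala expression, with illustrative names:

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class NullSafeInputs {
    /**
     * Mirrors nullableTypes.map(_.map(!_)).getOrElse(exprs.map(_ => false)):
     * an input is "null safe" iff it is known NOT to be nullable; when no
     * nullability info is available, mark every input as not null-safe.
     */
    public static List<Boolean> inputsNullSafe(Optional<List<Boolean>> nullableTypes,
                                               int numInputs) {
        return nullableTypes
            // info present: null-safe is the negation of nullable
            .map(flags -> flags.stream().map(b -> !b).collect(Collectors.toList()))
            // info absent: one false per input expression
            .orElseGet(() -> IntStream.range(0, numInputs)
                .mapToObj(i -> false)
                .collect(Collectors.toList()));
    }
}
```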
[GitHub] spark issue #22670: [SPARK-25631][SPARK-25632][SQL][TEST] Improve the test r...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22670 @srowen Thank you very much.
[GitHub] spark pull request #22670: [SPARK-25631][SPARK-25632][SQL][TEST] Improve the...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22670
[GitHub] spark issue #22670: [SPARK-25631][SPARK-25632][SQL][TEST] Improve the test r...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22670 Merged to master
[GitHub] spark issue #22381: [SPARK-25394][CORE] Add an application status metrics so...
Github user skonto commented on the issue: https://github.com/apache/spark/pull/22381 Thanx @vanzin!
[GitHub] spark pull request #22381: [SPARK-25394][CORE] Add an application status met...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22381