[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22746 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22746 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97482/ Test PASSed.
[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22746 **[Test build #97482 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97482/testReport)** for PR 22746 at commit [`58115e5`](https://github.com/apache/spark/commit/58115e5a69670f45cf05d2026cb57abb595fe073).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22746 This is very cool! Thanks!
[GitHub] spark pull request #22746: [SPARK-24499][SQL][DOC] Split the page of sql-pro...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22746#discussion_r225797461
--- Diff: docs/_data/menu-sql.yaml ---
@@ -0,0 +1,79 @@
+- text: Getting Started
+  url: sql-getting-started.html
+  subitems:
+    - text: "Starting Point: SparkSession"
+      url: sql-getting-started.html#starting-point-sparksession
+    - text: Creating DataFrames
+      url: sql-getting-started.html#creating-dataframes
+    - text: Untyped Dataset Operations
--- End diff --
how about `Untyped Dataset Operations (DataFrame operations)`
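Applied to the quoted menu-sql.yaml hunk, the suggested rename would look roughly like this (a sketch; the indentation and the anchor URL of the renamed entry are assumptions, not taken from the patch):

```yaml
- text: Getting Started
  url: sql-getting-started.html
  subitems:
    - text: "Starting Point: SparkSession"
      url: sql-getting-started.html#starting-point-sparksession
    - text: Creating DataFrames
      url: sql-getting-started.html#creating-dataframes
    # renamed per the review comment to make the DataFrame connection explicit;
    # this anchor is hypothetical
    - text: Untyped Dataset Operations (DataFrame operations)
      url: sql-getting-started.html#untyped-dataset-operations
```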
[GitHub] spark issue #22750: [SPARK-25747][SQL] remove ColumnarBatchScan.needsUnsafeR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22750 **[Test build #97483 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97483/testReport)** for PR 22750 at commit [`318762c`](https://github.com/apache/spark/commit/318762ce5107bc6bcfc717b2d648cba3b86080f0).
[GitHub] spark issue #22750: [SPARK-25747][SQL] remove ColumnarBatchScan.needsUnsafeR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22750 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4055/ Test PASSed.
[GitHub] spark issue #22750: [SPARK-25747][SQL] remove ColumnarBatchScan.needsUnsafeR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22750 Merged build finished. Test PASSed.
[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22746 **[Test build #97482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97482/testReport)** for PR 22746 at commit [`58115e5`](https://github.com/apache/spark/commit/58115e5a69670f45cf05d2026cb57abb595fe073).
[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22746 Merged build finished. Test PASSed.
[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22746 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4054/ Test PASSed.
[GitHub] spark issue #22753: [SPARK-25754][DOC] Change CDN for MathJax
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22753 **[Test build #97481 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97481/testReport)** for PR 22753 at commit [`e700c82`](https://github.com/apache/spark/commit/e700c82338d3f0123629a77afc2fb5bd1ac466f8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22753: [SPARK-25754][DOC] Change CDN for MathJax
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22753 Merged build finished. Test PASSed.
[GitHub] spark pull request #22746: [SPARK-24499][SQL][DOC] Split the page of sql-pro...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/22746#discussion_r225794532 --- Diff: docs/sql-reference.md --- @@ -0,0 +1,641 @@ +--- +layout: global +title: Reference +displayTitle: Reference +--- + +* Table of contents +{:toc} + +## Data Types + +Spark SQL and DataFrames support the following data types: + +* Numeric types +- `ByteType`: Represents 1-byte signed integer numbers. --- End diff -- Thanks, done in 58115e5.
[GitHub] spark issue #22753: [SPARK-25754][DOC] Change CDN for MathJax
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22753 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97481/ Test PASSed.
[GitHub] spark issue #22742: [SPARK-25588][WIP] SchemaParseException: Can't redefine:...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22742 Hi @heuermh , I left some comments in JIRA yesterday. I tried the test case in branch-2.3 (with tags v2.3.1 and v2.3.0), and the issue still reproduces by running: ``` ./build/sbt "; clean; project sql; testOnly *Spark25588Suite" ``` Can you confirm that? I have also seen a similar issue in Parquet 1.10: https://jira.apache.org/jira/browse/PARQUET-1409
[GitHub] spark pull request #22746: [SPARK-24499][SQL][DOC] Split the page of sql-pro...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/22746#discussion_r225794477 --- Diff: docs/sql-getting-started.md --- @@ -0,0 +1,369 @@ +--- +layout: global +title: Getting Started +displayTitle: Getting Started +--- + +* Table of contents +{:toc} + +## Starting Point: SparkSession + + + + +The entry point into all functionality in Spark is the [`SparkSession`](api/scala/index.html#org.apache.spark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder()`: + +{% include_example init_session scala/org/apache/spark/examples/sql/SparkSQLExample.scala %} + + + + +The entry point into all functionality in Spark is the [`SparkSession`](api/java/index.html#org.apache.spark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder()`: + +{% include_example init_session java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %} + + + + +The entry point into all functionality in Spark is the [`SparkSession`](api/python/pyspark.sql.html#pyspark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder`: + +{% include_example init_session python/sql/basic.py %} + + + + +The entry point into all functionality in Spark is the [`SparkSession`](api/R/sparkR.session.html) class. To initialize a basic `SparkSession`, just call `sparkR.session()`: + +{% include_example init_session r/RSparkSQLExample.R %} + +Note that when invoked for the first time, `sparkR.session()` initializes a global `SparkSession` singleton instance, and always returns a reference to this instance for successive invocations. In this way, users only need to initialize the `SparkSession` once, then SparkR functions like `read.df` will be able to access this global instance implicitly, and users don't need to pass the `SparkSession` instance around. 
+ + + +`SparkSession` in Spark 2.0 provides builtin support for Hive features including the ability to +write queries using HiveQL, access to Hive UDFs, and the ability to read data from Hive tables. +To use these features, you do not need to have an existing Hive setup. + +## Creating DataFrames + + + +With a `SparkSession`, applications can create DataFrames from an [existing `RDD`](#interoperating-with-rdds), +from a Hive table, or from [Spark data sources](#data-sources). --- End diff -- Done in 58115e5; also fixed the links in ml-pipeline.md, sparkr.md, and structured-streaming-programming-guide.md.
[GitHub] spark issue #22753: [SPARK-25754][DOC] Change CDN for MathJax
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22753 @srowen
[GitHub] spark issue #22753: [SPARK-25754][DOC] Change CDN for MathJax
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22753 **[Test build #97481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97481/testReport)** for PR 22753 at commit [`e700c82`](https://github.com/apache/spark/commit/e700c82338d3f0123629a77afc2fb5bd1ac466f8).
[GitHub] spark issue #22753: [SPARK-25754][DOC] Change CDN for MathJax
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22753 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4053/ Test PASSed.
[GitHub] spark issue #22753: [SPARK-25754][DOC] Change CDN for MathJax
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22753 Merged build finished. Test PASSed.
[GitHub] spark pull request #22753: [SPARK-25754][DOC] Change CDN for MathJax
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/22753 [SPARK-25754][DOC] Change CDN for MathJax

## What changes were proposed in this pull request?

Currently when we open our doc site https://spark.apache.org/docs/latest/index.html , there is one warning: ![image](https://user-images.githubusercontent.com/1097932/47065926-2b757980-d217-11e8-868f-02ce73f513ae.png)

This PR changes the CDN as per the migration tips: https://www.mathjax.org/cdn-shutting-down/

The change is trivial, but it is worth following the suggestion from the MathJax team and removing the warning, in case the original CDN one day becomes unavailable.

## How was this patch tested?

Manual check.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark migrateMathJax

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22753.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22753

commit e700c82338d3f0123629a77afc2fb5bd1ac466f8
Author: Gengliang Wang
Date: 2018-10-17T06:08:44Z

    change cdn for MathJax
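The patch itself is not shown in this digest, but a CDN migration of this kind typically amounts to swapping one script URL in the docs layout. A sketch of the shape of the change (the file location, the pinned version, and the config string here are illustrative assumptions, not taken from the actual commit):

```html
<!-- before: the retiring MathJax CDN -->
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>

<!-- after: a pinned release served from cdnjs, per the migration tips at mathjax.org/cdn-shutting-down -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
```

Pinning an explicit version (rather than `latest`) is part of the migration advice, since the new CDN does not provide a rolling "latest" alias.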
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97478/ Test PASSed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Merged build finished. Test PASSed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 **[Test build #97478 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97478/testReport)** for PR 22608 at commit [`4c9b886`](https://github.com/apache/spark/commit/4c9b886c1f23bbdd3d8e1ec7df25f03e45892d88).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22746: [SPARK-24499][SQL][DOC] Split the page of sql-pro...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/22746#discussion_r225789933
--- Diff: docs/sql-getting-started.md ---
@@ -0,0 +1,369 @@
[snip]
+## Creating DataFrames
+
+With a `SparkSession`, applications can create DataFrames from an [existing `RDD`](#interoperating-with-rdds),
+from a Hive table, or from [Spark data sources](#data-sources).
--- End diff --
Sorry for missing that, will check all inner links by `
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Merged build finished. Test PASSed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97477/ Test PASSed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 **[Test build #97477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97477/testReport)** for PR 22608 at commit [`5d270f1`](https://github.com/apache/spark/commit/5d270f17dccbb2eac6d3c2ab8c12987e3d992086).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20433 @maropu Thanks! This is great to make our Spark SQL parser fully compatible with ANSI SQL. Please continue the efforts! cc @cloud-fan
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r225784123
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -335,6 +335,12 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
+  val ANSI_SQL_PARSER =
+    buildConf("spark.sql.parser.ansi.enabled")
+      .doc("When true, tries to conform to ANSI SQL syntax.")
+      .booleanConf
+      .createWithDefault(false)
--- End diff --
Since the next release is 3.0, we will turn this on by default.
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r225783980 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -335,6 +335,12 @@ object SQLConf { .booleanConf .createWithDefault(true) + val ANSI_SQL_PARSER = --- End diff -- The legacy flag will be removed in the 3.0 release.
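For reference, a flag like the one in the quoted diff is toggled the same way as any other SQL conf. A sketch of opting in via spark-defaults.conf (only the property name is taken from the diff above; everything else is a generic example):

```
# opt in to the ANSI-compatible parser behavior; the diff defaults it to false
spark.sql.parser.ansi.enabled  true
```

At runtime the equivalent would be `spark.conf.set("spark.sql.parser.ansi.enabled", "true")` from a session, or `SET spark.sql.parser.ansi.enabled=true` in a SQL shell.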
[GitHub] spark pull request #22746: [SPARK-24499][SQL][DOC] Split the page of sql-pro...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22746#discussion_r225783658
--- Diff: docs/sql-reference.md ---
@@ -0,0 +1,641 @@
+---
+layout: global
+title: Reference
+displayTitle: Reference
+---
+
+* Table of contents
+{:toc}
+
+## Data Types
+
+Spark SQL and DataFrames support the following data types:
+
+* Numeric types
+- `ByteType`: Represents 1-byte signed integer numbers.
--- End diff --
nit: use 2 space indent.
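With the 2-space indent the nit asks for, the `ByteType` line nests as a proper sublist under "Numeric types". A sketch of the fixed markdown (the `ShortType` line is added here only to show the repeating pattern):

```markdown
* Numeric types
  - `ByteType`: Represents 1-byte signed integer numbers.
  - `ShortType`: Represents 2-byte signed integer numbers.
```

Without the indent, kramdown renders the `-` items as a separate top-level list instead of children of the `*` item.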
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test PASSed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4052/ Test PASSed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97480/testReport)** for PR 22749 at commit [`25a6162`](https://github.com/apache/spark/commit/25a616286075ca4f0a7d528095b387172b05c6c3).
[GitHub] spark issue #22219: [SPARK-25224][SQL] Improvement of Spark SQL ThriftServer...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22219 cc @srinathshankar @yuchenhuo
[GitHub] spark pull request #22746: [SPARK-24499][SQL][DOC] Split the page of sql-pro...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22746#discussion_r225780740
--- Diff: docs/sql-getting-started.md ---
@@ -0,0 +1,369 @@
[snip]
+## Creating DataFrames
+
+With a `SparkSession`, applications can create DataFrames from an [existing `RDD`](#interoperating-with-rdds),
+from a Hive table, or from [Spark data sources](#data-sources).
--- End diff --
The link `[Spark data sources](#data-sources)` does not work after this change. Could you fix all the similar cases? Thanks!
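The checking that xuanyuanking promises ("will check all inner links") can be automated: derive the kramdown-style anchors a page actually defines from its headings, then flag `](#...)` fragment links that no heading on the same page satisfies. A minimal sketch of that idea (the anchor-derivation rule here is a simplification of kramdown's real auto-ID algorithm, and the helper names are made up for illustration):

```python
import re

def heading_anchors(md_text):
    """Collect approximate kramdown auto-IDs: lowercase the heading text,
    drop punctuation, and turn spaces into dashes."""
    anchors = set()
    for line in md_text.splitlines():
        m = re.match(r"#+\s+(.*)", line)
        if m:
            title = m.group(1).strip().lower()
            anchors.add(re.sub(r"[^\w\- ]", "", title).replace(" ", "-"))
    return anchors

def broken_fragment_links(md_text):
    """Return fragment targets like [text](#anchor) that no heading
    in this same markdown text defines."""
    anchors = heading_anchors(md_text)
    targets = re.findall(r"\]\(#([^)]+)\)", md_text)
    return [t for t in targets if t not in anchors]

doc = """
## Creating DataFrames
See [Spark data sources](#data-sources) and [Creating DataFrames](#creating-dataframes).
"""
print(broken_fragment_links(doc))  # ['data-sources']
```

After the guide is split, `#data-sources` no longer resolves on the getting-started page, which is exactly what this sketch reports; such targets would need to become cross-page links to whichever new page holds that section.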
[GitHub] spark pull request #22694: [SQL][CATALYST][MINOR] update some error comments
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22694
[GitHub] spark issue #22694: [SQL][CATALYST][MINOR] update some error comments
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22694 Merged to master and branch-2.4.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22503 @justinuang, okay. Mind rebasing this please?
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22263 Merged build finished. Test PASSed.
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22263 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97476/ Test PASSed.
[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22263 **[Test build #97476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97476/testReport)** for PR 22263 at commit [`5e088b8`](https://github.com/apache/spark/commit/5e088b86822dd6b1bf4c3bb085fde3c96af03658).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22295 @huaxingao, thanks for addressing comments. Would you mind rebasing it and resolving the conflicts?
[GitHub] spark issue #22752: [SPARK-24787][CORE] Revert hsync in EventLoggingListener...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22752 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97474/ Test PASSed.
[GitHub] spark issue #22752: [SPARK-24787][CORE] Revert hsync in EventLoggingListener...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22752 Merged build finished. Test PASSed.
[GitHub] spark issue #22752: [SPARK-24787][CORE] Revert hsync in EventLoggingListener...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22752 **[Test build #97474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97474/testReport)** for PR 22752 at commit [`a3f53c4`](https://github.com/apache/spark/commit/a3f53c41879e28d71d4dbd79d80a51e50d82ecee).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22482 Merged build finished. Test PASSed.
[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22482 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97475/ Test PASSed.
[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22482 **[Test build #97475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97475/testReport)** for PR 22482 at commit [`5c74609`](https://github.com/apache/spark/commit/5c746090a8d5560f043754383656d54653a315dc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22729: [SPARK-25737][CORE] Remove JavaSparkContextVarargsWorkar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22729 **[Test build #4380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4380/testReport)** for PR 22729 at commit [`0860d27`](https://github.com/apache/spark/commit/0860d27a205d3dd3d94e6bbe2c9db49b7e432ef4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test FAILed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97479/testReport)** for PR 22749 at commit [`6a6fa45`](https://github.com/apache/spark/commit/6a6fa454e22728cc2ad8e5515cd587fe0be84b26). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97479/ Test FAILed.
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225769471
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.test.TestSparkSession;
+import org.apache.spark.sql.types.ArrayType;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+public class JavaBeanWithArraySuite {
+
+  private static final List<Record> RECORDS = new ArrayList<>();
+
+  static {
+    RECORDS.add(new Record(1,
+      Arrays.asList(new Interval(111, 211), new Interval(121, 221)),
+      Arrays.asList(11, 21, 31, 41)
+    ));
+    RECORDS.add(new Record(2,
+      Arrays.asList(new Interval(112, 212), new Interval(122, 222)),
+      Arrays.asList(12, 22, 32, 42)
+    ));
+    RECORDS.add(new Record(3,
+      Arrays.asList(new Interval(113, 213), new Interval(123, 223)),
+      Arrays.asList(13, 23, 33, 43)
+    ));
+  }
+
+  private TestSparkSession spark;
+
+  @Before
+  public void setUp() {
+    spark = new TestSparkSession();
+  }
+
+  @After
+  public void tearDown() {
+    spark.stop();
+    spark = null;
+  }
+
+  @Test
+  public void testBeanWithArrayFieldsDeserialization() {
+
+    StructType schema = createSchema();
+    Encoder<Record> encoder = Encoders.bean(Record.class);
+
+    Dataset<Record> dataset = spark
+      .read()
+      .format("json")
+      .schema(schema)
+      .load("src/test/resources/test-data/with-array-fields")
+      .as(encoder);
+
+    List<Record> records = dataset.collectAsList();
+
+    Assert.assertTrue(Util.equals(records, RECORDS));
+  }
+
+  private static StructType createSchema() {
+    StructField[] intervalFields = {
+      new StructField("startTime", DataTypes.LongType, true, Metadata.empty()),
+      new StructField("endTime", DataTypes.LongType, true, Metadata.empty())
+    };
+    DataType intervalType = new StructType(intervalFields);
+
+    DataType intervalsType = new ArrayType(intervalType, true);
+
+    DataType valuesType = new ArrayType(DataTypes.IntegerType, true);
+
+    StructField[] fields = {
+      new StructField("id", DataTypes.IntegerType, true, Metadata.empty()),
+      new StructField("intervals", intervalsType, true, Metadata.empty()),
+      new StructField("values", valuesType, true, Metadata.empty())
+    };
+    return new StructType(fields);
+  }
+
+  public static class Record {
+
+    private int id;
+    private List<Interval> intervals;
+    private List<Integer> values;
+
+    public Record() { }
+
+    Record(int id, List<Interval> intervals, List<Integer> values) {
+      this.id = id;
+      this.intervals = intervals;
+      this.values = values;
+    }
+
+    public int getId() {
+      return id;
+    }
+
+    public void setId(int id) {
[message truncated in the archive]
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225768857
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java ---
(quotes the same diff as the previous comment, down to the field declarations of the `Record` bean:)
+    private int id;
+    private List<Interval> intervals;
+    private List<Integer> values;
--- End diff --
Will this list of int affect the test? If no, maybe we can get rid of it to simplify the test.
[GitHub] spark pull request #22745: [SPARK-21402][SQL][FOLLOW-UP] Fix java map of str...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22745#discussion_r225768707
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithMapSuite.java ---
@@ -0,0 +1,257 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.test.TestSparkSession;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.MapType;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+public class JavaBeanWithMapSuite {
+
+  private static final List<Record> RECORDS = new ArrayList<>();
+
+  static {
+    RECORDS.add(new Record(1,
+      toMap(
+        Arrays.asList("a", "b"),
+        Arrays.asList(new Interval(111, 211), new Interval(121, 221))
+      ),
+      toMap(Arrays.asList("a", "b", "c"), Arrays.asList(11, 21, 31))
+    ));
+    RECORDS.add(new Record(2,
+      toMap(
+        Arrays.asList("a", "b"),
+        Arrays.asList(new Interval(112, 212), new Interval(122, 222))
+      ),
+      toMap(Arrays.asList("a", "b", "c"), Arrays.asList(12, 22, 32))
+    ));
+    RECORDS.add(new Record(3,
+      toMap(
+        Arrays.asList("a", "b"),
+        Arrays.asList(new Interval(113, 213), new Interval(123, 223))
+      ),
+      toMap(Arrays.asList("a", "b", "c"), Arrays.asList(13, 23, 33))
+    ));
+  }
+
+  private static <K, V> Map<K, V> toMap(Collection<K> keys, Collection<V> values) {
+    Map<K, V> map = new HashMap<>();
+    Iterator<K> keyI = keys.iterator();
+    Iterator<V> valueI = values.iterator();
+    while (keyI.hasNext() && valueI.hasNext()) {
+      map.put(keyI.next(), valueI.next());
+    }
+    return map;
+  }
+
+  private TestSparkSession spark;
+
+  @Before
+  public void setUp() {
+    spark = new TestSparkSession();
+  }
+
+  @After
+  public void tearDown() {
+    spark.stop();
+    spark = null;
+  }
+
+  @Test
+  public void testBeanWithMapFieldsDeserialization() {
+
+    StructType schema = createSchema();
+    Encoder<Record> encoder = Encoders.bean(Record.class);
+
+    Dataset<Record> dataset = spark
+      .read()
+      .format("json")
+      .schema(schema)
+      .load("src/test/resources/test-data/with-map-fields")
+      .as(encoder);
+
+    List<Record> records = dataset.collectAsList();
+
+    Assert.assertTrue(Util.equals(records, RECORDS));
+  }
+
+  private static StructType createSchema() {
+    StructField[] intervalFields = {
+      new StructField("startTime", DataTypes.LongType, true, Metadata.empty()),
+      new StructField("endTime", DataTypes.LongType, true, Metadata.empty())
+    };
+    DataType intervalType = new StructType(intervalFields);
+
+    DataType intervalsType = new MapType(DataTypes.StringType, intervalType, true);
+
+    DataType valuesType = new MapType(DataTypes.StringType, DataTypes.IntegerT
[message truncated in the archive]
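The `toMap` helper quoted in this test zips two parallel collections into a map, stopping at the shorter of the two. A self-contained sketch of the same pattern outside of Spark (the `ZipToMap` class name is illustrative, not part of the PR):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ZipToMap {
    // Same idea as the toMap helper in the quoted suite: walk both iterators
    // in lockstep; pairing stops when either collection runs out.
    static <K, V> Map<K, V> toMap(List<K> keys, List<V> values) {
        Map<K, V> map = new HashMap<>();
        Iterator<K> keyI = keys.iterator();
        Iterator<V> valueI = values.iterator();
        while (keyI.hasNext() && valueI.hasNext()) {
            map.put(keyI.next(), valueI.next());
        }
        return map;
    }
}
```

Note that because pairing stops at the shorter collection, a length mismatch between keys and values is silently ignored rather than reported — acceptable in test fixtures where both lists are written by hand.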
[GitHub] spark pull request #22724: [SPARK-25734][SQL] Literal should have a value co...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22724
[GitHub] spark issue #22724: [SPARK-25734][SQL] Literal should have a value correspon...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22724 thanks, merging to master!
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225767103
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java ---
(quoted diff identical to the copy in viirya's comment on this file above; the rest of the message is truncated in the archive)
[GitHub] spark issue #22745: [SPARK-21402][SQL][FOLLOW-UP] Fix java map of structs de...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22745 It's a different issue; I think it's worth a new ticket.
[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22732#discussion_r225764876
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
@@ -81,11 +81,11 @@ case class UserDefinedFunction protected[sql] (
       f,
       dataType,
       exprs.map(_.expr),
+      nullableTypes.map(_.map(!_)).getOrElse(exprs.map(_ => false)),
--- End diff --
Hm, but we can't use getParameterTypes anymore. It won't work in Scala 2.12. Where the nullability info is definitely not available, be conservative and assume it all needs null handling?
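The one-liner under review inverts each known per-parameter nullability flag and, when no information is available, falls back to one default flag per argument. A minimal Java sketch of that shape (the `NullableFallback` class and `flags` method are hypothetical illustrations; the real code is the Scala expression quoted above):

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class NullableFallback {
    // Mirrors nullableTypes.map(_.map(!_)).getOrElse(exprs.map(_ => false)):
    // if per-parameter nullability is known, invert each flag; if nothing is
    // known, produce the same default flag for every argument.
    static List<Boolean> flags(Optional<List<Boolean>> nullableTypes, int arity) {
        return nullableTypes
            .map(ts -> ts.stream().map(n -> !n).collect(Collectors.toList()))
            .orElseGet(() -> Collections.nCopies(arity, false));
    }
}
```

srowen's point is about the second branch: when nullability genuinely cannot be determined, the conservative choice is to treat every argument as if it may need null handling, trading a few unnecessary null checks for correctness.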
[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22746 @gatorsmile Sorry for the delay on this, please have a look when you have time.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Merged build finished. Test PASSed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Merged build finished. Test PASSed.
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22749 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4051/ Test PASSed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4050/
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4050/ Test PASSed.
[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/22732#discussion_r225762708
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
@@ -81,11 +81,11 @@ case class UserDefinedFunction protected[sql] (
       f,
       dataType,
       exprs.map(_.expr),
+      nullableTypes.map(_.map(!_)).getOrElse(exprs.map(_ => false)),
--- End diff --
In addition to what I just pointed out, which is when we did try to get `inputSchemas` through `ScalaReflection.schemaFor` and got an exception for unrecognized types, there's another case where we could get an unspecified `nullableTypes`, and that is when `UserDefinedFunction` is instantiated calling the constructor but not the `create` method. Then I assume it's created by an earlier version, and we should use the old logic, i.e., `ScalaReflection.getParameterTypes` (https://github.com/apache/spark/pull/22259/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2L2153) to get the correct information for `nullableTypes`. Is that right, @cloud-fan @srowen ?
[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22749 **[Test build #97479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97479/testReport)** for PR 22749 at commit [`6a6fa45`](https://github.com/apache/spark/commit/6a6fa454e22728cc2ad8e5515cd587fe0be84b26).
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4050/
[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225762148
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --
Oh, I see, moving to the default constructor was not a good idea. How about the first suggestion?
[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22263#discussion_r225762035
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     }
   }
+
+  test("SQL interface support storageLevel(DISK_ONLY)") {
--- End diff --
How about this:
```scala
Seq("LAZY", "").foreach { isLazy =>
  Seq(true, false).foreach { withInvalidOptions =>
    Seq(true, false).foreach { withCacheTempView =>
      Map("DISK_ONLY" -> Disk, "MEMORY_ONLY" -> Memory).foreach { case (storageLevel, dataReadMethod) =>
        val testName = s"SQL interface support option: storageLevel: $storageLevel, " +
          s"isLazy: ${isLazy.equals("LAZY")}, " +
          s"withInvalidOptions: $withInvalidOptions, withCacheTempView: $withCacheTempView"
        val cacheOption = if (withInvalidOptions) {
          s"OPTIONS('storageLevel' '$storageLevel', 'a' '1', 'b' '2')"
        } else {
          s"OPTIONS('storageLevel' '$storageLevel')"
        }
        test(testName) {
          if (withCacheTempView) {
            withTempView("testSelect") {
              sql(s"CACHE $isLazy TABLE testSelect $cacheOption SELECT * FROM testData")
              assertCached(spark.table("testSelect"))
              val rddId = rddIdOf("testSelect")
              if (isLazy.equals("LAZY")) {
                sql("SELECT COUNT(*) FROM testSelect").collect()
              }
              assert(isExpectStorageLevel(rddId, dataReadMethod))
            }
          } else {
            sql(s"CACHE $isLazy TABLE testData $cacheOption")
            assertCached(spark.table("testData"))
            val rddId = rddIdOf("testData")
            if (isLazy.equals("LAZY")) {
              sql("SELECT COUNT(*) FROM testData").collect()
            }
            assert(isExpectStorageLevel(rddId, dataReadMethod))
          }
        }
      }
    }
  }
}
```
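The nested `foreach` calls in that suggestion generate one named test per parameter combination. A reduced, Spark-free Java sketch of the same cartesian-product idea (the `TestMatrix` class is illustrative; it covers three of the four dimensions and only builds the descriptive test names):

```java
import java.util.ArrayList;
import java.util.List;

public class TestMatrix {
    // Enumerate every combination of the test parameters, producing one
    // descriptive name per case, just like the nested Scala foreach loops.
    static List<String> testNames() {
        List<String> names = new ArrayList<>();
        for (String isLazy : new String[] {"LAZY", ""}) {
            for (boolean withInvalidOptions : new boolean[] {true, false}) {
                for (String storageLevel : new String[] {"DISK_ONLY", "MEMORY_ONLY"}) {
                    names.add("storageLevel: " + storageLevel
                        + ", isLazy: " + isLazy.equals("LAZY")
                        + ", withInvalidOptions: " + withInvalidOptions);
                }
            }
        }
        return names;
    }
}
```

The payoff of this style is coverage without duplication: adding one more value to any dimension multiplies the generated cases automatically, and each case keeps a name that pinpoints exactly which combination failed.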
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21588 @rxin and @gatorsmile, WDYT? I already had to argue about Hadoop 3 support here and there (for instance, see [SPARK-18112|https://issues.apache.org/jira/browse/SPARK-18112] and [SPARK-18673|https://issues.apache.org/jira/browse/SPARK-18673]) and explain what's going on. Ideally, it looks like we should go ahead with option 2 (https://github.com/apache/spark/pull/21588#issuecomment-429272279), if I am not mistaken. If there are more concerns we should address before going ahead, I am definitely willing to help investigate as well.
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/22612 Jenkins retest this please.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 **[Test build #97478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97478/testReport)** for PR 22608 at commit [`4c9b886`](https://github.com/apache/spark/commit/4c9b886c1f23bbdd3d8e1ec7df25f03e45892d88).
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Merged build finished. Test FAILed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4049/
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4049/ Test FAILed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4049/
[GitHub] spark pull request #22707: [SPARK-25717][SQL] Insert overwrite a recreated e...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22707#discussion_r225759293
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -227,18 +227,22 @@ case class InsertIntoHiveTable(
       // Newer Hive largely improves insert overwrite performance. As Spark uses older Hive
       // version and we may not want to catch up new Hive version every time. We delete the
       // Hive partition first and then load data file into the Hive partition.
-      if (oldPart.nonEmpty && overwrite) {
-        oldPart.get.storage.locationUri.foreach { uri =>
-          val partitionPath = new Path(uri)
-          val fs = partitionPath.getFileSystem(hadoopConf)
-          if (fs.exists(partitionPath)) {
-            if (!fs.delete(partitionPath, true)) {
-              throw new RuntimeException(
-                "Cannot remove partition directory '" + partitionPath.toString)
-            }
-            // Don't let Hive do overwrite operation since it is slower.
-            doHiveOverwrite = false
+      if (overwrite) {
+        val oldPartitionPath = oldPart.flatMap(_.storage.locationUri.map(new Path(_)))
+          .getOrElse {
+            ExternalCatalogUtils.generatePartitionPath(
+              partitionSpec,
+              partitionColumnNames,
+              HiveClientImpl.toHiveTable(table).getDataLocation)
--- End diff --
Looks correct, as I saw we assign `CatalogTable.storage.locationUri` to the HiveTable's data location.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 **[Test build #97477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97477/testReport)** for PR 22608 at commit [`5d270f1`](https://github.com/apache/spark/commit/5d270f17dccbb2eac6d3c2ab8c12987e3d992086).
[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22379
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21588 Thanks @HyukjinKwon. Upgrading Hive to 2.3.2 can fix [SPARK-12014](https://issues.apache.org/jira/browse/SPARK-12014), [SPARK-18673](https://issues.apache.org/jira/browse/SPARK-18673), [SPARK-24766](https://issues.apache.org/jira/browse/SPARK-24766) and [SPARK-25193](https://issues.apache.org/jira/browse/SPARK-25193). It can also improve the performance of [SPARK-18107](https://issues.apache.org/jira/browse/SPARK-18107). It seems it doesn't break backward compatibility. I have verified it in our production environment (Hive 1.2.1).
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22666 Woah .. let me resolve the conflicts tonight.
[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22379 Thanks all!
[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22379 Merged to master.
[GitHub] spark pull request #22707: [SPARK-25717][SQL] Insert overwrite a recreated e...
Github user fjh100456 commented on a diff in the pull request: https://github.com/apache/spark/pull/22707#discussion_r225756219
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -227,18 +227,22 @@ case class InsertIntoHiveTable(
       // Newer Hive largely improves insert overwrite performance. As Spark uses older Hive
       // version and we may not want to catch up new Hive version every time. We delete the
       // Hive partition first and then load data file into the Hive partition.
-      if (oldPart.nonEmpty && overwrite) {
-        oldPart.get.storage.locationUri.foreach { uri =>
-          val partitionPath = new Path(uri)
-          val fs = partitionPath.getFileSystem(hadoopConf)
-          if (fs.exists(partitionPath)) {
-            if (!fs.delete(partitionPath, true)) {
-              throw new RuntimeException(
-                "Cannot remove partition directory '" + partitionPath.toString)
-            }
-            // Don't let Hive do overwrite operation since it is slower.
-            doHiveOverwrite = false
+      if (overwrite) {
+        val oldPartitionPath = oldPart.flatMap(_.storage.locationUri.map(new Path(_)))
+          .getOrElse {
+            ExternalCatalogUtils.generatePartitionPath(
+              partitionSpec,
+              partitionColumnNames,
+              HiveClientImpl.toHiveTable(table).getDataLocation)
--- End diff --
> `HiveClientImpl.toHiveTable(table).getDataLocation` -> `new Path(table.location)`?

Yes, they get the same value. I'll change it, thank you very much.
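The pattern under discussion — use the old partition's explicit location when one exists, otherwise fall back to a freshly generated partition path under the table location — can be sketched outside Spark with plain `Optional` chaining. This is only an illustrative stand-in: `generatePartitionPath` here is a hypothetical helper, not the real `ExternalCatalogUtils` API, and strings stand in for Hadoop `Path`s.

```java
import java.util.Optional;

public class PartitionPathFallback {

    // Hypothetical stand-in for ExternalCatalogUtils.generatePartitionPath:
    // derive a default location from the table location and the partition spec.
    static String generatePartitionPath(String tableLocation, String spec) {
        return tableLocation + "/" + spec;
    }

    // Mirrors: oldPart.flatMap(_.storage.locationUri.map(new Path(_))).getOrElse { ... }
    // The outer Optional models "partition may not exist"; the inner one models
    // "partition may exist without an explicit location".
    static String resolvePartitionPath(Optional<Optional<String>> oldPart,
                                       String tableLocation, String spec) {
        return oldPart
                .flatMap(locationUri -> locationUri)
                .orElseGet(() -> generatePartitionPath(tableLocation, spec));
    }

    public static void main(String[] args) {
        // Old partition exists with a custom location: that location wins.
        System.out.println(resolvePartitionPath(
                Optional.of(Optional.of("/custom/loc/p=1")), "/warehouse/t", "p=1"));
        // No old partition: fall back to the generated default location.
        System.out.println(resolvePartitionPath(
                Optional.empty(), "/warehouse/t", "p=1"));
    }
}
```

The point of the PR's change is exactly this `getOrElse` branch: overwrite now deletes a computed path even when the partition metadata is missing or has no location.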
[GitHub] spark issue #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh script
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/22748 There seems to be overlapping logic between this PR and https://github.com/apache/spark/pull/22681
[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21990 Merged build finished. Test PASSed.
[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21990 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97472/ Test PASSed.
[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21990 **[Test build #97472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97472/testReport)** for PR 21990 at commit [`d9b2a55`](https://github.com/apache/spark/commit/d9b2a55275b74c406d9f9c435bf1b53a6ef4b35a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22745: [SPARK-21402][SQL][FOLLOW-UP] Fix java map of structs de...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22745 Is this a separate PR because this part is pretty separable, and you think it could be considered separately? If it's all part of one logical change that should go in together or not at all, the changes can stay in the original PR.
[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/22598#discussion_r225752604
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/TokenUtil.scala ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.text.SimpleDateFormat
+import java.util.Properties
+
+import org.apache.hadoop.io.Text
+import org.apache.hadoop.security.token.{Token, TokenIdentifier}
+import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier
+import org.apache.kafka.clients.CommonClientConfigs
+import org.apache.kafka.clients.admin.{AdminClient, CreateDelegationTokenOptions}
+import org.apache.kafka.common.config.SaslConfigs
+import org.apache.kafka.common.security.token.delegation.DelegationToken
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+
+private[kafka010] object TokenUtil extends Logging {
+  private[kafka010] val TOKEN_KIND = new Text("KAFKA_DELEGATION_TOKEN")
+  private[kafka010] val TOKEN_SERVICE = new Text("kafka.server.delegation.token")
+
+  private[kafka010] class KafkaDelegationTokenIdentifier extends AbstractDelegationTokenIdentifier {
+    override def getKind: Text = TOKEN_KIND;
+  }
+
+  private def printToken(token: DelegationToken): Unit = {
+    if (log.isDebugEnabled) {
+      val dateFormat = new SimpleDateFormat("-MM-dd'T'HH:mm")
+      logDebug("%-15s %-30s %-15s %-25s %-15s %-15s %-15s".format(
+        "TOKENID", "HMAC", "OWNER", "RENEWERS", "ISSUEDATE", "EXPIRYDATE", "MAXDATE"))
+      val tokenInfo = token.tokenInfo
+      logDebug("%-15s [hidden] %-15s %-25s %-15s %-15s %-15s".format(
+        tokenInfo.tokenId,
+        tokenInfo.owner,
+        tokenInfo.renewersAsString,
+        dateFormat.format(tokenInfo.issueTimestamp),
+        dateFormat.format(tokenInfo.expiryTimestamp),
+        dateFormat.format(tokenInfo.maxTimestamp)))
+    }
+  }
+
+  private[kafka010] def createAdminClientProperties(sparkConf: SparkConf): Properties = {
+    val adminClientProperties = new Properties
+
+    val bootstrapServers = sparkConf.get(KAFKA_BOOTSTRAP_SERVERS)
+    require(bootstrapServers.nonEmpty, s"Tried to obtain kafka delegation token but bootstrap " +
+      "servers not configured.")
+    adminClientProperties.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers.get)
+
+    val protocol = sparkConf.get(KAFKA_SECURITY_PROTOCOL)
+    adminClientProperties.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, protocol)
+    if (protocol.endsWith("SSL")) {
+      logInfo("SSL protocol detected.")
+      sparkConf.get(KAFKA_TRUSTSTORE_LOCATION).foreach { truststoreLocation =>
+        adminClientProperties.put("ssl.truststore.location", truststoreLocation)
+      }
+      sparkConf.get(KAFKA_TRUSTSTORE_PASSWORD).foreach { truststorePassword =>
+        adminClientProperties.put("ssl.truststore.password", truststorePassword)
+      }
+    } else {
+      logWarning("Obtaining kafka delegation token through plain communication channel. Please " +
+        "consider the security impact.")
+    }
+
+    // There are multiple possibilities to log in:
+    // - Keytab is provided -> try to log in with kerberos module using kafka's dynamic JAAS
+    //   configuration.
+    // - Keytab not provided -> try to log in with JVM global security configuration
+    //   which can be configured for example with 'java.security.auth.login.config'.
+    //   For this no additional parameter needed.
+    KafkaSecurityHelper.getKeytabJaasParams(sparkConf).foreach { jaasParams =>
+      logInfo("Keytab detected, using it for login.")
+      adminClientProperties.put(SaslConfigs.SASL_MECHANISM, SaslConfigs.GSSAPI_MECHANISM)
+
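The property-assembly logic quoted above can be illustrated with a self-contained sketch. This is not the PR's code: it uses literal config keys (`bootstrap.servers`, `security.protocol`, `ssl.truststore.*`) instead of the Kafka `CommonClientConfigs`/`SaslConfigs` constants, and plain `Optional`s in place of `SparkConf` lookups, so it compiles without Spark or kafka-clients on the classpath.

```java
import java.util.Optional;
import java.util.Properties;

public class AdminClientPropsSketch {

    static Properties buildProps(Optional<String> bootstrapServers,
                                 String protocol,
                                 Optional<String> truststoreLocation,
                                 Optional<String> truststorePassword) {
        // Mirrors the require(bootstrapServers.nonEmpty, ...) check.
        if (!bootstrapServers.isPresent()) {
            throw new IllegalArgumentException(
                "Tried to obtain kafka delegation token but bootstrap servers not configured.");
        }
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers.get());
        props.put("security.protocol", protocol);
        // SSL and SASL_SSL both end with "SSL": pass truststore settings through when provided.
        if (protocol.endsWith("SSL")) {
            truststoreLocation.ifPresent(loc -> props.put("ssl.truststore.location", loc));
            truststorePassword.ifPresent(pw -> props.put("ssl.truststore.password", pw));
        }
        return props;
    }

    public static void main(String[] args) {
        Properties p = buildProps(Optional.of("broker:9093"), "SASL_SSL",
                Optional.of("/etc/ts.jks"), Optional.empty());
        System.out.println(p.getProperty("ssl.truststore.location")); // /etc/ts.jks
    }
}
```

With a non-SSL protocol such as `PLAINTEXT`, the truststore keys are simply never set, which is where the PR's warning about a plain communication channel comes from.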
[GitHub] spark issue #22725: [SPARK-24610][[CORE][FOLLOW-UP]fix reading small files v...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/22725 @tgravescs ok, I will do it, thanks
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225752208
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.test.TestSparkSession;
+import org.apache.spark.sql.types.ArrayType;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
--- End diff --
If we remove `createSchema`, we can remove Line 35 ~ 40, too.
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225751969
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.test.TestSparkSession;
+import org.apache.spark.sql.types.ArrayType;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+public class JavaBeanWithArraySuite {
+
+  private static final List<Record> RECORDS = new ArrayList<>();
+
+  static {
+    RECORDS.add(new Record(1,
+        Arrays.asList(new Interval(111, 211), new Interval(121, 221)),
+        Arrays.asList(11, 21, 31, 41)
+    ));
+    RECORDS.add(new Record(2,
+        Arrays.asList(new Interval(112, 212), new Interval(122, 222)),
+        Arrays.asList(12, 22, 32, 42)
+    ));
+    RECORDS.add(new Record(3,
+        Arrays.asList(new Interval(113, 213), new Interval(123, 223)),
+        Arrays.asList(13, 23, 33, 43)
+    ));
+  }
+
+  private TestSparkSession spark;
+
+  @Before
+  public void setUp() {
+    spark = new TestSparkSession();
+  }
+
+  @After
+  public void tearDown() {
+    spark.stop();
+    spark = null;
+  }
+
+  @Test
+  public void testBeanWithArrayFieldsDeserialization() {
+
+    StructType schema = createSchema();
+    Encoder<Record> encoder = Encoders.bean(Record.class);
+
+    Dataset<Record> dataset = spark
+        .read()
+        .format("json")
+        .schema(schema)
+        .load("src/test/resources/test-data/with-array-fields")
+        .as(encoder);
+
+    List<Record> records = dataset.collectAsList();
+
+    Assert.assertTrue(Util.equals(records, RECORDS));
+  }
+
+  private static StructType createSchema() {
+    StructField[] intervalFields = {
+        new StructField("startTime", DataTypes.LongType, true, Metadata.empty()),
+        new StructField("endTime", DataTypes.LongType, true, Metadata.empty())
+    };
+    DataType intervalType = new StructType(intervalFields);
+
+    DataType intervalsType = new ArrayType(intervalType, true);
+
+    DataType valuesType = new ArrayType(DataTypes.IntegerType, true);
+
+    StructField[] fields = {
+        new StructField("id", DataTypes.IntegerType, true, Metadata.empty()),
+        new StructField("intervals", intervalsType, true, Metadata.empty()),
+        new StructField("values", valuesType, true, Metadata.empty())
+    };
+    return new StructType(fields);
+  }
+
+  public static class Record {
+
+    private int id;
+    private List<Interval> intervals;
+    private List<Integer> values;
+
+    public Record() { }
+
+    Record(int id, List<Interval> intervals, List<Integer> values) {
+      this.id = id;
+      this.intervals = intervals;
+      this.values = values;
+    }
+
+    public int getId() {
+      return id;
+    }
+
+    public void setId(int i
[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22655 Thanks @viirya !
[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225751513
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.test.TestSparkSession;
+import org.apache.spark.sql.types.ArrayType;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+public class JavaBeanWithArraySuite {
+
+  private static final List<Record> RECORDS = new ArrayList<>();
+
+  static {
+    RECORDS.add(new Record(1,
+        Arrays.asList(new Interval(111, 211), new Interval(121, 221)),
+        Arrays.asList(11, 21, 31, 41)
+    ));
+    RECORDS.add(new Record(2,
+        Arrays.asList(new Interval(112, 212), new Interval(122, 222)),
+        Arrays.asList(12, 22, 32, 42)
+    ));
+    RECORDS.add(new Record(3,
+        Arrays.asList(new Interval(113, 213), new Interval(123, 223)),
+        Arrays.asList(13, 23, 33, 43)
+    ));
+  }
+
+  private TestSparkSession spark;
+
+  @Before
+  public void setUp() {
+    spark = new TestSparkSession();
+  }
+
+  @After
+  public void tearDown() {
+    spark.stop();
+    spark = null;
+  }
+
+  @Test
+  public void testBeanWithArrayFieldsDeserialization() {
+
+    StructType schema = createSchema();
+    Encoder<Record> encoder = Encoders.bean(Record.class);
+
+    Dataset<Record> dataset = spark
+        .read()
+        .format("json")
+        .schema(schema)
+        .load("src/test/resources/test-data/with-array-fields")
+        .as(encoder);
+
+    List<Record> records = dataset.collectAsList();
+
+    Assert.assertTrue(Util.equals(records, RECORDS));
+  }
+
+  private static StructType createSchema() {
+    StructField[] intervalFields = {
+        new StructField("startTime", DataTypes.LongType, true, Metadata.empty()),
+        new StructField("endTime", DataTypes.LongType, true, Metadata.empty())
+    };
+    DataType intervalType = new StructType(intervalFields);
+
+    DataType intervalsType = new ArrayType(intervalType, true);
+
+    DataType valuesType = new ArrayType(DataTypes.IntegerType, true);
+
+    StructField[] fields = {
+        new StructField("id", DataTypes.IntegerType, true, Metadata.empty()),
+        new StructField("intervals", intervalsType, true, Metadata.empty()),
+        new StructField("values", valuesType, true, Metadata.empty())
+    };
+    return new StructType(fields);
+  }
+
+  public static class Record {
+
+    private int id;
+    private List<Interval> intervals;
+    private List<Integer> values;
+
+    public Record() { }
+
+    Record(int id, List<Interval> intervals, List<Integer> values) {
+      this.id = id;
+      this.intervals = intervals;
+      this.values = values;
+    }
+
+    public int getId() {
+      return id;
+    }
+
+    public void setId(int i
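The `Record` bean quoted in the review above (whose generic type parameters were stripped by the archive's HTML extraction) can be sketched as a self-contained class, without Spark. The `Interval` class and the equality check here are simplified stand-ins for the PR's `Interval` and `Util.equals` helper:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class RecordBeanSketch {

    // Simplified stand-in for the PR's Interval bean.
    public static class Interval {
        private long startTime;
        private long endTime;
        public Interval() { }
        Interval(long startTime, long endTime) { this.startTime = startTime; this.endTime = endTime; }
        public long getStartTime() { return startTime; }
        public void setStartTime(long v) { startTime = v; }
        public long getEndTime() { return endTime; }
        public void setEndTime(long v) { endTime = v; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Interval)) return false;
            Interval i = (Interval) o;
            return startTime == i.startTime && endTime == i.endTime;
        }
        @Override public int hashCode() { return Objects.hash(startTime, endTime); }
    }

    // The quoted bean with its generics restored: Encoders.bean requires a
    // public no-arg constructor plus getter/setter pairs for every field.
    public static class Record {
        private int id;
        private List<Interval> intervals;   // was "List intervals" in the archive
        private List<Integer> values;       // was "List values" in the archive
        public Record() { }
        Record(int id, List<Interval> intervals, List<Integer> values) {
            this.id = id; this.intervals = intervals; this.values = values;
        }
        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public List<Interval> getIntervals() { return intervals; }
        public void setIntervals(List<Interval> intervals) { this.intervals = intervals; }
        public List<Integer> getValues() { return values; }
        public void setValues(List<Integer> values) { this.values = values; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Record)) return false;
            Record r = (Record) o;
            return id == r.id && Objects.equals(intervals, r.intervals)
                    && Objects.equals(values, r.values);
        }
        @Override public int hashCode() { return Objects.hash(id, intervals, values); }
    }

    public static void main(String[] args) {
        Record a = new Record(1, Arrays.asList(new Interval(111, 211)), Arrays.asList(11, 21));
        Record b = new Record(1, Arrays.asList(new Interval(111, 211)), Arrays.asList(11, 21));
        System.out.println(a.equals(b)); // true
    }
}
```

Field-level `equals`/`hashCode` like this is what lets the test compare collected `Dataset<Record>` rows against the expected `RECORDS` list.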