[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22746
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22746
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97482/
Test PASSed.


---




[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22746
  
**[Test build #97482 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97482/testReport)** for PR 22746 at commit [`58115e5`](https://github.com/apache/spark/commit/58115e5a69670f45cf05d2026cb57abb595fe073).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...

2018-10-16 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22746
  
This is very cool! Thanks!


---




[GitHub] spark pull request #22746: [SPARK-24499][SQL][DOC] Split the page of sql-pro...

2018-10-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22746#discussion_r225797461
  
--- Diff: docs/_data/menu-sql.yaml ---
@@ -0,0 +1,79 @@
+- text: Getting Started
+  url: sql-getting-started.html
+  subitems:
+- text: "Starting Point: SparkSession"
+  url: sql-getting-started.html#starting-point-sparksession
+- text: Creating DataFrames
+  url: sql-getting-started.html#creating-dataframes
+- text: Untyped Dataset Operations
--- End diff --

How about `Untyped Dataset Operations (DataFrame operations)`?


---




[GitHub] spark issue #22750: [SPARK-25747][SQL] remove ColumnarBatchScan.needsUnsafeR...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22750
  
**[Test build #97483 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97483/testReport)** for PR 22750 at commit [`318762c`](https://github.com/apache/spark/commit/318762ce5107bc6bcfc717b2d648cba3b86080f0).


---




[GitHub] spark issue #22750: [SPARK-25747][SQL] remove ColumnarBatchScan.needsUnsafeR...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22750
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4055/
Test PASSed.


---




[GitHub] spark issue #22750: [SPARK-25747][SQL] remove ColumnarBatchScan.needsUnsafeR...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22750
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22746
  
**[Test build #97482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97482/testReport)** for PR 22746 at commit [`58115e5`](https://github.com/apache/spark/commit/58115e5a69670f45cf05d2026cb57abb595fe073).


---




[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22746
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22746
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4054/
Test PASSed.


---




[GitHub] spark issue #22753: [SPARK-25754][DOC] Change CDN for MathJax

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22753
  
**[Test build #97481 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97481/testReport)** for PR 22753 at commit [`e700c82`](https://github.com/apache/spark/commit/e700c82338d3f0123629a77afc2fb5bd1ac466f8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22753: [SPARK-25754][DOC] Change CDN for MathJax

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22753
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22746: [SPARK-24499][SQL][DOC] Split the page of sql-pro...

2018-10-16 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22746#discussion_r225794532
  
--- Diff: docs/sql-reference.md ---
@@ -0,0 +1,641 @@
+---
+layout: global
+title: Reference
+displayTitle: Reference
+---
+
+* Table of contents
+{:toc}
+
+## Data Types
+
+Spark SQL and DataFrames support the following data types:
+
+* Numeric types
+- `ByteType`: Represents 1-byte signed integer numbers.
--- End diff --

Thanks, done in 58115e5.
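For context, a minimal sketch of how the data types listed on the quoted reference page surface in the Scala API. Only the standard `org.apache.spark.sql.types` package is assumed; the field names are illustrative:

```scala
import org.apache.spark.sql.types._

// ByteType as documented above: a 1-byte signed integer,
// i.e. values from -128 to 127 (SQL name: tinyint).
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("flag", ByteType, nullable = true)
))

println(schema.simpleString)  // struct<id:int,flag:tinyint>
```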


---




[GitHub] spark issue #22753: [SPARK-25754][DOC] Change CDN for MathJax

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97481/
Test PASSed.


---




[GitHub] spark issue #22742: [SPARK-25588][WIP] SchemaParseException: Can't redefine:...

2018-10-16 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22742
  
Hi @heuermh ,

I left some comments in JIRA yesterday. I tried the test case on branch-2.3 (with tags v2.3.1 and v2.3.0), and the case is still reproducible by running:
```
./build/sbt "; clean; project sql; testOnly *Spark25588Suite"
```
Can you confirm that?

I have also seen a similar issue in Parquet 1.10: 
https://jira.apache.org/jira/browse/PARQUET-1409



---




[GitHub] spark pull request #22746: [SPARK-24499][SQL][DOC] Split the page of sql-pro...

2018-10-16 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22746#discussion_r225794477
  
--- Diff: docs/sql-getting-started.md ---
@@ -0,0 +1,369 @@
+---
+layout: global
+title: Getting Started
+displayTitle: Getting Started
+---
+
+* Table of contents
+{:toc}
+
+## Starting Point: SparkSession
+
+
+
+
+The entry point into all functionality in Spark is the 
[`SparkSession`](api/scala/index.html#org.apache.spark.sql.SparkSession) class. 
To create a basic `SparkSession`, just use `SparkSession.builder()`:
+
+{% include_example init_session 
scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
+
+
+
+
+The entry point into all functionality in Spark is the 
[`SparkSession`](api/java/index.html#org.apache.spark.sql.SparkSession) class. 
To create a basic `SparkSession`, just use `SparkSession.builder()`:
+
+{% include_example init_session 
java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
+
+
+
+
+The entry point into all functionality in Spark is the 
[`SparkSession`](api/python/pyspark.sql.html#pyspark.sql.SparkSession) class. 
To create a basic `SparkSession`, just use `SparkSession.builder`:
+
+{% include_example init_session python/sql/basic.py %}
+
+
+
+
+The entry point into all functionality in Spark is the 
[`SparkSession`](api/R/sparkR.session.html) class. To initialize a basic 
`SparkSession`, just call `sparkR.session()`:
+
+{% include_example init_session r/RSparkSQLExample.R %}
+
+Note that when invoked for the first time, `sparkR.session()` initializes 
a global `SparkSession` singleton instance, and always returns a reference to 
this instance for successive invocations. In this way, users only need to 
initialize the `SparkSession` once, then SparkR functions like `read.df` will 
be able to access this global instance implicitly, and users don't need to pass 
the `SparkSession` instance around.
+
+
+
+`SparkSession` in Spark 2.0 provides builtin support for Hive features 
including the ability to
+write queries using HiveQL, access to Hive UDFs, and the ability to read 
data from Hive tables.
+To use these features, you do not need to have an existing Hive setup.
+
+## Creating DataFrames
+
+
+
+With a `SparkSession`, applications can create DataFrames from an 
[existing `RDD`](#interoperating-with-rdds),
+from a Hive table, or from [Spark data sources](#data-sources).
--- End diff --

Done in 58115e5; also fixed the links in ml-pipeline.md, sparkr.md and structured-streaming-programming-guide.md. A sketch of the referenced session setup follows below.
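For reference, the `init_session` include in the quoted page resolves to a short Scala snippet. A minimal sketch of its assumed shape (not the verbatim contents of SparkSQLExample.scala; `appName` and the config key are illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Build (or reuse) the single SparkSession entry point described above.
val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()

// Enables implicit conversions such as RDD-to-DataFrame.
import spark.implicits._
```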


---




[GitHub] spark issue #22753: [SPARK-25754][DOC] Change CDN for MathJax

2018-10-16 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22753
  
@srowen 


---




[GitHub] spark issue #22753: [SPARK-25754][DOC] Change CDN for MathJax

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22753
  
**[Test build #97481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97481/testReport)** for PR 22753 at commit [`e700c82`](https://github.com/apache/spark/commit/e700c82338d3f0123629a77afc2fb5bd1ac466f8).


---




[GitHub] spark issue #22753: [SPARK-25754][DOC] Change CDN for MathJax

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4053/
Test PASSed.


---




[GitHub] spark issue #22753: [SPARK-25754][DOC] Change CDN for MathJax

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22753
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22753: [SPARK-25754][DOC] Change CDN for MathJax

2018-10-16 Thread gengliangwang
GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/22753

[SPARK-25754][DOC] Change CDN for MathJax

## What changes were proposed in this pull request?

Currently, when we open our doc site (https://spark.apache.org/docs/latest/index.html), there is a warning:

![image](https://user-images.githubusercontent.com/1097932/47065926-2b757980-d217-11e8-868f-02ce73f513ae.png)

This PR is to change the CDN as per the migration tips: 
https://www.mathjax.org/cdn-shutting-down/

This is very trivial, but it would be good to follow the suggestion from the MathJax team and remove the warning, in case the original CDN one day becomes unavailable.

## How was this patch tested?

Manual check.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gengliangwang/spark migrateMathJax

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22753.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22753


commit e700c82338d3f0123629a77afc2fb5bd1ac466f8
Author: Gengliang Wang 
Date:   2018-10-17T06:08:44Z

change cdn for MathJax




---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22608
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97478/
Test PASSed.


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22608
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22608
  
**[Test build #97478 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97478/testReport)** for PR 22608 at commit [`4c9b886`](https://github.com/apache/spark/commit/4c9b886c1f23bbdd3d8e1ec7df25f03e45892d88).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22746: [SPARK-24499][SQL][DOC] Split the page of sql-pro...

2018-10-16 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22746#discussion_r225789933
  
--- Diff: docs/sql-getting-started.md ---
@@ -0,0 +1,369 @@
+---
+layout: global
+title: Getting Started
+displayTitle: Getting Started
+---
+
+* Table of contents
+{:toc}
+
+## Starting Point: SparkSession
+
+
+
+
+The entry point into all functionality in Spark is the 
[`SparkSession`](api/scala/index.html#org.apache.spark.sql.SparkSession) class. 
To create a basic `SparkSession`, just use `SparkSession.builder()`:
+
+{% include_example init_session 
scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
+
+
+
+
+The entry point into all functionality in Spark is the 
[`SparkSession`](api/java/index.html#org.apache.spark.sql.SparkSession) class. 
To create a basic `SparkSession`, just use `SparkSession.builder()`:
+
+{% include_example init_session 
java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
+
+
+
+
+The entry point into all functionality in Spark is the 
[`SparkSession`](api/python/pyspark.sql.html#pyspark.sql.SparkSession) class. 
To create a basic `SparkSession`, just use `SparkSession.builder`:
+
+{% include_example init_session python/sql/basic.py %}
+
+
+
+
+The entry point into all functionality in Spark is the 
[`SparkSession`](api/R/sparkR.session.html) class. To initialize a basic 
`SparkSession`, just call `sparkR.session()`:
+
+{% include_example init_session r/RSparkSQLExample.R %}
+
+Note that when invoked for the first time, `sparkR.session()` initializes 
a global `SparkSession` singleton instance, and always returns a reference to 
this instance for successive invocations. In this way, users only need to 
initialize the `SparkSession` once, then SparkR functions like `read.df` will 
be able to access this global instance implicitly, and users don't need to pass 
the `SparkSession` instance around.
+
+
+
+`SparkSession` in Spark 2.0 provides builtin support for Hive features 
including the ability to
+write queries using HiveQL, access to Hive UDFs, and the ability to read 
data from Hive tables.
+To use these features, you do not need to have an existing Hive setup.
+
+## Creating DataFrames
+
+
+
+With a `SparkSession`, applications can create DataFrames from an 
[existing `RDD`](#interoperating-with-rdds),
+from a Hive table, or from [Spark data sources](#data-sources).
--- End diff --

Sorry for missing this, will check all inner links by `

[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22608
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22608
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97477/
Test PASSed.


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22608
  
**[Test build #97477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97477/testReport)** for PR 22608 at commit [`5d270f1`](https://github.com/apache/spark/commit/5d270f17dccbb2eac6d3c2ab8c12987e3d992086).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...

2018-10-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20433
  
@maropu Thanks! This is a great step toward making our Spark SQL parser fully compatible with ANSI SQL. Please continue the effort!

cc @cloud-fan 


---




[GitHub] spark pull request #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional...

2018-10-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20433#discussion_r225784123
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -335,6 +335,12 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val ANSI_SQL_PARSER =
+buildConf("spark.sql.parser.ansi.enabled")
+  .doc("When true, tries to conform to ANSI SQL syntax.")
+  .booleanConf
+  .createWithDefault(false)
--- End diff --

Since the next release is 3.0, we will turn this on by default.
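For concreteness, a hedged sketch of toggling the proposed flag at runtime; the config key comes from the quoted diff, and the session-level `conf` API is assumed:

```scala
// Defaults to false in this patch; per the comment above, the default
// would flip to true for the 3.0 release.
spark.conf.set("spark.sql.parser.ansi.enabled", "true")
```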


---




[GitHub] spark pull request #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional...

2018-10-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20433#discussion_r225783980
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -335,6 +335,12 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val ANSI_SQL_PARSER =
--- End diff --

The legacy flag will be removed in 3.0 release. 


---




[GitHub] spark pull request #22746: [SPARK-24499][SQL][DOC] Split the page of sql-pro...

2018-10-16 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22746#discussion_r225783658
  
--- Diff: docs/sql-reference.md ---
@@ -0,0 +1,641 @@
+---
+layout: global
+title: Reference
+displayTitle: Reference
+---
+
+* Table of contents
+{:toc}
+
+## Data Types
+
+Spark SQL and DataFrames support the following data types:
+
+* Numeric types
+- `ByteType`: Represents 1-byte signed integer numbers.
--- End diff --

nit: use 2 space indent.


---




[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22749
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22749
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4052/
Test PASSed.


---




[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22749
  
**[Test build #97480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97480/testReport)** for PR 22749 at commit [`25a6162`](https://github.com/apache/spark/commit/25a616286075ca4f0a7d528095b387172b05c6c3).


---




[GitHub] spark issue #22219: [SPARK-25224][SQL] Improvement of Spark SQL ThriftServer...

2018-10-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22219
  
cc @srinathshankar @yuchenhuo 


---




[GitHub] spark pull request #22746: [SPARK-24499][SQL][DOC] Split the page of sql-pro...

2018-10-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22746#discussion_r225780740
  
--- Diff: docs/sql-getting-started.md ---
@@ -0,0 +1,369 @@
+---
+layout: global
+title: Getting Started
+displayTitle: Getting Started
+---
+
+* Table of contents
+{:toc}
+
+## Starting Point: SparkSession
+
+
+
+
+The entry point into all functionality in Spark is the 
[`SparkSession`](api/scala/index.html#org.apache.spark.sql.SparkSession) class. 
To create a basic `SparkSession`, just use `SparkSession.builder()`:
+
+{% include_example init_session 
scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
+
+
+
+
+The entry point into all functionality in Spark is the 
[`SparkSession`](api/java/index.html#org.apache.spark.sql.SparkSession) class. 
To create a basic `SparkSession`, just use `SparkSession.builder()`:
+
+{% include_example init_session 
java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
+
+
+
+
+The entry point into all functionality in Spark is the 
[`SparkSession`](api/python/pyspark.sql.html#pyspark.sql.SparkSession) class. 
To create a basic `SparkSession`, just use `SparkSession.builder`:
+
+{% include_example init_session python/sql/basic.py %}
+
+
+
+
+The entry point into all functionality in Spark is the 
[`SparkSession`](api/R/sparkR.session.html) class. To initialize a basic 
`SparkSession`, just call `sparkR.session()`:
+
+{% include_example init_session r/RSparkSQLExample.R %}
+
+Note that when invoked for the first time, `sparkR.session()` initializes 
a global `SparkSession` singleton instance, and always returns a reference to 
this instance for successive invocations. In this way, users only need to 
initialize the `SparkSession` once, then SparkR functions like `read.df` will 
be able to access this global instance implicitly, and users don't need to pass 
the `SparkSession` instance around.
+
+
+
+`SparkSession` in Spark 2.0 provides builtin support for Hive features 
including the ability to
+write queries using HiveQL, access to Hive UDFs, and the ability to read 
data from Hive tables.
+To use these features, you do not need to have an existing Hive setup.
+
+## Creating DataFrames
+
+
+
+With a `SparkSession`, applications can create DataFrames from an 
[existing `RDD`](#interoperating-with-rdds),
+from a Hive table, or from [Spark data sources](#data-sources).
--- End diff --

The link `[Spark data sources](#data-sources)` does not work after this 
change. Could you fix all the similar cases? Thanks!


---




[GitHub] spark pull request #22694: [SQL][CATALYST][MINOR] update some error comments

2018-10-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22694


---




[GitHub] spark issue #22694: [SQL][CATALYST][MINOR] update some error comments

2018-10-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22694
  
Merged to master and branch-2.4.


---




[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...

2018-10-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22503
  
@justinuang, okay. Mind rebasing this please?


---




[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22263
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22263
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97476/
Test PASSed.


---




[GitHub] spark issue #22263: [SPARK-25269][SQL] SQL interface support specify Storage...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22263
  
**[Test build #97476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97476/testReport)** for PR 22263 at commit [`5e088b8`](https://github.com/apache/spark/commit/5e088b86822dd6b1bf4c3bb085fde3c96af03658).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22295: [SPARK-25255][PYTHON]Add getActiveSession to SparkSessio...

2018-10-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22295
  
@huaxingao, thanks for addressing comments. Would you mind rebasing it and 
resolving the conflicts?


---




[GitHub] spark issue #22752: [SPARK-24787][CORE] Revert hsync in EventLoggingListener...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22752
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97474/
Test PASSed.


---




[GitHub] spark issue #22752: [SPARK-24787][CORE] Revert hsync in EventLoggingListener...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22752
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22752: [SPARK-24787][CORE] Revert hsync in EventLoggingListener...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22752
  
**[Test build #97474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97474/testReport)** for PR 22752 at commit [`a3f53c4`](https://github.com/apache/spark/commit/a3f53c41879e28d71d4dbd79d80a51e50d82ecee).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97475/
Test PASSed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #97475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97475/testReport)** for PR 22482 at commit [`5c74609`](https://github.com/apache/spark/commit/5c746090a8d5560f043754383656d54653a315dc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22729: [SPARK-25737][CORE] Remove JavaSparkContextVarargsWorkar...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22729
  
**[Test build #4380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4380/testReport)** for PR 22729 at commit [`0860d27`](https://github.com/apache/spark/commit/0860d27a205d3dd3d94e6bbe2c9db49b7e432ef4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22749
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22749
  
**[Test build #97479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97479/testReport)** for PR 22749 at commit [`6a6fa45`](https://github.com/apache/spark/commit/6a6fa454e22728cc2ad8e5515cd587fe0be84b26).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22749
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97479/
Test FAILed.


---




[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...

2018-10-16 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22708#discussion_r225769471
  
--- Diff: 
sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.test.TestSparkSession;
+import org.apache.spark.sql.types.ArrayType;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+public class JavaBeanWithArraySuite {
+
+  private static final List<Record> RECORDS = new ArrayList<>();
+
+  static {
+    RECORDS.add(new Record(1,
+      Arrays.asList(new Interval(111, 211), new Interval(121, 221)),
+      Arrays.asList(11, 21, 31, 41)
+    ));
+    RECORDS.add(new Record(2,
+      Arrays.asList(new Interval(112, 212), new Interval(122, 222)),
+      Arrays.asList(12, 22, 32, 42)
+    ));
+    RECORDS.add(new Record(3,
+      Arrays.asList(new Interval(113, 213), new Interval(123, 223)),
+      Arrays.asList(13, 23, 33, 43)
+    ));
+  }
+
+  private TestSparkSession spark;
+
+  @Before
+  public void setUp() {
+    spark = new TestSparkSession();
+  }
+
+  @After
+  public void tearDown() {
+    spark.stop();
+    spark = null;
+  }
+
+  @Test
+  public void testBeanWithArrayFieldsDeserialization() {
+
+    StructType schema = createSchema();
+    Encoder<Record> encoder = Encoders.bean(Record.class);
+
+    Dataset<Record> dataset = spark
+      .read()
+      .format("json")
+      .schema(schema)
+      .load("src/test/resources/test-data/with-array-fields")
+      .as(encoder);
+
+    List<Record> records = dataset.collectAsList();
+
+    Assert.assertTrue(Util.equals(records, RECORDS));
+  }
+
+  private static StructType createSchema() {
+    StructField[] intervalFields = {
+      new StructField("startTime", DataTypes.LongType, true, Metadata.empty()),
+      new StructField("endTime", DataTypes.LongType, true, Metadata.empty())
+    };
+    DataType intervalType = new StructType(intervalFields);
+
+    DataType intervalsType = new ArrayType(intervalType, true);
+
+    DataType valuesType = new ArrayType(DataTypes.IntegerType, true);
+
+    StructField[] fields = {
+      new StructField("id", DataTypes.IntegerType, true, Metadata.empty()),
+      new StructField("intervals", intervalsType, true, Metadata.empty()),
+      new StructField("values", valuesType, true, Metadata.empty())
+    };
+    return new StructType(fields);
+  }
+
+  public static class Record {
+
+    private int id;
+    private List<Interval> intervals;
+    private List<Integer> values;
+
+    public Record() { }
+
+    Record(int id, List<Interval> intervals, List<Integer> values) {
+      this.id = id;
+      this.intervals = intervals;
+      this.values = values;
+    }
+
+    public int getId() {
+      return id;
+    }
+
+    public void setId(int id) {

[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...

2018-10-16 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22708#discussion_r225768857
  
--- Diff: 
sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.test.TestSparkSession;
+import org.apache.spark.sql.types.ArrayType;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+public class JavaBeanWithArraySuite {
+
+  private static final List<Record> RECORDS = new ArrayList<>();
+
+  static {
+    RECORDS.add(new Record(1,
+      Arrays.asList(new Interval(111, 211), new Interval(121, 221)),
+      Arrays.asList(11, 21, 31, 41)
+    ));
+    RECORDS.add(new Record(2,
+      Arrays.asList(new Interval(112, 212), new Interval(122, 222)),
+      Arrays.asList(12, 22, 32, 42)
+    ));
+    RECORDS.add(new Record(3,
+      Arrays.asList(new Interval(113, 213), new Interval(123, 223)),
+      Arrays.asList(13, 23, 33, 43)
+    ));
+  }
+
+  private TestSparkSession spark;
+
+  @Before
+  public void setUp() {
+    spark = new TestSparkSession();
+  }
+
+  @After
+  public void tearDown() {
+    spark.stop();
+    spark = null;
+  }
+
+  @Test
+  public void testBeanWithArrayFieldsDeserialization() {
+
+    StructType schema = createSchema();
+    Encoder<Record> encoder = Encoders.bean(Record.class);
+
+    Dataset<Record> dataset = spark
+      .read()
+      .format("json")
+      .schema(schema)
+      .load("src/test/resources/test-data/with-array-fields")
+      .as(encoder);
+
+    List<Record> records = dataset.collectAsList();
+
+    Assert.assertTrue(Util.equals(records, RECORDS));
+  }
+
+  private static StructType createSchema() {
+    StructField[] intervalFields = {
+      new StructField("startTime", DataTypes.LongType, true, Metadata.empty()),
+      new StructField("endTime", DataTypes.LongType, true, Metadata.empty())
+    };
+    DataType intervalType = new StructType(intervalFields);
+
+    DataType intervalsType = new ArrayType(intervalType, true);
+
+    DataType valuesType = new ArrayType(DataTypes.IntegerType, true);
+
+    StructField[] fields = {
+      new StructField("id", DataTypes.IntegerType, true, Metadata.empty()),
+      new StructField("intervals", intervalsType, true, Metadata.empty()),
+      new StructField("values", valuesType, true, Metadata.empty())
+    };
+    return new StructType(fields);
+  }
+
+  public static class Record {
+
+    private int id;
+    private List<Interval> intervals;
+    private List<Integer> values;
--- End diff --

Will this list of ints affect the test? If not, maybe we can get rid of it to simplify the test.


---




[GitHub] spark pull request #22745: [SPARK-21402][SQL][FOLLOW-UP] Fix java map of str...

2018-10-16 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22745#discussion_r225768707
  
--- Diff: 
sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithMapSuite.java ---
@@ -0,0 +1,257 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.test.TestSparkSession;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.MapType;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+public class JavaBeanWithMapSuite {
+
+  private static final List<Record> RECORDS = new ArrayList<>();
+
+  static {
+    RECORDS.add(new Record(1,
+      toMap(
+        Arrays.asList("a", "b"),
+        Arrays.asList(new Interval(111, 211), new Interval(121, 221))
+      ),
+      toMap(Arrays.asList("a", "b", "c"), Arrays.asList(11, 21, 31))
+    ));
+    RECORDS.add(new Record(2,
+      toMap(
+        Arrays.asList("a", "b"),
+        Arrays.asList(new Interval(112, 212), new Interval(122, 222))
+      ),
+      toMap(Arrays.asList("a", "b", "c"), Arrays.asList(12, 22, 32))
+    ));
+    RECORDS.add(new Record(3,
+      toMap(
+        Arrays.asList("a", "b"),
+        Arrays.asList(new Interval(113, 213), new Interval(123, 223))
+      ),
+      toMap(Arrays.asList("a", "b", "c"), Arrays.asList(13, 23, 33))
+    ));
+  }
+
+  private static <K, V> Map<K, V> toMap(Collection<K> keys, Collection<V> values) {
+    Map<K, V> map = new HashMap<>();
+    Iterator<K> keyI = keys.iterator();
+    Iterator<V> valueI = values.iterator();
+    while (keyI.hasNext() && valueI.hasNext()) {
+      map.put(keyI.next(), valueI.next());
+    }
+    return map;
+  }
+
+  private TestSparkSession spark;
+
+  @Before
+  public void setUp() {
+    spark = new TestSparkSession();
+  }
+
+  @After
+  public void tearDown() {
+    spark.stop();
+    spark = null;
+  }
+
+  @Test
+  public void testBeanWithMapFieldsDeserialization() {
+
+    StructType schema = createSchema();
+    Encoder<Record> encoder = Encoders.bean(Record.class);
+
+    Dataset<Record> dataset = spark
+      .read()
+      .format("json")
+      .schema(schema)
+      .load("src/test/resources/test-data/with-map-fields")
+      .as(encoder);
+
+    List<Record> records = dataset.collectAsList();
+
+    Assert.assertTrue(Util.equals(records, RECORDS));
+  }
+
+  private static StructType createSchema() {
+    StructField[] intervalFields = {
+      new StructField("startTime", DataTypes.LongType, true, Metadata.empty()),
+      new StructField("endTime", DataTypes.LongType, true, Metadata.empty())
+    };
+    DataType intervalType = new StructType(intervalFields);
+
+    DataType intervalsType = new MapType(DataTypes.StringType, intervalType, true);
+
+    DataType valuesType = new MapType(DataTypes.StringType, DataTypes.IntegerType, true);

[GitHub] spark pull request #22724: [SPARK-25734][SQL] Literal should have a value co...

2018-10-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22724


---




[GitHub] spark issue #22724: [SPARK-25734][SQL] Literal should have a value correspon...

2018-10-16 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22724
  
thanks, merging to master!


---




[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...

2018-10-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22708#discussion_r225767103
  
--- Diff: 
sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.test.TestSparkSession;
+import org.apache.spark.sql.types.ArrayType;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+public class JavaBeanWithArraySuite {
+
+  private static final List<Record> RECORDS = new ArrayList<>();
+
+  static {
+    RECORDS.add(new Record(1,
+      Arrays.asList(new Interval(111, 211), new Interval(121, 221)),
+      Arrays.asList(11, 21, 31, 41)
+    ));
+    RECORDS.add(new Record(2,
+      Arrays.asList(new Interval(112, 212), new Interval(122, 222)),
+      Arrays.asList(12, 22, 32, 42)
+    ));
+    RECORDS.add(new Record(3,
+      Arrays.asList(new Interval(113, 213), new Interval(123, 223)),
+      Arrays.asList(13, 23, 33, 43)
+    ));
+  }
+
+  private TestSparkSession spark;
+
+  @Before
+  public void setUp() {
+    spark = new TestSparkSession();
+  }
+
+  @After
+  public void tearDown() {
+    spark.stop();
+    spark = null;
+  }
+
+  @Test
+  public void testBeanWithArrayFieldsDeserialization() {
+
+    StructType schema = createSchema();
+    Encoder<Record> encoder = Encoders.bean(Record.class);
+
+    Dataset<Record> dataset = spark
+      .read()
+      .format("json")
+      .schema(schema)
+      .load("src/test/resources/test-data/with-array-fields")
+      .as(encoder);
+
+    List<Record> records = dataset.collectAsList();
+
+    Assert.assertTrue(Util.equals(records, RECORDS));
+  }
+
+  private static StructType createSchema() {
+    StructField[] intervalFields = {
+      new StructField("startTime", DataTypes.LongType, true, Metadata.empty()),
+      new StructField("endTime", DataTypes.LongType, true, Metadata.empty())
+    };
+    DataType intervalType = new StructType(intervalFields);
+
+    DataType intervalsType = new ArrayType(intervalType, true);
+
+    DataType valuesType = new ArrayType(DataTypes.IntegerType, true);
+
+    StructField[] fields = {
+      new StructField("id", DataTypes.IntegerType, true, Metadata.empty()),
+      new StructField("intervals", intervalsType, true, Metadata.empty()),
+      new StructField("values", valuesType, true, Metadata.empty())
+    };
+    return new StructType(fields);
+  }
+
+  public static class Record {
+
+    private int id;
+    private List<Interval> intervals;
+    private List<Integer> values;
+
+    public Record() { }
+
+    Record(int id, List<Interval> intervals, List<Integer> values) {
+      this.id = id;
+      this.intervals = intervals;
+      this.values = values;
+    }
+
+    public int getId() {
+      return id;
+    }
+
+    public void setId(int id) {

[GitHub] spark issue #22745: [SPARK-21402][SQL][FOLLOW-UP] Fix java map of structs de...

2018-10-16 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22745
  
It's a different issue; I think it's worth a new ticket.


---




[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-16 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22732#discussion_r225764876
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -81,11 +81,11 @@ case class UserDefinedFunction protected[sql] (
   f,
   dataType,
   exprs.map(_.expr),
+  nullableTypes.map(_.map(!_)).getOrElse(exprs.map(_ => false)),
--- End diff --

Hm, but we can't use getParameterTypes anymore. It won't work in Scala 
2.12. Where the nullability info is definitely not available, be conservative 
and assume it all needs null handling?
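To make the trade-off concrete, a hedged sketch of the fallback on the quoted line, with names assumed from the thread rather than Spark's exact internals:

```scala
// nullableTypes is Some(perArgFlags) when reflection could inspect the
// UDF's parameter types, and None when it could not.
def resolveFlags(nullableTypes: Option[Seq[Boolean]], arity: Int): Seq[Boolean] =
  nullableTypes
    .map(_.map(!_))                    // known nullability: the diff inverts each flag
    .getOrElse(Seq.fill(arity)(false)) // unknown: one uniform, conservative default per argument
```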


---




[GitHub] spark issue #22746: [SPARK-24499][SQL][DOC] Split the page of sql-programmin...

2018-10-16 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/22746
  
@gatorsmile Sorry for the delay on this; please have a look when you have time.


---




[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22749
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22608
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22749
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4051/
Test PASSed.


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22608
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4050/



---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22608
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4050/
Test PASSed.


---




[GitHub] spark pull request #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF construc...

2018-10-16 Thread maryannxue
Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/22732#discussion_r225762708
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
@@ -81,11 +81,11 @@ case class UserDefinedFunction protected[sql] (
       f,
       dataType,
       exprs.map(_.expr),
+      nullableTypes.map(_.map(!_)).getOrElse(exprs.map(_ => false)),
--- End diff --

In addition to what I just pointed out (the case where we tried to get `inputSchemas` through `ScalaReflection.schemaFor` and got an exception for unrecognized types), there's another case where we could end up with an unspecified `nullableTypes`: when `UserDefinedFunction` is instantiated by calling the constructor rather than the `create` method. In that case I assume it was created by an earlier version, and we should use the old logic, i.e., `ScalaReflection.getParameterTypes` (https://github.com/apache/spark/pull/22259/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2L2153), to get the correct information for `nullableTypes`. Is that right, @cloud-fan @srowen?
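
A hedged sketch of that fallback, reusing the names from the quoted diff (illustrative only, not the merged code):

```scala
import org.apache.spark.sql.catalyst.ScalaReflection

// Sketch: prefer declared nullability; for UDFs built via the bare constructor
// (no nullableTypes), fall back to the old reflection-based logic.
def resolveNullHandling(f: AnyRef, nullableTypes: Option[Seq[Boolean]]): Seq[Boolean] =
  nullableTypes match {
    case Some(ns) => ns.map(!_) // guard exactly the non-nullable inputs
    case None =>
      // old logic: primitive parameters cannot hold null, so they need a guard
      ScalaReflection.getParameterTypes(f).map(_.isPrimitive)
  }
```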


---




[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22749
  
**[Test build #97479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97479/testReport)** for PR 22749 at commit [`6a6fa45`](https://github.com/apache/spark/commit/6a6fa454e22728cc2ad8e5515cd587fe0be84b26).


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22608
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4050/



---




[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21990#discussion_r225762148
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -1136,4 +1121,27 @@ object SparkSession extends Logging {
       SparkSession.clearDefaultSession()
     }
   }
+
+  /**
+   * Initialize extensions if the user has defined a configurator class in their SparkConf.
+   * This class will be applied to the extensions passed into this function.
+   */
+  private[sql] def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions) {
--- End diff --

Oh, I see, moving to the default constructor was not a good idea.
How about the first suggestion?
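
For readers following the thread, a rough sketch of what applying extensions from the conf looks like inside Spark's own `sql` package (names approximate the builder logic; this is not necessarily the PR's final shape):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSessionExtensions
import org.apache.spark.sql.internal.StaticSQLConf
import org.apache.spark.util.Utils

// Sketch: look up the configurator class named by spark.sql.extensions,
// instantiate it, and let it mutate the passed-in extensions.
def applyExtensionsFromConf(conf: SparkConf, extensions: SparkSessionExtensions): Unit = {
  conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS).foreach { className =>
    val configurator = Utils.classForName(className)
      .getConstructor().newInstance()
      .asInstanceOf[SparkSessionExtensions => Unit]
    configurator(extensions)
  }
}
```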


---




[GitHub] spark pull request #22263: [SPARK-25269][SQL] SQL interface support specify ...

2018-10-16 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22263#discussion_r225762035
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -288,6 +297,65 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
 }
   }
 
+  test("SQL interface support storageLevel(DISK_ONLY)") {
--- End diff --

How about this:
```scala
Seq("LAZY", "").foreach { isLazy =>
  Seq(true, false).foreach { withInvalidOptions =>
Seq(true, false).foreach { withCacheTempView =>
  Map("DISK_ONLY" -> Disk, "MEMORY_ONLY" -> Memory).foreach {
case (storageLevel, dataReadMethod) =>
  val testName = s"SQL interface support option: storageLevel: 
$storageLevel, " +
s"isLazy: ${isLazy.equals("LAZY")}, " +
s"withInvalidOptions: $withInvalidOptions, withCacheTempView: 
$withCacheTempView"
  val cacheOption = if (withInvalidOptions) {
s"OPTIONS('storageLevel' '$storageLevel', 'a' '1', 'b' '2')"
  } else {
s"OPTIONS('storageLevel' '$storageLevel')"
  }
  test(testName) {
if (withCacheTempView) {
  withTempView("testSelect") {
sql(s"CACHE $isLazy TABLE testSelect $cacheOption SELECT * 
FROM testData")
assertCached(spark.table("testSelect"))
val rddId = rddIdOf("testSelect")
if (isLazy.equals("LAZY")) {
  sql("SELECT COUNT(*) FROM testSelect").collect()
}
assert(isExpectStorageLevel(rddId, dataReadMethod))
  }
} else {
  sql(s"CACHE $isLazy TABLE testData $cacheOption")
  assertCached(spark.table("testData"))
  val rddId = rddIdOf("testData")
  if (isLazy.equals("LAZY")) {
sql("SELECT COUNT(*) FROM testData").collect()
  }
  assert(isExpectStorageLevel(rddId, dataReadMethod))
}
  }
  }
}
  }
}
```
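
For orientation, the SQL these tests exercise can be issued directly; a minimal sketch assuming an existing `spark` session with a registered `testData` view:

```scala
// Cache lazily at an explicit storage level; the COUNT materializes the lazy cache.
spark.sql("CACHE LAZY TABLE testData OPTIONS('storageLevel' 'DISK_ONLY')")
spark.sql("SELECT COUNT(*) FROM testData").collect()
```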


---




[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21588
  
@rxin and @gatorsmile, WDYT?

I already had to argue about Hadoop 3 support here and there (for instance, see [SPARK-18112](https://issues.apache.org/jira/browse/SPARK-18112) and [SPARK-18673](https://issues.apache.org/jira/browse/SPARK-18673)) and explain what's going on.

Ideally, we should go ahead with option 2 (https://github.com/apache/spark/pull/21588#issuecomment-429272279) if I am not mistaken. If there are more concerns we should address before going ahead, I am definitely willing to help investigate as well.


---




[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...

2018-10-16 Thread rezasafi
Github user rezasafi commented on the issue:

https://github.com/apache/spark/pull/22612
  
Jenkins retest this please.


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22608
  
**[Test build #97478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97478/testReport)** for PR 22608 at commit [`4c9b886`](https://github.com/apache/spark/commit/4c9b886c1f23bbdd3d8e1ec7df25f03e45892d88).


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22608
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22608
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4049/



---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22608
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4049/
Test FAILed.


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22608
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4049/



---




[GitHub] spark pull request #22707: [SPARK-25717][SQL] Insert overwrite a recreated e...

2018-10-16 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22707#discussion_r225759293
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -227,18 +227,22 @@ case class InsertIntoHiveTable(
       // Newer Hive largely improves insert overwrite performance. As Spark uses older Hive
       // version and we may not want to catch up new Hive version every time. We delete the
       // Hive partition first and then load data file into the Hive partition.
-      if (oldPart.nonEmpty && overwrite) {
-        oldPart.get.storage.locationUri.foreach { uri =>
-          val partitionPath = new Path(uri)
-          val fs = partitionPath.getFileSystem(hadoopConf)
-          if (fs.exists(partitionPath)) {
-            if (!fs.delete(partitionPath, true)) {
-              throw new RuntimeException(
-                "Cannot remove partition directory '" + partitionPath.toString)
-            }
-            // Don't let Hive do overwrite operation since it is slower.
-            doHiveOverwrite = false
+      if (overwrite) {
+        val oldPartitionPath = oldPart.flatMap(_.storage.locationUri.map(new Path(_)))
+          .getOrElse {
+            ExternalCatalogUtils.generatePartitionPath(
+              partitionSpec,
+              partitionColumnNames,
+              HiveClientImpl.toHiveTable(table).getDataLocation)
--- End diff --

Looks correct, as I saw we assign `CatalogTable.storage.locationUri` to the HiveTable's data location.


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22608
  
**[Test build #97477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97477/testReport)** for PR 22608 at commit [`5d270f1`](https://github.com/apache/spark/commit/5d270f17dccbb2eac6d3c2ab8c12987e3d992086).


---




[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22379


---




[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-16 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/21588
  
Thanks @HyukjinKwon 
Upgrading Hive to 2.3.2 can fix [SPARK-12014](https://issues.apache.org/jira/browse/SPARK-12014), [SPARK-18673](https://issues.apache.org/jira/browse/SPARK-18673), [SPARK-24766](https://issues.apache.org/jira/browse/SPARK-24766) and [SPARK-25193](https://issues.apache.org/jira/browse/SPARK-25193). It can also improve the performance of [SPARK-18107](https://issues.apache.org/jira/browse/SPARK-18107).
It doesn't seem to break backward compatibility; I have verified it in our production environment (Hive 1.2.1).


---




[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22666
  
Woah... let me resolve the conflicts tonight.


---




[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22379
  
Thanks all!


---




[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22379
  
Merged to master.


---




[GitHub] spark pull request #22707: [SPARK-25717][SQL] Insert overwrite a recreated e...

2018-10-16 Thread fjh100456
Github user fjh100456 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22707#discussion_r225756219
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -227,18 +227,22 @@ case class InsertIntoHiveTable(
       // Newer Hive largely improves insert overwrite performance. As Spark uses older Hive
       // version and we may not want to catch up new Hive version every time. We delete the
       // Hive partition first and then load data file into the Hive partition.
-      if (oldPart.nonEmpty && overwrite) {
-        oldPart.get.storage.locationUri.foreach { uri =>
-          val partitionPath = new Path(uri)
-          val fs = partitionPath.getFileSystem(hadoopConf)
-          if (fs.exists(partitionPath)) {
-            if (!fs.delete(partitionPath, true)) {
-              throw new RuntimeException(
-                "Cannot remove partition directory '" + partitionPath.toString)
-            }
-            // Don't let Hive do overwrite operation since it is slower.
-            doHiveOverwrite = false
+      if (overwrite) {
+        val oldPartitionPath = oldPart.flatMap(_.storage.locationUri.map(new Path(_)))
+          .getOrElse {
+            ExternalCatalogUtils.generatePartitionPath(
+              partitionSpec,
+              partitionColumnNames,
+              HiveClientImpl.toHiveTable(table).getDataLocation)
--- End diff --

> `HiveClientImpl.toHiveTable(table).getDataLocation` -> `new Path(table.location)`?

Yes, they get the same value. I'll change it, thank you very much.
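
For reference, the agreed-on fallback then reads roughly like this (a sketch using the names from the quoted diff):

```scala
// Sketch: use the catalog's recorded partition location when present;
// otherwise derive the default layout under the table's root location.
val oldPartitionPath = oldPart.flatMap(_.storage.locationUri.map(new Path(_)))
  .getOrElse {
    ExternalCatalogUtils.generatePartitionPath(
      partitionSpec,
      partitionColumnNames,
      new Path(table.location)) // same value as toHiveTable(table).getDataLocation
  }
```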


---




[GitHub] spark issue #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh script

2018-10-16 Thread ifilonenko
Github user ifilonenko commented on the issue:

https://github.com/apache/spark/pull/22748
  
There seems to be overlapping logic between this PR and 
https://github.com/apache/spark/pull/22681


---




[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21990
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-10-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21990
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97472/
Test PASSed.


---




[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-10-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21990
  
**[Test build #97472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97472/testReport)** for PR 21990 at commit [`d9b2a55`](https://github.com/apache/spark/commit/d9b2a55275b74c406d9f9c435bf1b53a6ef4b35a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22745: [SPARK-21402][SQL][FOLLOW-UP] Fix java map of structs de...

2018-10-16 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22745
  
Is this a separate PR because this part is pretty separable, and you think it could be considered separately? If it's all part of one logical change that should go in together or not at all, the changes can be in the original PR.


---




[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...

2018-10-16 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/spark/pull/22598#discussion_r225752604
  
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/TokenUtil.scala ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.text.SimpleDateFormat
+import java.util.Properties
+
+import org.apache.hadoop.io.Text
+import org.apache.hadoop.security.token.{Token, TokenIdentifier}
+import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier
+import org.apache.kafka.clients.CommonClientConfigs
+import org.apache.kafka.clients.admin.{AdminClient, CreateDelegationTokenOptions}
+import org.apache.kafka.common.config.SaslConfigs
+import org.apache.kafka.common.security.token.delegation.DelegationToken
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+
+private[kafka010] object TokenUtil extends Logging {
+  private[kafka010] val TOKEN_KIND = new Text("KAFKA_DELEGATION_TOKEN")
+  private[kafka010] val TOKEN_SERVICE = new Text("kafka.server.delegation.token")
+
+  private[kafka010] class KafkaDelegationTokenIdentifier extends AbstractDelegationTokenIdentifier {
+    override def getKind: Text = TOKEN_KIND;
+  }
+
+  private def printToken(token: DelegationToken): Unit = {
+    if (log.isDebugEnabled) {
+      val dateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm")
+      logDebug("%-15s %-30s %-15s %-25s %-15s %-15s %-15s".format(
+        "TOKENID", "HMAC", "OWNER", "RENEWERS", "ISSUEDATE", "EXPIRYDATE", "MAXDATE"))
+      val tokenInfo = token.tokenInfo
+      logDebug("%-15s [hidden] %-15s %-25s %-15s %-15s %-15s".format(
+        tokenInfo.tokenId,
+        tokenInfo.owner,
+        tokenInfo.renewersAsString,
+        dateFormat.format(tokenInfo.issueTimestamp),
+        dateFormat.format(tokenInfo.expiryTimestamp),
+        dateFormat.format(tokenInfo.maxTimestamp)))
+    }
+  }
+
+  private[kafka010] def createAdminClientProperties(sparkConf: SparkConf): Properties = {
+    val adminClientProperties = new Properties
+
+    val bootstrapServers = sparkConf.get(KAFKA_BOOTSTRAP_SERVERS)
+    require(bootstrapServers.nonEmpty, s"Tried to obtain kafka delegation token but bootstrap " +
+      "servers not configured.")
+    adminClientProperties.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers.get)
+
+    val protocol = sparkConf.get(KAFKA_SECURITY_PROTOCOL)
+    adminClientProperties.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, protocol)
+    if (protocol.endsWith("SSL")) {
+      logInfo("SSL protocol detected.")
+      sparkConf.get(KAFKA_TRUSTSTORE_LOCATION).foreach { truststoreLocation =>
+        adminClientProperties.put("ssl.truststore.location", truststoreLocation)
+      }
+      sparkConf.get(KAFKA_TRUSTSTORE_PASSWORD).foreach { truststorePassword =>
+        adminClientProperties.put("ssl.truststore.password", truststorePassword)
+      }
+    } else {
+      logWarning("Obtaining kafka delegation token through plain communication channel. Please " +
+        "consider the security impact.")
+    }
+
+    // There are multiple possibilities to log in:
+    // - Keytab is provided -> try to log in with kerberos module using kafka's dynamic JAAS
+    //   configuration.
+    // - Keytab not provided -> try to log in with JVM global security configuration
+    //   which can be configured for example with 'java.security.auth.login.config'.
+    //   For this no additional parameter needed.
+    KafkaSecurityHelper.getKeytabJaasParams(sparkConf).foreach { jaasParams =>
+      logInfo("Keytab detected, using it for login.")
+      adminClientProperties.put(SaslConfigs.SASL_MECHANISM, SaslConfigs.GSSAPI_MECHANISM)
+
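
The quoted diff is cut off by the archive here. For orientation only, a hedged sketch of how such admin-client properties are typically consumed with Kafka's admin API (not the PR's actual code):

```scala
import org.apache.kafka.clients.admin.{AdminClient, CreateDelegationTokenOptions}

// Sketch: build an AdminClient from the derived properties and request a token.
val adminClient = AdminClient.create(createAdminClientProperties(sparkConf))
try {
  val token = adminClient
    .createDelegationToken(new CreateDelegationTokenOptions())
    .delegationToken()
    .get() // KafkaFuture: blocks until the broker issues the token
  printToken(token)
} finally {
  adminClient.close()
}
```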

[GitHub] spark issue #22725: [SPARK-24610][[CORE][FOLLOW-UP]fix reading small files v...

2018-10-16 Thread 10110346
Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/22725
  
@tgravescs OK, I will do it, thanks.


---




[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...

2018-10-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22708#discussion_r225752208
  
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.test.TestSparkSession;
+import org.apache.spark.sql.types.ArrayType;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
--- End diff --

If we remove `createSchema`, we can remove lines 35-40, too.


---




[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...

2018-10-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22708#discussion_r225751969
  
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.test.TestSparkSession;
+import org.apache.spark.sql.types.ArrayType;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+public class JavaBeanWithArraySuite {
+
+  private static final List<Record> RECORDS = new ArrayList<>();
+
+  static {
+    RECORDS.add(new Record(1,
+      Arrays.asList(new Interval(111, 211), new Interval(121, 221)),
+      Arrays.asList(11, 21, 31, 41)
+    ));
+    RECORDS.add(new Record(2,
+      Arrays.asList(new Interval(112, 212), new Interval(122, 222)),
+      Arrays.asList(12, 22, 32, 42)
+    ));
+    RECORDS.add(new Record(3,
+      Arrays.asList(new Interval(113, 213), new Interval(123, 223)),
+      Arrays.asList(13, 23, 33, 43)
+    ));
+  }
+
+  private TestSparkSession spark;
+
+  @Before
+  public void setUp() {
+    spark = new TestSparkSession();
+  }
+
+  @After
+  public void tearDown() {
+    spark.stop();
+    spark = null;
+  }
+
+  @Test
+  public void testBeanWithArrayFieldsDeserialization() {
+
+    StructType schema = createSchema();
+    Encoder<Record> encoder = Encoders.bean(Record.class);
+
+    Dataset<Record> dataset = spark
+      .read()
+      .format("json")
+      .schema(schema)
+      .load("src/test/resources/test-data/with-array-fields")
+      .as(encoder);
+
+    List<Record> records = dataset.collectAsList();
+
+    Assert.assertTrue(Util.equals(records, RECORDS));
+  }
+
+  private static StructType createSchema() {
+    StructField[] intervalFields = {
+      new StructField("startTime", DataTypes.LongType, true, Metadata.empty()),
+      new StructField("endTime", DataTypes.LongType, true, Metadata.empty())
+    };
+    DataType intervalType = new StructType(intervalFields);
+
+    DataType intervalsType = new ArrayType(intervalType, true);
+
+    DataType valuesType = new ArrayType(DataTypes.IntegerType, true);
+
+    StructField[] fields = {
+      new StructField("id", DataTypes.IntegerType, true, Metadata.empty()),
+      new StructField("intervals", intervalsType, true, Metadata.empty()),
+      new StructField("values", valuesType, true, Metadata.empty())
+    };
+    return new StructType(fields);
+  }
+
+  public static class Record {
+
+    private int id;
+    private List<Interval> intervals;
+    private List<Integer> values;
+
+    public Record() { }
+
+    Record(int id, List<Interval> intervals, List<Integer> values) {
+      this.id = id;
+      this.intervals = intervals;
+      this.values = values;
+    }
+
+    public int getId() {
+      return id;
+    }
+
+    public void setId(int i

[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...

2018-10-16 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/22655
  
Thanks @viirya !


---




[GitHub] spark pull request #22708: [SPARK-21402][SQL] Fix java array of structs dese...

2018-10-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22708#discussion_r225751513
  
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanWithArraySuite.java ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.test.TestSparkSession;
+import org.apache.spark.sql.types.ArrayType;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+public class JavaBeanWithArraySuite {
+
+  private static final List<Record> RECORDS = new ArrayList<>();
+
+  static {
+    RECORDS.add(new Record(1,
+      Arrays.asList(new Interval(111, 211), new Interval(121, 221)),
+      Arrays.asList(11, 21, 31, 41)
+    ));
+    RECORDS.add(new Record(2,
+      Arrays.asList(new Interval(112, 212), new Interval(122, 222)),
+      Arrays.asList(12, 22, 32, 42)
+    ));
+    RECORDS.add(new Record(3,
+      Arrays.asList(new Interval(113, 213), new Interval(123, 223)),
+      Arrays.asList(13, 23, 33, 43)
+    ));
+  }
+
+  private TestSparkSession spark;
+
+  @Before
+  public void setUp() {
+    spark = new TestSparkSession();
+  }
+
+  @After
+  public void tearDown() {
+    spark.stop();
+    spark = null;
+  }
+
+  @Test
+  public void testBeanWithArrayFieldsDeserialization() {
+
+    StructType schema = createSchema();
+    Encoder<Record> encoder = Encoders.bean(Record.class);
+
+    Dataset<Record> dataset = spark
+      .read()
+      .format("json")
+      .schema(schema)
+      .load("src/test/resources/test-data/with-array-fields")
+      .as(encoder);
+
+    List<Record> records = dataset.collectAsList();
+
+    Assert.assertTrue(Util.equals(records, RECORDS));
+  }
+
+  private static StructType createSchema() {
+    StructField[] intervalFields = {
+      new StructField("startTime", DataTypes.LongType, true, Metadata.empty()),
+      new StructField("endTime", DataTypes.LongType, true, Metadata.empty())
+    };
+    DataType intervalType = new StructType(intervalFields);
+
+    DataType intervalsType = new ArrayType(intervalType, true);
+
+    DataType valuesType = new ArrayType(DataTypes.IntegerType, true);
+
+    StructField[] fields = {
+      new StructField("id", DataTypes.IntegerType, true, Metadata.empty()),
+      new StructField("intervals", intervalsType, true, Metadata.empty()),
+      new StructField("values", valuesType, true, Metadata.empty())
+    };
+    return new StructType(fields);
+  }
+
+  public static class Record {
+
+    private int id;
+    private List<Interval> intervals;
+    private List<Integer> values;
+
+    public Record() { }
+
+    Record(int id, List<Interval> intervals, List<Integer> values) {
+      this.id = id;
+      this.intervals = intervals;
+      this.values = values;
+    }
+
+    public int getId() {
+      return id;
+    }
+
+    public void setId(int i
