[GitHub] spark issue #20403: [SPARK-23238][SQL] Externalize SQLConf spark.sql.executi...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20403
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86742/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20403: [SPARK-23238][SQL] Externalize SQLConf spark.sql.executi...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20403
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20403: [SPARK-23238][SQL] Externalize SQLConf spark.sql.executi...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20403
  
**[Test build #86742 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86742/testReport)** for PR 20403 at commit [`0c05526`](https://github.com/apache/spark/commit/0c0552625eecd984d268c8bed2903c87b5adce58).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20408: [SPARK-23189][Core][Web UI] Reflect stage level b...

2018-01-27 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/20408#discussion_r164292079
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
@@ -594,12 +606,24 @@ private[spark] class AppStatusListener(
 
   stage.executorSummaries.values.foreach(update(_, now))
   update(stage, now, last = true)
+
+  val executorIdsForStage = stage.executorSummaries.keySet
+  executorIdsForStage.foreach { executorId =>
+    liveExecutors.get(executorId).foreach { exec =>
+      removeBlackListedStageFrom(exec, event.stageInfo.stageId, now)
--- End diff --

I guess GitHub's diff collapsing tricked us here. This change belongs to the 
method onStageCompleted (and definitely not to onExecutorUnblacklisted). This 
is where I remove completed stages from the live executors.
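The bookkeeping described above (for every executor that ran the completed stage, drop that stage from the executor's blacklist) can be modeled in a few lines. This is a minimal Python sketch under stated assumptions, not the Scala code from the PR: `live_executors` as a dict from executor id to a set of blacklisted stage ids, and the helper `remove_completed_stage`, are hypothetical stand-ins for `liveExecutors` and `removeBlackListedStageFrom`.

```python
def remove_completed_stage(live_executors, executor_ids_for_stage, stage_id):
    """Drop stage_id from the blacklist of every live executor that ran the stage.

    live_executors: hypothetical dict of executor id -> set of blacklisted stage ids.
    executor_ids_for_stage: ids of executors that have a summary for the stage.
    """
    for executor_id in executor_ids_for_stage:
        blacklisted_stages = live_executors.get(executor_id)
        # Executors that are no longer live are skipped, mirroring
        # liveExecutors.get(executorId).foreach { ... } in the Scala diff.
        if blacklisted_stages is not None:
            blacklisted_stages.discard(stage_id)


live = {"1": {3, 7}, "2": {7}}
remove_completed_stage(live, ["1", "2", "3"], 7)
print(live)  # {'1': {3}, '2': set()}
```

Executor "3" has already left `live`, so it is ignored rather than raising an error.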


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20146
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20146
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86740/


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20146
  
**[Test build #86740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86740/testReport)** for PR 20146 at commit [`b884fb5`](https://github.com/apache/spark/commit/b884fb5c0ce1e627390d08d8425721ea8e4d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20402: [SPARK-23223][SQL] Make stacking dataset transforms more...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20402
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20402: [SPARK-23223][SQL] Make stacking dataset transforms more...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20402
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86741/


---




[GitHub] spark issue #20402: [SPARK-23223][SQL] Make stacking dataset transforms more...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20402
  
**[Test build #86741 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86741/testReport)** for PR 20402 at commit [`efe9eaf`](https://github.com/apache/spark/commit/efe9eaf775e325909cbb9639f64c9099b90b2f99).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20409: [SPARK-23233][PYTHON] Reset the cache in asNondeterminis...

2018-01-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20409
  
Thank you @gatorsmile and @viirya.


---




[GitHub] spark issue #20403: [SPARK-23238][SQL] Externalize SQLConf spark.sql.executi...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20403
  
**[Test build #86742 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86742/testReport)** for PR 20403 at commit [`0c05526`](https://github.com/apache/spark/commit/0c0552625eecd984d268c8bed2903c87b5adce58).


---




[GitHub] spark issue #20403: [SPARK-23238][SQL] Externalize SQLConf spark.sql.executi...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20403
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/313/


---




[GitHub] spark issue #20403: [SPARK-23238][SQL] Externalize SQLConf spark.sql.executi...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20403
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20068: [SPARK-17916][SQL] Fix empty string being parsed as null...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20068
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86739/


---




[GitHub] spark issue #20068: [SPARK-17916][SQL] Fix empty string being parsed as null...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20068
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.s...

2018-01-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20403#discussion_r164288478
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1043,11 +1043,11 @@ object SQLConf {
 
   val ARROW_EXECUTION_ENABLE =
     buildConf("spark.sql.execution.arrow.enabled")
-      .internal()
-      .doc("Make use of Apache Arrow for columnar data transfers. Currently available " +
-        "for use with pyspark.sql.DataFrame.toPandas with the following data types: " +
-        "StringType, BinaryType, BooleanType, DoubleType, FloatType, ByteType, IntegerType, " +
-        "LongType, ShortType")
+      .doc("When true, make use of Apache Arrow for columnar data transfers. Currently available " +
+        "for use with pyspark.sql.DataFrame.toPandas, and " +
+        "pyspark.sql.SparkSession.createDataFrame when its input is a Pandas DataFrame. " +
+        "The following data types are unsupported: " +
+        "MapType, ArrayType of TimestampType, and nested StructType.")
       .booleanConf
       .createWithDefault(false)
--- End diff --

Yup. Let me 



---




[GitHub] spark issue #20068: [SPARK-17916][SQL] Fix empty string being parsed as null...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20068
  
**[Test build #86739 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86739/testReport)** for PR 20068 at commit [`156d755`](https://github.com/apache/spark/commit/156d755d5a734a00c4c69dfc3565364f3843fca1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20402: [SPARK-23223][SQL] Make stacking dataset transforms more...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20402
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/312/


---




[GitHub] spark issue #20402: [SPARK-23223][SQL] Make stacking dataset transforms more...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20402
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20402: [SPARK-23223][SQL] Make stacking dataset transforms more...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20402
  
**[Test build #86741 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86741/testReport)** for PR 20402 at commit [`efe9eaf`](https://github.com/apache/spark/commit/efe9eaf775e325909cbb9639f64c9099b90b2f99).


---




[GitHub] spark pull request #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.s...

2018-01-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20403#discussion_r164287467
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1043,11 +1043,11 @@ object SQLConf {
 
   val ARROW_EXECUTION_ENABLE =
     buildConf("spark.sql.execution.arrow.enabled")
-      .internal()
-      .doc("Make use of Apache Arrow for columnar data transfers. Currently available " +
-        "for use with pyspark.sql.DataFrame.toPandas with the following data types: " +
-        "StringType, BinaryType, BooleanType, DoubleType, FloatType, ByteType, IntegerType, " +
-        "LongType, ShortType")
+      .doc("When true, make use of Apache Arrow for columnar data transfers. Currently available " +
+        "for use with pyspark.sql.DataFrame.toPandas, and " +
+        "pyspark.sql.SparkSession.createDataFrame when its input is a Pandas DataFrame. " +
+        "The following data types are unsupported: " +
+        "MapType, ArrayType of TimestampType, and nested StructType.")
       .booleanConf
       .createWithDefault(false)
--- End diff --

`spark.sql.execution.arrow.maxRecordsPerBatch` is also mentioned in the doc 
change at #19575. Shall we also externalize it?
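To make the new doc string concrete, here is a minimal Python sketch that flags the columns falling into the three unsupported categories it lists. The tuple-based type encoding and `unsupported_arrow_columns` are hypothetical (real code would inspect `pyspark.sql.types` objects), "nested StructType" is assumed to mean any struct-typed column inside the row schema, and the exact checks Spark performs may differ.

```python
# Hypothetical, simplified type encoding (not PySpark's types module):
#   "long", "string", "timestamp", ... for atomic types
#   ("array", element_type), ("map", key_type, value_type),
#   ("struct", [(field_name, field_type), ...])


def unsupported_arrow_columns(schema_fields):
    """Return names of top-level columns whose type the doc string lists as unsupported."""
    bad = []
    for name, data_type in schema_fields:
        if not isinstance(data_type, tuple):
            continue  # atomic types are not in the unsupported list
        kind = data_type[0]
        if kind == "map":
            bad.append(name)  # MapType
        elif kind == "array" and data_type[1] == "timestamp":
            bad.append(name)  # ArrayType of TimestampType
        elif kind == "struct":
            bad.append(name)  # StructType nested inside the row schema
    return bad


fields = [
    ("id", "long"),
    ("tags", ("array", "string")),
    ("events", ("array", "timestamp")),
    ("props", ("map", "string", "string")),
    ("point", ("struct", [("x", "double"), ("y", "double")])),
]
print(unsupported_arrow_columns(fields))  # ['events', 'props', 'point']
```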


---




[GitHub] spark issue #20383: [SPARK-23200] Reset Kubernetes-specific config on Checkp...

2018-01-27 Thread ssaavedra
Github user ssaavedra commented on the issue:

https://github.com/apache/spark/pull/20383
  
I can probably take a look at testing this over 2.3.0-rc2 on Monday. I did 
not test this on a clean 2.3.0-ish branch.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20146
  
**[Test build #86740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86740/testReport)** for PR 20146 at commit [`b884fb5`](https://github.com/apache/spark/commit/b884fb5c0ce1e627390d08d8425721ea8e4d).


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20146
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/311/


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20146
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20146
  
retest this please.


---




[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20403
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20403
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86737/


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20146
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20146
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86738/


---




[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20403
  
**[Test build #86737 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86737/testReport)** for PR 20403 at commit [`1f4d288`](https://github.com/apache/spark/commit/1f4d2884ba5b56e06427ce3d91cb6ac5f8f2b7b6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20146
  
**[Test build #86738 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86738/testReport)** for PR 20146 at commit [`b884fb5`](https://github.com/apache/spark/commit/b884fb5c0ce1e627390d08d8425721ea8e4d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20417: [SPARK-23250][DOCS] Typo in JavaDoc/ScalaDoc for DataFra...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20417
  
**[Test build #4081 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4081/testReport)** for PR 20417 at commit [`9ef6939`](https://github.com/apache/spark/commit/9ef6939a35981f70253501d19599d93207042370).
 * This patch **fails Scala style tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #20417: [SPARK-23250][DOCS] Typo in JavaDoc/ScalaDoc for DataFra...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20417
  
**[Test build #4081 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4081/testReport)** for PR 20417 at commit [`9ef6939`](https://github.com/apache/spark/commit/9ef6939a35981f70253501d19599d93207042370).


---




[GitHub] spark issue #20068: [SPARK-17916][SQL] Fix empty string being parsed as null...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20068
  
**[Test build #86739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86739/testReport)** for PR 20068 at commit [`156d755`](https://github.com/apache/spark/commit/156d755d5a734a00c4c69dfc3565364f3843fca1).


---




[GitHub] spark issue #20068: [SPARK-17916][SQL] Fix empty string being parsed as null...

2018-01-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20068
  
ping @aa8y 


---




[GitHub] spark issue #20068: [SPARK-17916][SQL] Fix empty string being parsed as null...

2018-01-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20068
  
ok to test


---




[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-27 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/20415
  
@heary-cao have you benchmarked this? I am asking because Spark SQL chains 
iterators; these are pipelined and only materialized when we need to. Your PR 
effectively removes two virtual calls (hasNext/next) per tuple, so I don't see 
much benefit here.
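Python generators give a rough analogy for the pipelining described above: each operator pulls one row at a time from its child, so nothing is materialized until a consumer asks for a row, and each hop between operators costs one `next` call per row (the analog of `hasNext`/`next`). This is an illustrative sketch, not Spark's implementation; the operator names `scan`, `filter_even` and `project` are hypothetical.

```python
def scan(rows, log):
    # Leaf operator: hands rows out one at a time.
    for row in rows:
        log.append(("scan", row))
        yield row


def filter_even(child, log):
    # Pulls from its child only when its own consumer asks for a row.
    for row in child:
        log.append(("filter", row))
        if row % 2 == 0:
            yield row


def project(child, log):
    for row in child:
        log.append(("project", row))
        yield row * 10


log = []
pipeline = project(filter_even(scan([1, 2, 3, 4], log), log), log)
print(log)  # [] because building the chain does not touch any rows
first = next(pipeline)
print(first)  # 20
print(log)
# [('scan', 1), ('filter', 1), ('scan', 2), ('filter', 2), ('project', 2)]
```

Fusing the three operators into one loop would save the per-row hops between generators but would not change how many rows are read, which is why the benefit of such a change is best confirmed with a benchmark.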


---




[GitHub] spark issue #20068: [SPARK-17916][SQL] Fix empty string being parsed as null...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20068
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20372: [SPARK-23249] Improved block merging logic for partition...

2018-01-27 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20372
  
@cloud-fan @gatorsmile 


---




[GitHub] spark pull request #20416: [SPARK-23248][PYTHON][EXAMPLES] Relocate module d...

2018-01-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20416


---




[GitHub] spark issue #20416: [SPARK-23248][PYTHON][EXAMPLES] Relocate module docstrin...

2018-01-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20416
  
Merged to master and branch-2.3.

Thank you @srowen, @viirya and @felixcheung.


---




[GitHub] spark issue #20417: [SPARK-23250][DOCS] Typo in JavaDoc/ScalaDoc for DataFra...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20417
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #19575: [SPARK-22221][DOCS] Adding User Documentation for...

2018-01-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19575#discussion_r164286075
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1640,6 +1640,133 @@ Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` a
 You may run `./bin/spark-sql --help` for a complete list of all available
 options.
 
+# PySpark Usage Guide for Pandas with Apache Arrow
+
+## Apache Arrow in Spark
+
+Apache Arrow is an in-memory columnar data format that is used in Spark to efficiently transfer
+data between JVM and Python processes. This currently is most beneficial to Python users that
+work with Pandas/NumPy data. Its usage is not automatic and might require some minor
+changes to configuration or code to take full advantage and ensure compatibility. This guide will
+give a high-level description of how to use Arrow in Spark and highlight any differences when
+working with Arrow-enabled data.
+
+### Ensure PyArrow Installed
+
+If you install PySpark using pip, then PyArrow can be brought in as an extra dependency of the
+SQL module with the command `pip install pyspark[sql]`. Otherwise, you must ensure that PyArrow
+is installed and available on all cluster nodes. The current supported version is 0.8.0.
+You can install using pip or conda from the conda-forge channel. See PyArrow
+[installation](https://arrow.apache.org/docs/python/install.html) for details.
+
+## Enabling for Conversion to/from Pandas
+
+Arrow is available as an optimization when converting a Spark DataFrame to Pandas using the call
+`toPandas()` and when creating a Spark DataFrame from Pandas with `createDataFrame(pandas_df)`.
+To use Arrow when executing these calls, users need to first set the Spark configuration
+'spark.sql.execution.arrow.enabled' to 'true'. This is disabled by default.
+
+
+
+{% include_example dataframe_with_arrow python/sql/arrow.py %}
+
+
+
+Using the above optimizations with Arrow will produce the same results as when Arrow is not
+enabled. Note that even with Arrow, `toPandas()` results in the collection of all records in the
+DataFrame to the driver program and should be done on a small subset of the data. Not all Spark
+data types are currently supported and an error can be raised if a column has an unsupported type,
+see [Supported Types](#supported-sql-arrow-types). If an error occurs during `createDataFrame()`,
+Spark will fall back to create the DataFrame without Arrow.
+
+## Pandas UDFs (a.k.a. Vectorized UDFs)
+
+Pandas UDFs are user defined functions that are executed by Spark using Arrow to transfer data and
+Pandas to work with the data. A Pandas UDF is defined using the keyword `pandas_udf` as a decorator
+or to wrap the function, no additional configuration is required. Currently, there are two types of
+Pandas UDF: Scalar and Group Map.
+
+### Scalar
--- End diff --

`Scalar Vectorized UDFs`?
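The fallback behaviour described in the quoted guide (try the Arrow path for `createDataFrame()` when it is enabled, otherwise or on error use the non-Arrow path) can be sketched as plain control flow. This is a hypothetical model, not PySpark's code: `create_dataframe`, `arrow_path` and `plain_path` are illustrative names, and the converters just tag their output so the chosen path is visible. Passing `arrow_enabled=False` mirrors the default, since the guide says the setting is disabled by default.

```python
def create_dataframe(data, arrow_enabled, with_arrow, without_arrow):
    """Hypothetical model of the documented createDataFrame() fallback."""
    if arrow_enabled:
        try:
            return with_arrow(data)
        except Exception:
            pass  # any Arrow error: fall through to the non-Arrow path
    return without_arrow(data)


def arrow_path(data):
    # Hypothetical converter that rejects a value standing in for an unsupported type.
    if any(isinstance(v, dict) for v in data):
        raise TypeError("unsupported type for Arrow")
    return ("arrow", list(data))


def plain_path(data):
    return ("plain", list(data))


print(create_dataframe([1, 2], True, arrow_path, plain_path))      # ('arrow', [1, 2])
print(create_dataframe([{"k": 1}], True, arrow_path, plain_path))  # ('plain', [{'k': 1}])
print(create_dataframe([1, 2], False, arrow_path, plain_path))     # ('plain', [1, 2])
```

Note that the guide only promises this fallback for `createDataFrame()`; for `toPandas()` it says an error can be raised for unsupported types.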


---




[GitHub] spark issue #20417: [SPARK-23250][DOCS] Typo in JavaDoc/ScalaDoc for DataFra...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20417
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #19575: [SPARK-22221][DOCS] Adding User Documentation for...

2018-01-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19575#discussion_r164286074
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1640,6 +1640,133 @@ Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` a
 You may run `./bin/spark-sql --help` for a complete list of all available
 options.
 
+# PySpark Usage Guide for Pandas with Apache Arrow
+
+## Apache Arrow in Spark
+
+Apache Arrow is an in-memory columnar data format that is used in Spark to efficiently transfer
+data between JVM and Python processes. This currently is most beneficial to Python users that
+work with Pandas/NumPy data. Its usage is not automatic and might require some minor
+changes to configuration or code to take full advantage and ensure compatibility. This guide will
+give a high-level description of how to use Arrow in Spark and highlight any differences when
+working with Arrow-enabled data.
+
+### Ensure PyArrow Installed
+
+If you install PySpark using pip, then PyArrow can be brought in as an extra dependency of the
+SQL module with the command `pip install pyspark[sql]`. Otherwise, you must ensure that PyArrow
+is installed and available on all cluster nodes. The current supported version is 0.8.0.
+You can install using pip or conda from the conda-forge channel. See PyArrow
+[installation](https://arrow.apache.org/docs/python/install.html) for details.
+
+## Enabling for Conversion to/from Pandas
+
+Arrow is available as an optimization when converting a Spark DataFrame to Pandas using the call
+`toPandas()` and when creating a Spark DataFrame from Pandas with `createDataFrame(pandas_df)`.
+To use Arrow when executing these calls, users need to first set the Spark configuration
+'spark.sql.execution.arrow.enabled' to 'true'. This is disabled by default.
+
+
+
+{% include_example dataframe_with_arrow python/sql/arrow.py %}
+
+
+
+Using the above optimizations with Arrow will produce the same results as when Arrow is not
+enabled. Note that even with Arrow, `toPandas()` results in the collection of all records in the
+DataFrame to the driver program and should be done on a small subset of the data. Not all Spark
+data types are currently supported and an error can be raised if a column has an unsupported type,
+see [Supported Types](#supported-sql-arrow-types). If an error occurs during `createDataFrame()`,
+Spark will fall back to create the DataFrame without Arrow.
+
+## Pandas UDFs (a.k.a. Vectorized UDFs)
+
+Pandas UDFs are user defined functions that are executed by Spark using Arrow to transfer data and
+Pandas to work with the data. A Pandas UDF is defined using the keyword `pandas_udf` as a decorator
+or to wrap the function, no additional configuration is required. Currently, there are two types of
+Pandas UDF: Scalar and Group Map.
+
+### Scalar
+
+Scalar Pandas UDFs are used for vectorizing scalar operations. They can be used with functions such
+as `select` and `withColumn`. The Python function should take `pandas.Series` as inputs and return
+a `pandas.Series` of the same length. Internally, Spark will execute a Pandas UDF by splitting
+columns into batches and calling the function for each batch as a subset of the data, then
+concatenating the results together.
+
+The following example shows how to create a scalar Pandas UDF that computes the product of 2 columns.
+
+
+
+{% include_example scalar_pandas_udf python/sql/arrow.py %}
+
+
+
+### Group Map
+Group map Pandas UDFs are used with `groupBy().apply()` which implements the "split-apply-combine" pattern.
--- End diff --

`Grouped Vectorized UDFs`?
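The two execution models in the quoted guide (scalar UDFs run batch by batch and the results are concatenated; grouped map UDFs follow split-apply-combine) can be pinned down with a pure-Python sketch. This is not Spark's implementation: plain lists and dicts stand in for `pandas.Series` and `pandas.DataFrame`, the helpers `apply_scalar_udf` and `apply_grouped_udf` are hypothetical, and `batch_size` is only loosely analogous to `spark.sql.execution.arrow.maxRecordsPerBatch`.

```python
def apply_scalar_udf(func, columns, batch_size):
    """Call func on successive row batches of the input columns and concatenate results."""
    n = len(columns[0])
    if any(len(c) != n for c in columns):
        raise ValueError("all input columns must have the same length")
    result = []
    for start in range(0, n, batch_size):
        batch = [c[start:start + batch_size] for c in columns]
        out = func(*batch)
        # A scalar UDF must return one value per input row of the batch.
        if len(out) != len(batch[0]):
            raise ValueError("a scalar UDF must return as many values as it received")
        result.extend(out)
    return result


def apply_grouped_udf(func, rows, key):
    """Split rows by key, apply func to each whole group, combine the returned rows."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    combined = []
    for group_rows in groups.values():
        combined.extend(func(group_rows))
    return combined


def multiply(a, b):
    # Element-wise product of two equally sized batches (the guide's "product of 2 columns").
    return [x * y for x, y in zip(a, b)]


def subtract_mean(group):
    # Center the "v" value of each row within its group.
    mean = sum(r["v"] for r in group) / len(group)
    return [{"id": r["id"], "v": r["v"] - mean} for r in group]


print(apply_scalar_udf(multiply, [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]], batch_size=2))
# [10, 40, 90, 160, 250]
rows = [{"id": 1, "v": 1.0}, {"id": 1, "v": 3.0}, {"id": 2, "v": 5.0}]
print(apply_grouped_udf(subtract_mean, rows, key=lambda r: r["id"]))
# [{'id': 1, 'v': -1.0}, {'id': 1, 'v': 1.0}, {'id': 2, 'v': 0.0}]
```

The key difference is visible in the signatures: the scalar function never sees more than `batch_size` rows at once, while the grouped function receives an entire group, so a very large group has to fit in memory as a single unit.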


---




[GitHub] spark pull request #20417: [SPARK-23250][DOCS] Typo in JavaDoc/ScalaDoc for ...

2018-01-27 Thread CCInCharge
GitHub user CCInCharge opened a pull request:

https://github.com/apache/spark/pull/20417

[SPARK-23250][DOCS] Typo in JavaDoc/ScalaDoc for DataFrameWriter

## What changes were proposed in this pull request?

Fix typo in ScalaDoc for DataFrameWriter - originally stated "This is 
applicable for all file-based data sources (e.g. Parquet, JSON) staring Spark 
2.1.0", should be "starting with Spark 2.1.0".

## How was this patch tested?

Check of correct spelling in ScalaDoc

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/CCInCharge/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20417


commit 9ef6939a35981f70253501d19599d93207042370
Author: CCInCharge 
Date:   2018-01-28T01:21:07Z

Fix typo in ScalaDoc for DataFrameWriter




---




[GitHub] spark issue #20414: [SPARK-23243][SQL] Shuffle+Repartition on an RDD could l...

2018-01-27 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20414
  
Just for context, I'm seeing RDD.repartition being used *a lot*.


---




[GitHub] spark issue #20406: [SPARK-23230][SQL]Error by creating a data table when us...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20406
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86736/
Test PASSed.


---




[GitHub] spark issue #20406: [SPARK-23230][SQL]Error by creating a data table when us...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20406
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20406: [SPARK-23230][SQL]Error by creating a data table when us...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20406
  
**[Test build #86736 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86736/testReport)**
 for PR 20406 at commit 
[`f370dd6`](https://github.com/apache/spark/commit/f370dd6217cf8a590ef52ecc970e4dc33c235631).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20403
  
**[Test build #86737 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86737/testReport)**
 for PR 20403 at commit 
[`1f4d288`](https://github.com/apache/spark/commit/1f4d2884ba5b56e06427ce3d91cb6ac5f8f2b7b6).


---




[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20403
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/309/
Test PASSed.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20146
  
**[Test build #86738 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86738/testReport)**
 for PR 20146 at commit 
[`b884fb5`](https://github.com/apache/spark/commit/b884fb5c0ce1e627390d08d8425721ea8e4d).


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20146
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/310/
Test PASSed.


---




[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20403
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20146
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-27 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20403
  
retest this please.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20146
  
retest this please.


---




[GitHub] spark issue #20414: [SPARK-23243][SQL] Shuffle+Repartition on an RDD could l...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20414
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86728/
Test PASSed.


---




[GitHub] spark issue #20414: [SPARK-23243][SQL] Shuffle+Repartition on an RDD could l...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20414
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20414: [SPARK-23243][SQL] Shuffle+Repartition on an RDD could l...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20414
  
**[Test build #86728 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86728/testReport)**
 for PR 20414 at commit 
[`6910ed6`](https://github.com/apache/spark/commit/6910ed62c272bedfa251cab589bb52bed36be3ed).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20146
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20146
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86734/
Test FAILed.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20146
  
**[Test build #86734 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86734/testReport)**
 for PR 20146 at commit 
[`b884fb5`](https://github.com/apache/spark/commit/b884fb5c0ce1e627390d08d8425721ea8e4d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20369: [SPARK-23196] Unify continuous and microbatch V2 sinks

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20369
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20369: [SPARK-23196] Unify continuous and microbatch V2 sinks

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20369
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86735/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20369: [SPARK-23196] Unify continuous and microbatch V2 sinks

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20369
  
**[Test build #86735 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86735/testReport)**
 for PR 20369 at commit 
[`d311d56`](https://github.com/apache/spark/commit/d311d5639b3af9123e0c6dbe38468f0172e06712).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20369: [SPARK-23196] Unify continuous and microbatch V2 sinks

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20369
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20369: [SPARK-23196] Unify continuous and microbatch V2 sinks

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20369
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86733/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20369: [SPARK-23196] Unify continuous and microbatch V2 sinks

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20369
  
**[Test build #86733 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86733/testReport)**
 for PR 20369 at commit 
[`d311d56`](https://github.com/apache/spark/commit/d311d5639b3af9123e0c6dbe38468f0172e06712).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20403
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86730/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20403
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20403
  
**[Test build #86730 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86730/testReport)**
 for PR 20403 at commit 
[`1f4d288`](https://github.com/apache/spark/commit/1f4d2884ba5b56e06427ce3d91cb6ac5f8f2b7b6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20375: [SPARK-23199][SQL]improved Removes repetition from group...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20375
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86732/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20375: [SPARK-23199][SQL]improved Removes repetition from group...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20375
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20375: [SPARK-23199][SQL]improved Removes repetition from group...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20375
  
**[Test build #86732 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86732/testReport)**
 for PR 20375 at commit 
[`caf581f`](https://github.com/apache/spark/commit/caf581f7f171912af4cebbc3a96887c7bb4a87e5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20385: [SPARK-21396][SQL] Fixes MatchError when UDTs are passed...

2018-01-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20385
  
@atallahhezbor Yeah! Please help us improve the test coverage. We do not 
have a clear way to test the functionality in `SparkExecuteStatementOperation`.

Adding unit test cases for `HiveUtils.toHiveString` is enough if we move 
the code changes to `HiveUtils.toHiveString`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-01-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20345
  
Also cc @wzhfy. Do you have bandwidth to review PRs?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCD...

2018-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20343#discussion_r164279795
  
--- Diff: sql/core/src/test/resources/tpcds-v2.7.0/q11.sql ---
@@ -0,0 +1,78 @@
+with year_total as (
+ select c_customer_id customer_id
+   ,c_first_name customer_first_name
+   ,c_last_name customer_last_name
+   ,c_preferred_cust_flag customer_preferred_cust_flag
+   ,c_birth_country customer_birth_country
+   ,c_login customer_login
+   ,c_email_address customer_email_address
+   ,d_year dyear
+   ,sum(ss_ext_list_price-ss_ext_discount_amt) year_total
+   ,'s' sale_type
+ from customer
+ ,store_sales
+ ,date_dim
+ where c_customer_sk = ss_customer_sk
+   and ss_sold_date_sk = d_date_sk
+ group by c_customer_id
+ ,c_first_name
+ ,c_last_name
+ ,c_preferred_cust_flag 
+ ,c_birth_country
+ ,c_login
+ ,c_email_address
+ ,d_year 
+ union all
+ select c_customer_id customer_id
+   ,c_first_name customer_first_name
+   ,c_last_name customer_last_name
+   ,c_preferred_cust_flag customer_preferred_cust_flag
+   ,c_birth_country customer_birth_country
+   ,c_login customer_login
+   ,c_email_address customer_email_address
+   ,d_year dyear
+   ,sum(ws_ext_list_price-ws_ext_discount_amt) year_total
+   ,'w' sale_type
+ from customer
+ ,web_sales
+ ,date_dim
+ where c_customer_sk = ws_bill_customer_sk
+   and ws_sold_date_sk = d_date_sk
+ group by c_customer_id
+ ,c_first_name
+ ,c_last_name
+ ,c_preferred_cust_flag 
+ ,c_birth_country
+ ,c_login
+ ,c_email_address
+ ,d_year
+ )
+  select  
+  t_s_secyear.customer_id
+ ,t_s_secyear.customer_first_name
+ ,t_s_secyear.customer_last_name
+ ,t_s_secyear.customer_email_address
--- End diff --

Regarding the keyword capitalization rule, this is just for readability. We 
do not enforce it, but it is preferred.


---




[GitHub] spark issue #20370: Changing JDBC relation to better process quotes

2018-01-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20370
  
ping @conorbmurphy 


---




[GitHub] spark issue #20375: [SPARK-23199][SQL]improved Removes repetition from group...

2018-01-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20375
  
LGTM pending Jenkins


---




[GitHub] spark issue #20406: [SPARK-23230][SQL]Error by creating a data table when us...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20406
  
**[Test build #86736 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86736/testReport)**
 for PR 20406 at commit 
[`f370dd6`](https://github.com/apache/spark/commit/f370dd6217cf8a590ef52ecc970e4dc33c235631).


---




[GitHub] spark issue #20406: [SPARK-23230][SQL]Error by creating a data table when us...

2018-01-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20406
  
Also cc @dongjoon-hyun 


---




[GitHub] spark issue #20406: [SPARK-23230][SQL]Error by creating a data table when us...

2018-01-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20406
  
ok to test


---




[GitHub] spark issue #20396: [SPARK-23217][ML] Add cosine distance measure to Cluster...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20396
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20396: [SPARK-23217][ML] Add cosine distance measure to Cluster...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20396
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86731/
Test PASSed.


---




[GitHub] spark issue #20396: [SPARK-23217][ML] Add cosine distance measure to Cluster...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20396
  
**[Test build #86731 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86731/testReport)**
 for PR 20396 at commit 
[`8a68f75`](https://github.com/apache/spark/commit/8a68f758a7a41f6c2a9a58f54a982745665be6a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20409: [SPARK-23233][PYTHON] Reset the cache in asNondet...

2018-01-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20409


---




[GitHub] spark issue #20409: [SPARK-23233][PYTHON] Reset the cache in asNondeterminis...

2018-01-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20409
  
Thanks! Merged to master/2.3


---




[GitHub] spark issue #20409: [SPARK-23233][PYTHON] Reset the cache in asNondeterminis...

2018-01-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20409
  
LGTM


---




[GitHub] spark issue #20409: [SPARK-23233][PYTHON] Reset the cache in asNondeterminis...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20409
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86729/
Test PASSed.


---




[GitHub] spark issue #20409: [SPARK-23233][PYTHON] Reset the cache in asNondeterminis...

2018-01-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20409
  
**[Test build #86729 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86729/testReport)**
 for PR 20409 at commit 
[`b23ff02`](https://github.com/apache/spark/commit/b23ff02f543ecc92db574b808ea00f9ff7d236f8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20409: [SPARK-23233][PYTHON] Reset the cache in asNondeterminis...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20409
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20393: [SPARK-23207][SQL] Shuffle+Repartition on a DataFrame co...

2018-01-27 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/20393
  
@sameeragarwal I am not sure if we can make shuffle fetch deterministic 
without quite a lot of perf overhead; do you have any thoughts on how to do 
this, in case I am missing something here?


---




[GitHub] spark issue #20414: [SPARK-23243][SQL] Shuffle+Repartition on an RDD could l...

2018-01-27 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/20414
  
In addition, any use of random in Spark code will be affected by this 
unless the input is an idempotent source, even if random initialization is 
done predictably with the partition index (which we were doing here anyway).
We might want to look at mllib and other places as well.
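To make this reasoning concrete, here is a simplified pure-Python model (not Spark's implementation; all function names are hypothetical) of round-robin repartitioning whose starting offset is seeded by the partition index. Even though the seed is deterministic, each record's target partition depends on the order in which records arrive, so if a recomputed input partition yields the same records in a different order, records land in different output partitions than on the first attempt. The sorted variant is one possible way to remove the order dependence, at the cost of an extra sort per partition, which is the kind of overhead raised earlier in this thread.

```python
import random


def assign_partitions(partition_index, records, num_partitions):
    """Round-robin records to output partitions, starting at a random offset
    seeded by the input partition index. Assumes records are unique and
    hashable, since they are used as dict keys."""
    rng = random.Random(partition_index)  # deterministic per input partition
    position = rng.randrange(num_partitions)
    assignment = {}
    for record in records:
        position += 1
        # The target depends on the record's position in the iteration order.
        assignment[record] = position % num_partitions
    return assignment


def assign_partitions_sorted(partition_index, records, num_partitions):
    """Hypothetical mitigation: sort first so the assignment no longer depends
    on input order. Requires orderable records and costs a sort."""
    return assign_partitions(partition_index, sorted(records), num_partitions)
```

With three output partitions, `assign_partitions(0, ["a", "b"], 3)` and `assign_partitions(0, ["b", "a"], 3)` send `"a"` and `"b"` to different partitions, while the sorted variant gives the same assignment for both orders.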


---




[GitHub] spark issue #20383: [SPARK-23200] Reset Kubernetes-specific config on Checkp...

2018-01-27 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20383
  
So you have tested this on the latest Spark 2.3.0 bits?

Test aside, do people think it is useful to include this fix in the 2.3.0 
release?




---




[GitHub] spark issue #20416: [SPARK-23248][PYTHON][EXAMPLES] Relocate module docstrin...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20416
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20416: [SPARK-23248][PYTHON][EXAMPLES] Relocate module docstrin...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20416
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86727/
Test PASSed.


---



