[GitHub] spark issue #22898: [SPARK-25746][SQL][followup] do not add unnecessary If e...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22898
  
**[Test build #98292 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98292/testReport)**
 for PR 22898 at commit 
[`8266443`](https://github.com/apache/spark/commit/82664439318b72d8446230515abb882b89767bb9).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22898: [SPARK-25746][SQL][followup] do not add unnecessary If e...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22898
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22898: [SPARK-25746][SQL][followup] do not add unnecessary If e...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22898
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4647/
Test PASSed.


---




[GitHub] spark issue #22898: [SPARK-25746][SQL][followup] do not add unnecessary If e...

2018-10-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22898
  
cc @viirya 


---




[GitHub] spark pull request #22898: [SPARK-25746][SQL][followup] do not add unnecessa...

2018-10-30 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/22898

[SPARK-25746][SQL][followup] do not add unnecessary If expression

## What changes were proposed in this pull request?

a followup of https://github.com/apache/spark/pull/22749.

When we construct the new serializer in `ExpressionEncoder.tuple`, we don't 
need to add an `if(isnull ...)` check for each field. The fields are either 
simple expressions that propagate null correctly (e.g. 
`GetStructField(GetColumnByOrdinal(0, schema), index)`), or complex expressions 
that already contain the isnull check.
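As a toy illustration (plain Python, not Catalyst code; names are hypothetical), a field accessor that already propagates null behaves identically with or without an outer null guard, which is why the extra `If` is unnecessary:

```python
# Toy illustration (plain Python, not Catalyst code; names are hypothetical).
# A field accessor that already propagates null behaves identically with or
# without an outer "if (isnull ...)" guard, so the guard is redundant.

def get_struct_field(row, index):
    """Mimics a null-propagating accessor: returns None for a None row."""
    if row is None:
        return None
    return row[index]

def guarded_get(row, index):
    """The redundant form: if(isnull(row), null, get_struct_field(row, index))."""
    return None if row is None else get_struct_field(row, index)

# Both forms agree for null and non-null inputs.
for row in (None, ("a", "b")):
    assert guarded_get(row, 0) == get_struct_field(row, 0)
```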

## How was this patch tested?

existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22898.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22898


commit 82664439318b72d8446230515abb882b89767bb9
Author: Wenchen Fan 
Date:   2018-10-31T05:44:44Z

do not add unnecessary If expression




---




[GitHub] spark issue #22895: [SPARK-25886][SQL][Minor] Improve error message of `Fail...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22895
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4646/
Test PASSed.


---




[GitHub] spark issue #22895: [SPARK-25886][SQL][Minor] Improve error message of `Fail...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22895
  
**[Test build #98291 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98291/testReport)**
 for PR 22895 at commit 
[`24244cc`](https://github.com/apache/spark/commit/24244ccd09cde9146173cc967ce783fd80aecd37).


---




[GitHub] spark issue #22895: [SPARK-25886][SQL][Minor] Improve error message of `Fail...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22895
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22847
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22847
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98288/
Test PASSed.


---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22847
  
**[Test build #98288 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98288/testReport)**
 for PR 22847 at commit 
[`ba392c8`](https://github.com/apache/spark/commit/ba392c89f6c96b3fa64d22a564ac1742b7d0ff54).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...

2018-10-30 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22895#discussion_r229564121
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala 
---
@@ -100,9 +100,14 @@ case class AvroDataToCatalyst(
   case NonFatal(e) => parseMode match {
 case PermissiveMode => nullResultRow
 case FailFastMode =>
-  throw new SparkException("Malformed records are detected in 
record parsing. " +
+  val msg = if (e.getMessage != null) {
+e.getMessage + "\n"
+  } else {
+""
+  }
+  throw new SparkException(msg + "Malformed records are detected 
in record parsing. " +
 s"Current parse Mode: ${FailFastMode.name}. To process 
malformed records as null " +
-"result, try setting the option 'mode' as 'PERMISSIVE'.", 
e.getCause)
+"result, try setting the option 'mode' as 'PERMISSIVE'.", e)
--- End diff --

Agree. When I posted the output, I had the same thought. Haha.


---




[GitHub] spark pull request #22891: [SPARK-25881][pyspark] df.toPandas() convert deci...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22891#discussion_r229564044
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -2064,6 +2064,7 @@ def toDF(self, *cols):
 @since(1.3)
 def toPandas(self):
 """
+:param coerce_float: default False, if Ture, will handle decimal 
type to np.float64 instand of type object.
--- End diff --

There's no such param. Also, we don't convert it to string. BTW, the Arrow 
optimization code path should also be fixed.


---




[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22895#discussion_r229563551
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala 
---
@@ -100,9 +100,14 @@ case class AvroDataToCatalyst(
   case NonFatal(e) => parseMode match {
 case PermissiveMode => nullResultRow
 case FailFastMode =>
-  throw new SparkException("Malformed records are detected in 
record parsing. " +
+  val msg = if (e.getMessage != null) {
+e.getMessage + "\n"
+  } else {
+""
+  }
+  throw new SparkException(msg + "Malformed records are detected 
in record parsing. " +
 s"Current parse Mode: ${FailFastMode.name}. To process 
malformed records as null " +
-"result, try setting the option 'mode' as 'PERMISSIVE'.", 
e.getCause)
+"result, try setting the option 'mode' as 'PERMISSIVE'.", e)
--- End diff --

How about we just remove the `msg` change? If I am not mistaken, the message 
is usually carried by the cause alone.


---




[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...

2018-10-30 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22895#discussion_r229563086
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala 
---
@@ -100,9 +100,14 @@ case class AvroDataToCatalyst(
   case NonFatal(e) => parseMode match {
 case PermissiveMode => nullResultRow
 case FailFastMode =>
-  throw new SparkException("Malformed records are detected in 
record parsing. " +
+  val msg = if (e.getMessage != null) {
+e.getMessage + "\n"
+  } else {
+""
+  }
+  throw new SparkException(msg + "Malformed records are detected 
in record parsing. " +
 s"Current parse Mode: ${FailFastMode.name}. To process 
malformed records as null " +
-"result, try setting the option 'mode' as 'PERMISSIVE'.", 
e.getCause)
+"result, try setting the option 'mode' as 'PERMISSIVE'.", e)
--- End diff --

I see. It would be like
```
org.apache.spark.SparkException: 
com.fasterxml.jackson.core.JsonParseException: Unexpected character ('1' (code 
49)): was expecting a colon to separate field name and value
 at [Source: (InputStreamReader); line: 1, column: 7]
Malformed records are detected in record parsing. Parse Mode: FAILFAST. To 
process malformed records as null result, try setting the option 'mode' as 
'PERMISSIVE'.
at 
org.apache.spark.sql.catalyst.util.FailureSafeParser.parse(FailureSafeParser.scala:80)
at 
org.apache.spark.sql.catalyst.expressions.JsonToStructs.nullSafeEval(jsonExpressions.scala:594)
...
Caused by: org.apache.spark.sql.catalyst.util.BadRecordException: 
com.fasterxml.jackson.core.JsonParseException: Unexpected character ('1' (code 
49)): was expecting a colon to separate field name and value
 at [Source: (InputStreamReader); line: 1, column: 7]
...
```


---




[GitHub] spark issue #22891: [SPARK-25881][pyspark] df.toPandas() convert decimal to ...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22891
  
I don't think we should map decimals to floats. It will lose precision and 
it's a breaking change.
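A quick illustration of the precision concern (plain Python, independent of Spark): a decimal with more significant digits than a float64 can represent exactly no longer round-trips after the coercion.

```python
from decimal import Decimal

# A decimal with more significant digits than a float64 can represent exactly.
d = Decimal("123456789.123456789")
f = float(d)  # what a decimal -> float64 coercion would do

# Round-tripping through float no longer equals the original value:
# precision has been lost.
assert Decimal(repr(f)) != d
```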


---




[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22895#discussion_r229562284
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala 
---
@@ -100,9 +100,14 @@ case class AvroDataToCatalyst(
   case NonFatal(e) => parseMode match {
 case PermissiveMode => nullResultRow
 case FailFastMode =>
-  throw new SparkException("Malformed records are detected in 
record parsing. " +
+  val msg = if (e.getMessage != null) {
+e.getMessage + "\n"
+  } else {
+""
+  }
+  throw new SparkException(msg + "Malformed records are detected 
in record parsing. " +
 s"Current parse Mode: ${FailFastMode.name}. To process 
malformed records as null " +
-"result, try setting the option 'mode' as 'PERMISSIVE'.", 
e.getCause)
+"result, try setting the option 'mode' as 'PERMISSIVE'.", e)
--- End diff --

Adding the cause looks fine. I was actually wondering whether adding `msg` is 
necessary.


---




[GitHub] spark issue #22896: [SPARKR]found some extra whitespace in the R tests

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22896
  
@shaneknapp, I already merged this into the master branch 
(https://github.com/apache/spark/commit/243ce319a06f20365d5b08d479642d75748645d9)
 :-). AppVeyor tests passed 
(https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/19932574).


---




[GitHub] spark pull request #22870: [SPARK-25862][SQL] Remove rangeBetween APIs intro...

2018-10-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22870


---




[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...

2018-10-30 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22883
  
Seems reasonable. Ping @vanzin 


---




[GitHub] spark issue #22870: [SPARK-25862][SQL] Remove rangeBetween APIs introduced i...

2018-10-30 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22870
  
LGTM Thanks! Merged to master.


---




[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21860
  
**[Test build #98290 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98290/testReport)**
 for PR 21860 at commit 
[`5deb39b`](https://github.com/apache/spark/commit/5deb39b838f8156eae5e9cdc7371c6129e66ac6b).


---




[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-10-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21860
  
retest this please


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #98289 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98289/testReport)**
 for PR 22482 at commit 
[`c03c946`](https://github.com/apache/spark/commit/c03c946fa0ef9baedf4c5ab9575fff46402b9fba).


---




[GitHub] spark issue #22896: [SPARKR]found some extra whitespace in the R tests

2018-10-30 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/22896
  
Actually, can we merge this? It's causing spurious lintr errors.


---




[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...

2018-10-30 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22895#discussion_r229553580
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala 
---
@@ -100,9 +100,14 @@ case class AvroDataToCatalyst(
   case NonFatal(e) => parseMode match {
 case PermissiveMode => nullResultRow
 case FailFastMode =>
-  throw new SparkException("Malformed records are detected in 
record parsing. " +
+  val msg = if (e.getMessage != null) {
+e.getMessage + "\n"
+  } else {
+""
+  }
--- End diff --

Yes, but as there is such code as `e.getMessage + "\n"`, personally I prefer 
not to inline it.


---




[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...

2018-10-30 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22895#discussion_r229553381
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala 
---
@@ -100,9 +100,14 @@ case class AvroDataToCatalyst(
   case NonFatal(e) => parseMode match {
 case PermissiveMode => nullResultRow
 case FailFastMode =>
-  throw new SparkException("Malformed records are detected in 
record parsing. " +
+  val msg = if (e.getMessage != null) {
+e.getMessage + "\n"
+  } else {
+""
+  }
+  throw new SparkException(msg + "Malformed records are detected 
in record parsing. " +
 s"Current parse Mode: ${FailFastMode.name}. To process 
malformed records as null " +
-"result, try setting the option 'mode' as 'PERMISSIVE'.", 
e.getCause)
+"result, try setting the option 'mode' as 'PERMISSIVE'.", e)
--- End diff --

After the change, the backtrace will contain:
```
Caused by: org.apache.spark.sql.catalyst.util.BadRecordException: 
com.fasterxml.jackson.core.JsonParseException: Unexpected character ('1' (code 
49)): was expecting a colon to separate field name and value
 at [Source: (InputStreamReader); line: 1, column: 7]
at 
org.apache.spark.sql.catalyst.json.JacksonParser.parse(JacksonParser.scala:414)
at 
org.apache.spark.sql.catalyst.expressions.JsonToStructs$$anonfun$parser$1.apply(jsonExpressions.scala:581)
at 
org.apache.spark.sql.catalyst.expressions.JsonToStructs$$anonfun$parser$1.apply(jsonExpressions.scala:581)
at 
org.apache.spark.sql.catalyst.util.FailureSafeParser.parse(FailureSafeParser.scala:66)
... 20 more
```


---




[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22275
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22275
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98285/
Test PASSed.


---




[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22275
  
**[Test build #98285 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98285/testReport)**
 for PR 22275 at commit 
[`6457e42`](https://github.com/apache/spark/commit/6457e420e3b8366c1373e7adb0bf56df03b9cc19).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use ...

2018-10-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22844
  
Thank you, @heary-cao , @yucai , @wangyum , and @HyukjinKwon .


---




[GitHub] spark issue #22864: [SPARK-25861][Minor][WEBUI] Remove unused refreshInterva...

2018-10-30 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22864
  
Hi @gengliangwang, yes, the parameter has been unused from the beginning.


---




[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21860
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98287/
Test FAILed.


---




[GitHub] spark issue #22896: [SPARKR]found some extra whitespace in the R tests

2018-10-30 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/22896
  
thanks @HyukjinKwon !


---




[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21860
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21860
  
**[Test build #98287 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98287/testReport)**
 for PR 21860 at commit 
[`8f5eb2b`](https://github.com/apache/spark/commit/8f5eb2b5da332ab3ba2228155cf0c48534119315).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22275
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98284/
Test PASSed.


---




[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22275
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22275
  
**[Test build #98284 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98284/testReport)**
 for PR 22275 at commit 
[`7d19977`](https://github.com/apache/spark/commit/7d1997738c8c76f15068d306c858d6127733db7d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22896: [SPARKR]found some extra whitespace in the R test...

2018-10-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22896


---




[GitHub] spark issue #22896: [SPARKR]found some extra whitespace in the R tests

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22896
  
@shaneknapp, I just got this in since it's orthogonal to this PR :-).


---




[GitHub] spark issue #22896: [SPARKR]found some extra whitespace in the R tests

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22896
  
Merged to master.


---




[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22844


---




[GitHub] spark issue #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use ...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22844
  
Merged to master.


---




[GitHub] spark issue #22879: [SPARK-25872][SQL][TEST] Add an optimizer tracker for TP...

2018-10-30 Thread yucai
Github user yucai commented on the issue:

https://github.com/apache/spark/pull/22879
  
`tpcdsQueries` and `tpcdsQueriesV2_7` duplicate the lists in 
`TPCDSQueryBenchmark`; should we maintain them in one place?


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22608
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98281/
Test PASSed.


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22608
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22608
  
**[Test build #98281 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98281/testReport)**
 for PR 22608 at commit 
[`0de8c87`](https://github.com/apache/spark/commit/0de8c87f971d9dccb681678faffe92231a1f0c38).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait UnevaluableAggregate extends DeclarativeAggregate `
  * `case class Average(child: Expression) extends DeclarativeAggregate 
with ImplicitCastInputTypes `
  * `case class Count(children: Seq[Expression]) extends 
DeclarativeAggregate `
  * `abstract class UnevaluableBooleanAggBase(arg: Expression)`
  * `case class EveryAgg(arg: Expression) extends 
UnevaluableBooleanAggBase(arg) `
  * `case class AnyAgg(arg: Expression) extends 
UnevaluableBooleanAggBase(arg) `
  * `case class SomeAgg(arg: Expression) extends 
UnevaluableBooleanAggBase(arg) `
  * `case class UnresolvedCatalystToExternalMap(`


---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-10-30 Thread yucai
Github user yucai commented on the issue:

https://github.com/apache/spark/pull/22847
  
@cloud-fan @dongjoon-hyun @kiszk I just added a negative check; maybe we need 
another PR to figure out a better value later if it is not easy to decide now.


---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22847
  
**[Test build #98288 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98288/testReport)**
 for PR 22847 at commit 
[`ba392c8`](https://github.com/apache/spark/commit/ba392c89f6c96b3fa64d22a564ac1742b7d0ff54).


---




[GitHub] spark pull request #22713: [SPARK-25691][SQL] Use semantic equality in Alias...

2018-10-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22713


---




[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22895#discussion_r229539289
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala 
---
@@ -100,9 +100,14 @@ case class AvroDataToCatalyst(
   case NonFatal(e) => parseMode match {
 case PermissiveMode => nullResultRow
 case FailFastMode =>
-  throw new SparkException("Malformed records are detected in 
record parsing. " +
+  val msg = if (e.getMessage != null) {
+e.getMessage + "\n"
+  } else {
+""
+  }
--- End diff --

I think it's okay to inline this if-else.
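
The null-guard in the diff above can be captured in a tiny helper. This is a standalone illustrative sketch only; the `prefix` and `malformedRecordError` names are mine, not Spark's:

```java
public class Msg {
    // Throwable.getMessage() may return null (e.g. new RuntimeException()),
    // so guard before prepending it to a longer error message.
    static String prefix(Throwable e) {
        return e.getMessage() != null ? e.getMessage() + "\n" : "";
    }

    // Builds the final message: the cause's message (if any) on its own line,
    // followed by the fixed explanation text.
    static String malformedRecordError(Throwable e) {
        return prefix(e) + "Malformed records are detected in record parsing. "
            + "Current parse Mode: FAILFAST. To process malformed records as null "
            + "result, try setting the option 'mode' as 'PERMISSIVE'.";
    }
}
```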


---




[GitHub] spark pull request #22883: [SPARK-25837] [Core] Fix potential slowdown in Ap...

2018-10-30 Thread patrickbrownsync
Github user patrickbrownsync commented on a diff in the pull request:

https://github.com/apache/spark/pull/22883#discussion_r229539016
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
@@ -1105,6 +1095,15 @@ private[spark] class AppStatusListener(
 
   cleanupCachedQuantiles(key)
 }
+
+// Delete tasks for all stages in one pass, as deleting them for each 
stage individually is slow
--- End diff --

Sure

Take a look at the implementation of InMemoryView at 
spark/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java
 line 179

specifically the implementation of iterator on line 193, here is an excerpt:

```
Collections.sort(sorted, (e1, e2) -> modifier * compare(e1, e2, getter));
Stream stream = sorted.stream();

if (first != null) {
stream = stream.filter(e -> modifier * compare(e, getter, first) >= 0);
}

if (last != null) {
stream = stream.filter(e -> modifier * compare(e, getter, last) <= 0);
}
```

and the original in-loop deletion code:

```
val tasks = kvstore.view(classOf[TaskDataWrapper])  
.index("stage") 
.first(key) 
.last(key)  
.asScala
  
tasks.foreach { t =>
kvstore.delete(t.getClass(), t.taskId)  
}
```

So you can see that if we do this in each loop iteration, we actually sort the 
whole collection of TaskDataWrapper currently in the store, then go through and 
check each item against the key (the stage). Assuming we have a large number of 
stages and tasks, this is an O(n^2) operation, which is what happens in my 
production application and the repro code.

If we do this in one pass for all stages, we only sort and iterate the list 
of tasks one time.

This same pattern happens fairly frequently using the KVStoreView interface 
and InMemoryView implementation. Since I am new to contributing to Spark I did 
not undertake a massive refactor, but I would suggest that this interface and 
implementation should be looked at and re-designed with efficiency in mind. The 
current implementation favors flexibility in terms of how the dataset is sorted 
and filtered, but enforcing a single sort order via something like a SortedSet 
would hopefully make it clear when the operation being performed was 
efficiently searching inside the collection, and when you were using an 
inefficient access pattern.

I hope that explains the reasoning, if you have any more questions let me 
know.
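
The cost difference described above can be sketched in plain Java. The names `Task`, `deletePerStage`, and `deleteOnePass` are mine, not Spark's; the per-stage variant only mirrors the sort-then-filter shape of `InMemoryView.iterator`, under the assumption that the whole collection is re-sorted per lookup:

```java
import java.util.*;
import java.util.stream.*;

public class OnePassDelete {
    record Task(int stage, long taskId) {}

    // Per-stage deletion: each call re-sorts and re-scans the whole remaining
    // collection, as InMemoryView does for every view(...).first(key).last(key)
    // iteration -- O(stages * n log n) overall.
    static List<Task> deletePerStage(List<Task> tasks, Set<Integer> stages) {
        List<Task> remaining = new ArrayList<>(tasks);
        for (int stage : stages) {
            List<Task> sorted = new ArrayList<>(remaining);
            sorted.sort(Comparator.comparingInt(Task::stage));
            List<Task> toDelete = sorted.stream()
                .filter(t -> t.stage() == stage)
                .collect(Collectors.toList());
            remaining.removeAll(toDelete);
        }
        return remaining;
    }

    // One-pass deletion: a single scan drops every task whose stage is in the
    // doomed set -- the shape of the fix in this PR.
    static List<Task> deleteOnePass(List<Task> tasks, Set<Integer> stages) {
        return tasks.stream()
            .filter(t -> !stages.contains(t.stage()))
            .collect(Collectors.toList());
    }
}
```

Both variants keep the same surviving tasks; only the asymptotic cost differs.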


---




[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22895#discussion_r229539055
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala 
---
@@ -100,9 +100,14 @@ case class AvroDataToCatalyst(
   case NonFatal(e) => parseMode match {
 case PermissiveMode => nullResultRow
 case FailFastMode =>
-  throw new SparkException("Malformed records are detected in 
record parsing. " +
+  val msg = if (e.getMessage != null) {
+e.getMessage + "\n"
+  } else {
+""
+  }
+  throw new SparkException(msg + "Malformed records are detected 
in record parsing. " +
 s"Current parse Mode: ${FailFastMode.name}. To process 
malformed records as null " +
-"result, try setting the option 'mode' as 'PERMISSIVE'.", 
e.getCause)
+"result, try setting the option 'mode' as 'PERMISSIVE'.", e)
--- End diff --

@gengliangwang, looks okay. What does the error message look like 
before/after, btw? 


---




[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

2018-10-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22713
  
thanks, merging to master!


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r229538148
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
 .intConf
 .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = 
buildConf("spark.sql.codegen.methodSplitThreshold")
+.internal()
+.doc("The threshold of source code length without comment of a single 
Java function by " +
+  "codegen to be split. When the generated Java function source code 
exceeds this threshold" +
+  ", it will be split into multiple small functions. We cannot know 
how many bytecode will " +
+  "be generated, so use the code length as metric. When running on 
HotSpot, a function's " +
+  "bytecode should not go beyond 8KB, otherwise it will not be JITted; 
it also should not " +
+  "be too small, otherwise there will be many function calls.")
+.intConf
+.createWithDefault(1024)
--- End diff --

let's add a `checkValue` to make sure the value is positive. We can figure 
out a lower and upper bound later.
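
A minimal standalone sketch of such a positivity check, assuming nothing about Spark's internal `ConfigBuilder` beyond the `checkValue` idea; the `ConfEntry` class and its member names are mine:

```java
import java.util.function.IntPredicate;

public class ConfEntry {
    private final String key;
    private final int defaultValue;
    private final IntPredicate validator;
    private final String errorMessage;

    ConfEntry(String key, int defaultValue, IntPredicate validator, String errorMessage) {
        this.key = key;
        this.defaultValue = defaultValue;
        this.validator = validator;
        this.errorMessage = errorMessage;
    }

    // Resolve a user-supplied value (or fall back to the default) and reject
    // anything the validator does not accept, e.g. non-positive thresholds.
    int resolve(Integer userValue) {
        int v = (userValue == null) ? defaultValue : userValue;
        if (!validator.test(v)) {
            throw new IllegalArgumentException(key + ": " + errorMessage);
        }
        return v;
    }

    static ConfEntry methodSplitThreshold() {
        return new ConfEntry("spark.sql.codegen.methodSplitThreshold", 1024,
            v -> v > 0, "the threshold must be positive");
    }
}
```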


---




[GitHub] spark pull request #22788: [SPARK-25769][SQL]escape nested columns by backti...

2018-10-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22788#discussion_r229538040
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/columnresolution-negative.sql.out 
---
@@ -161,7 +161,7 @@ SELECT db1.t1.i1 FROM t1, mydb2.t1
 struct<>
 -- !query 18 output
 org.apache.spark.sql.AnalysisException
-cannot resolve '`db1.t1.i1`' given input columns: [mydb2.t1.i1, 
mydb2.t1.i1]; line 1 pos 7
+cannot resolve '`db1`.`t1`.`i1`' given input columns: [mydb2.t1.i1, 
mydb2.t1.i1]; line 1 pos 7
--- End diff --

LGTM


---




[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

2018-10-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22857
  
LGTM except the end-to-end test


---




[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22857#discussion_r229537395
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -2585,4 +2585,45 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
 
 checkAnswer(swappedDf.filter($"key"($"map") > "a"), Row(2, Map(2 -> 
"b")))
   }
+
+  test("SPARK-25860: Replace Literal(null, _) with FalseLiteral whenever 
possible") {
+
+def checkPlanIsEmptyLocalScan(df: DataFrame): Unit = 
df.queryExecution.executedPlan match {
+  case s: LocalTableScanExec => assert(s.rows.isEmpty)
+  case p => fail(s"$p is not LocalTableScanExec")
+}
+
+val df1 = Seq((1, true), (2, false)).toDF("l", "b")
+val df2 = Seq(2, 3).toDF("l")
+
+val q1 = df1.where("IF(l > 10, false, b AND null)")
+checkAnswer(q1, Seq.empty)
+checkPlanIsEmptyLocalScan(q1)
+
+val q2 = df1.where("CASE WHEN l < 10 THEN null WHEN l > 40 THEN false 
ELSE null END")
+checkAnswer(q2, Seq.empty)
+checkPlanIsEmptyLocalScan(q2)
+
+val q3 = df1.join(df2, when(df1("l") > df2("l"), 
lit(null)).otherwise(df1("b") && lit(null)))
+checkAnswer(q3, Seq.empty)
+checkPlanIsEmptyLocalScan(q3)
+
+val q4 = df1.where("IF(IF(b, null, false), true, null)")
+checkAnswer(q4, Seq.empty)
+checkPlanIsEmptyLocalScan(q4)
+
+val q5 = df1.selectExpr("IF(l > 1 AND null, 5, 1) AS out")
+checkAnswer(q5, Row(1) :: Row(1) :: Nil)
+q5.queryExecution.executedPlan.foreach { p =>
+  assert(p.expressions.forall(e => e.find(_.isInstanceOf[If]).isEmpty))
--- End diff --

This test can pass without the optimization. The `ConvertToLocalRelation` 
rule will eliminate the `Project`.

Can we use a table as input data? e.g.
```
withTable("t1", "t2") {
  Seq((1, true), (2, false)).toDF("l", "b").write.saveAsTable("t1")
  Seq(2, 3).toDF("l").write.saveAsTable("t2")
  val df1 = spark.table("t1")
  val df2 = spark.table("t2")
  ...
}
```


---




[GitHub] spark issue #22864: [SPARK-25861][Minor][WEBUI] Remove unused refreshInterva...

2018-10-30 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22864
  
Looks like the parameter has been unused since day one?

https://github.com/apache/spark/pull/290/files#diff-b8adb646ef90f616c34eb5c98d1ebd16R140


---




[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22857#discussion_r229537117
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -2585,4 +2585,45 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
 
 checkAnswer(swappedDf.filter($"key"($"map") > "a"), Row(2, Map(2 -> 
"b")))
   }
+
+  test("SPARK-25860: Replace Literal(null, _) with FalseLiteral whenever 
possible") {
--- End diff --

it's weird to put an optimizer end-to-end test in `DataFrameSuite`. Can we 
create a `ReplaceNullWithFalseEndToEndSuite`?


---




[GitHub] spark pull request #22883: [SPARK-25837] [Core] Fix potential slowdown in Ap...

2018-10-30 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22883#discussion_r229535706
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
@@ -1105,6 +1095,15 @@ private[spark] class AppStatusListener(
 
   cleanupCachedQuantiles(key)
 }
+
+// Delete tasks for all stages in one pass, as deleting them for each 
stage individually is slow
--- End diff --

Hi @patrickbrownsync, can you explain more about why deleting them in the 
loop above is slower? I don't quite understand it.


---




[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21860
  
**[Test build #98287 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98287/testReport)**
 for PR 21860 at commit 
[`8f5eb2b`](https://github.com/apache/spark/commit/8f5eb2b5da332ab3ba2228155cf0c48534119315).


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22897
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4645/
Test PASSed.


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22897
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22897
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4645/



---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-10-30 Thread gjhkael
Github user gjhkael commented on the issue:

https://github.com/apache/spark/pull/22887
  
> can you explain more about why you make the change?
   Some Hadoop configurations are set in spark-defaults.conf because we want 
them to be global, but in some cases a user needs to override them and the 
override does not take effect, because the SparkContext's conf fills the 
hadoopConf again before the Hadoop conf is broadcast.
> Did you try `spark.SessionState.newHadoopConf()`?
   We hit this problem in Spark SQL, not through the DataFrame API.




---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22897
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4645/



---




[GitHub] spark issue #22881: [SPARK-25855][CORE] Don't use erasure coding for event l...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22881
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22881: [SPARK-25855][CORE] Don't use erasure coding for event l...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22881
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98280/
Test PASSed.


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22897
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98286/
Test PASSed.


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22897
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22897
  
**[Test build #98286 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98286/testReport)**
 for PR 22897 at commit 
[`91a4741`](https://github.com/apache/spark/commit/91a47418d7ee8c4d564a79fae6ffa3bc384b09cc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22881: [SPARK-25855][CORE] Don't use erasure coding for event l...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22881
  
**[Test build #98280 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98280/testReport)**
 for PR 22881 at commit 
[`0a54c7d`](https://github.com/apache/spark/commit/0a54c7d0ad4f088f7b8adef6cfa6fa773fecd463).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Average(child: Expression) extends DeclarativeAggregate with ImplicitCastInputTypes `
  * `case class Count(children: Seq[Expression]) extends DeclarativeAggregate `


---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-10-30 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22887
  
Hi @gjhkael ,
can you explain more about why you make the change?
Did you try `spark.SessionState.newHadoopConf()`?


---




[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22880#discussion_r229530694
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
 ---
@@ -49,34 +49,82 @@ import org.apache.spark.sql.types._
  * Due to this reason, we no longer rely on [[ReadContext]] to pass 
requested schema from [[init()]]
  * to [[prepareForRead()]], but use a private `var` for simplicity.
  */
-private[parquet] class ParquetReadSupport(val convertTz: Option[TimeZone])
+private[parquet] class ParquetReadSupport(val convertTz: Option[TimeZone],
+  usingVectorizedReader: Boolean)
 extends ReadSupport[UnsafeRow] with Logging {
   private var catalystRequestedSchema: StructType = _
 
   def this() {
 // We need a zero-arg constructor for SpecificParquetRecordReaderBase. 
 But that is only
 // used in the vectorized reader, where we get the convertTz value 
directly, and the value here
 // is ignored.
-this(None)
+this(None, usingVectorizedReader = true)
   }
 
   /**
* Called on executor side before [[prepareForRead()]] and instantiating 
actual Parquet record
* readers.  Responsible for figuring out Parquet requested schema used 
for column pruning.
*/
   override def init(context: InitContext): ReadContext = {
+val conf = context.getConfiguration
 catalystRequestedSchema = {
-  val conf = context.getConfiguration
   val schemaString = 
conf.get(ParquetReadSupport.SPARK_ROW_REQUESTED_SCHEMA)
   assert(schemaString != null, "Parquet requested schema not set.")
   StructType.fromString(schemaString)
 }
 
-val caseSensitive = 
context.getConfiguration.getBoolean(SQLConf.CASE_SENSITIVE.key,
+val schemaPruningEnabled = 
conf.getBoolean(SQLConf.NESTED_SCHEMA_PRUNING_ENABLED.key,
+  SQLConf.NESTED_SCHEMA_PRUNING_ENABLED.defaultValue.get)
+val caseSensitive = conf.getBoolean(SQLConf.CASE_SENSITIVE.key,
   SQLConf.CASE_SENSITIVE.defaultValue.get)
-val parquetRequestedSchema = ParquetReadSupport.clipParquetSchema(
-  context.getFileSchema, catalystRequestedSchema, caseSensitive)
-
+val parquetFileSchema = context.getFileSchema
+val parquetClippedSchema = 
ParquetReadSupport.clipParquetSchema(parquetFileSchema,
+  catalystRequestedSchema, caseSensitive)
+
+// As part of schema clipping, we add fields in 
catalystRequestedSchema which are missing
+// from parquetFileSchema to parquetClippedSchema. However, nested 
schema pruning requires
+// we ignore unrequested field data when reading from a Parquet file. 
Therefore we pass two
+// schema to ParquetRecordMaterializer: the schema of the file data we 
want to read
+// (parquetRequestedSchema), and the schema of the rows we want to 
return
+// (catalystRequestedSchema). The reader is responsible for 
reconciling the differences between
+// the two.
+//
+// Aside from checking whether schema pruning is enabled 
(schemaPruningEnabled), there
+// is an additional complication to constructing 
parquetRequestedSchema. The manner in which
+// Spark's two Parquet readers reconcile the differences between 
parquetRequestedSchema and
+// catalystRequestedSchema differ. Spark's vectorized reader does not 
(currently) support
+// reading Parquet files with complex types in their schema. Further, 
it assumes that
+// parquetRequestedSchema includes all fields requested in 
catalystRequestedSchema. It includes
+// logic in its read path to skip fields in parquetRequestedSchema 
which are not present in the
+// file.
+//
+// Spark's parquet-mr based reader supports reading Parquet files of 
any kind of complex
+// schema, and it supports nested schema pruning as well. Unlike the 
vectorized reader, the
+// parquet-mr reader requires that parquetRequestedSchema include only 
those fields present in
+// the underlying parquetFileSchema. Therefore, in the case where we 
use the parquet-mr reader
+// we intersect the parquetClippedSchema with the parquetFileSchema to 
construct the
+// parquetRequestedSchema set in the ReadContext.
--- End diff --

Ok. I see.
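
The intersection step described in the comment above can be sketched with field names standing in for full Parquet types. This is a deliberate simplification of real schema clipping, which also handles nested structs; `intersectByName` is a made-up helper:

```java
import java.util.*;
import java.util.stream.*;

public class SchemaIntersect {
    // Keep only the requested (clipped) fields that actually exist in the
    // file schema, preserving the requested order -- loosely the
    // parquetClippedSchema intersected with parquetFileSchema, as required
    // by the parquet-mr reader.
    static List<String> intersectByName(List<String> clippedFields, Set<String> fileFields) {
        return clippedFields.stream()
            .filter(fileFields::contains)
            .collect(Collectors.toList());
    }
}
```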


---




[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22880#discussion_r229530287
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
 ---
@@ -93,13 +141,14 @@ private[parquet] class ParquetReadSupport(val 
convertTz: Option[TimeZone])
 log.debug(s"Preparing for read Parquet file with message type: 
$fileSchema")
 val parquetRequestedSchema = readContext.getRequestedSchema
 
-logInfo {
-  s"""Going to read the following fields from the Parquet file:
- |
- |Parquet form:
+log.info {
+  s"""Going to read the following fields from the Parquet file with 
the following schema:
+ |Parquet file schema:
+ |$fileSchema
+ |Parquet read schema:
--- End diff --

I think it is useful for debugging this patch, but it may not be useful for end 
users and it will increase the log size. Making it debug level sounds good to me. 
But let's wait for others' opinions too.


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22897
  
**[Test build #98286 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98286/testReport)**
 for PR 22897 at commit 
[`91a4741`](https://github.com/apache/spark/commit/91a47418d7ee8c4d564a79fae6ffa3bc384b09cc).


---




[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22880#discussion_r229530592
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
 ---
@@ -202,11 +204,15 @@ private[parquet] class ParquetRowConverter(
 
   override def start(): Unit = {
 var i = 0
-while (i < currentRow.numFields) {
+while (i < fieldConverters.length) {
   fieldConverters(i).updater.start()
   currentRow.setNullAt(i)
--- End diff --

Yea, I think it should be fine to do `setNullAt` on a non-corresponding 
field, right?


---




[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22857#discussion_r229529772
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] {
   flattenConcats(concat)
   }
 }
+
+/**
+ * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further 
optimizations.
+ *
+ * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, 
it transforms predicates
+ * in all [[If]] expressions as well as branch conditions in all 
[[CaseWhen]] expressions.
+ *
+ * For example, `Filter(Literal(null, _))` is equal to 
`Filter(FalseLiteral)`.
+ *
+ * Another example containing branches is `Filter(If(cond, FalseLiteral, 
Literal(null, _)))`;
+ * this can be optimized to `Filter(If(cond, FalseLiteral, 
FalseLiteral))`, and eventually
+ * `Filter(FalseLiteral)`.
+ *
+ * As this rule is not limited to conditions in [[Filter]] and [[Join]], 
arbitrary plans can
+ * benefit from it. For example, `Project(If(And(cond, Literal(null)), 
Literal(1), Literal(2)))`
+ * can be simplified into `Project(Literal(2))`.
+ *
+ * As a result, many unnecessary computations can be removed in the query 
optimization phase.
+ */
+object ReplaceNullWithFalse extends Rule[LogicalPlan] {
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case f @ Filter(cond, _) => f.copy(condition = 
replaceNullWithFalse(cond))
+case j @ Join(_, _, _, Some(cond)) => j.copy(condition = 
Some(replaceNullWithFalse(cond)))
+case p: LogicalPlan => p transformExpressions {
+  case i @ If(pred, _, _) => i.copy(predicate = 
replaceNullWithFalse(pred))
+  case cw @ CaseWhen(branches, _) =>
+val newBranches = branches.map { case (cond, value) =>
+  replaceNullWithFalse(cond) -> value
+}
+cw.copy(branches = newBranches)
+}
+  }
+
+  /**
+   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
+   *
+   * Note that `transformExpressionsDown` can not be used here as we must 
stop as soon as we hit
+   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or 
`Literal(null, _)`.
+   */
+  private def replaceNullWithFalse(e: Expression): Expression = e match {
+case cw: CaseWhen if cw.dataType == BooleanType =>
+  val newBranches = cw.branches.map { case (cond, value) =>
+replaceNullWithFalse(cond) -> replaceNullWithFalse(value)
+  }
+  val newElseValue = cw.elseValue.map(replaceNullWithFalse)
+  CaseWhen(newBranches, newElseValue)
+case i @ If(pred, trueVal, falseVal) if i.dataType == BooleanType =>
+  If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), 
replaceNullWithFalse(falseVal))
--- End diff --

ah, I see. `replaceNullWithFalse` should only work in boolean type cases. 
Then I think we are fine with it.
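
For intuition, the boolean-only rewrite can be modeled on a toy expression tree. This Java model with `NullLit`/`And`/`Or`/`Opaque` is mine, not Spark's Catalyst; it only illustrates the whitelist-and-stop recursion:

```java
public class ReplaceNullWithFalseDemo {
    interface Expr {}
    record NullLit() implements Expr {}           // stands in for Literal(null, BooleanType)
    record BoolLit(boolean v) implements Expr {}
    record And(Expr l, Expr r) implements Expr {}
    record Or(Expr l, Expr r) implements Expr {}
    record Opaque(String name) implements Expr {} // anything else, e.g. a column reference

    // In a filter/join condition a null result drops the row, exactly like
    // false, so null may be replaced with false -- but only inside the
    // whitelisted boolean operators; recursion stops at anything else.
    static Expr replace(Expr e) {
        if (e instanceof NullLit) return new BoolLit(false);
        if (e instanceof And a) return new And(replace(a.l()), replace(a.r()));
        if (e instanceof Or o) return new Or(replace(o.l()), replace(o.r()));
        return e;
    }
}
```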


---




[GitHub] spark issue #22805: [SPARK-25809][K8S][TEST] New K8S integration testing bac...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22805
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98279/
Test PASSed.


---




[GitHub] spark issue #22805: [SPARK-25809][K8S][TEST] New K8S integration testing bac...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22805
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22805: [SPARK-25809][K8S][TEST] New K8S integration testing bac...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22805
  
**[Test build #98279 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98279/testReport)**
 for PR 22805 at commit 
[`41af220`](https://github.com/apache/spark/commit/41af2204db8f85f4ebdad4538a26f24e269a45ee).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22857#discussion_r229528767
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] {
     flattenConcats(concat)
   }
 }
+
+/**
+ * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
+ *
+ * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, it transforms predicates
+ * in all [[If]] expressions as well as branch conditions in all [[CaseWhen]] expressions.
+ *
+ * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
+ *
+ * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
+ * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
+ * `Filter(FalseLiteral)`.
+ *
+ * As this rule is not limited to conditions in [[Filter]] and [[Join]], arbitrary plans can
+ * benefit from it. For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))`
+ * can be simplified into `Project(Literal(2))`.
+ *
+ * As a result, many unnecessary computations can be removed in the query optimization phase.
+ */
+object ReplaceNullWithFalse extends Rule[LogicalPlan] {
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
+    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
+    case p: LogicalPlan => p transformExpressions {
+      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
+      case cw @ CaseWhen(branches, _) =>
+        val newBranches = branches.map { case (cond, value) =>
+          replaceNullWithFalse(cond) -> value
+        }
+        cw.copy(branches = newBranches)
+    }
+  }
+
+  /**
+   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
+   *
+   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
+   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
+   */
+  private def replaceNullWithFalse(e: Expression): Expression = e match {
+    case cw: CaseWhen if cw.dataType == BooleanType =>
+      val newBranches = cw.branches.map { case (cond, value) =>
+        replaceNullWithFalse(cond) -> replaceNullWithFalse(value)
+      }
+      val newElseValue = cw.elseValue.map(replaceNullWithFalse)
+      CaseWhen(newBranches, newElseValue)
+    case i @ If(pred, trueVal, falseVal) if i.dataType == BooleanType =>
+      If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), replaceNullWithFalse(falseVal))
--- End diff --

The general rule for `LogicalPlan` in `apply` rewrites the `predicate` of every `If`, regardless of whether the `If`'s `dataType` is `BooleanType`.

But in `replaceNullWithFalse`, the case for `If` only fires when its `dataType` is `BooleanType`. The test `"replace null in predicates of If inside another If"` is such a case: the `If` nested inside another `If` is of `BooleanType`. If the nested `If` were not of `BooleanType`, this rule would not apply to it. And I think it should be ok to replace the null with false in a predicate even when the expression is not of boolean type.
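To make this concrete, here is a toy sketch of the recursive null-to-false replacement (plain Python with a tuple-based expression encoding invented for illustration — not Catalyst code):

```python
# Toy model of the replaceNullWithFalse recursion discussed above.
# Expressions are nested tuples like ("and", a, b) or ("if", p, t, f),
# and None stands in for Literal(null, _). This mirrors only the shape
# of the Catalyst rule, not its actual types or API.

def replace_null_with_false(expr):
    """Recursively turn None into False inside boolean combinators.

    Recursion stops at anything that is not an and/or/if node, just as
    the Catalyst rule stops at non-(CaseWhen/If/And/Or) expressions.
    """
    if expr is None:                      # Literal(null, _) -> FalseLiteral
        return False
    if isinstance(expr, tuple) and expr and expr[0] in ("and", "or", "if"):
        op = expr[0]
        return (op,) + tuple(replace_null_with_false(e) for e in expr[1:])
    return expr                           # leave other expressions untouched

# Filter(If(And(c, null), false, null)) becomes Filter(If(And(c, false), false, false)):
cond = ("if", ("and", "c", None), False, None)
assert replace_null_with_false(cond) == ("if", ("and", "c", False), False, False)
```

The key property the discussion relies on is visible here: a `null` reached only through boolean combinators can never make the overall condition true, so substituting `false` preserves the filter's semantics.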





---




[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22275
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22275
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4644/
Test PASSed.


---




[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22275
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4643/
Test PASSed.


---




[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22275
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22760
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22760
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4642/
Test PASSed.


---




[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22760
  
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4642/



---




[GitHub] spark pull request #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integr...

2018-10-30 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22608#discussion_r229523838
  
--- Diff: resource-managers/kubernetes/integration-tests/kerberos-yml/kerberos-set.yml ---
@@ -0,0 +1,49 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  name: kerberos
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      name: hdfs-kerberos
+      kerberosService: kerberos
+      job: kerberostest
+  template:
+    metadata:
+      annotations:
+        pod.beta.kubernetes.io/hostname: kerberos
+      labels:
+        name: hdfs-kerberos
+        kerberosService: kerberos
+        job: kerberostest
+    spec:
+      containers:
+      - command: ["sh"]
+        args: ["/start-kdc.sh"]
+        name: kerberos
+        imagePullPolicy: IfNotPresent
+        volumeMounts:
+        - mountPath: /var/keytabs
+          name: kerb-keytab
+      restartPolicy: Always
+      volumes:
+      - name: kerb-keytab
+        persistentVolumeClaim:
--- End diff --

With a `StatefulSet`, you don't need to explicitly manage PVCs. You can use `.spec.volumeClaimTemplates`; the StatefulSet controller automatically creates a PVC per replica (or binds to the existing one it created before).


---




[GitHub] spark pull request #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integr...

2018-10-30 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22608#discussion_r229523966
  
--- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/kerberos/KerberosPVWatcherCache.scala ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.k8s.integrationtest.kerberos
+
+import io.fabric8.kubernetes.api.model.{PersistentVolume, PersistentVolumeClaim}
+import io.fabric8.kubernetes.client.{KubernetesClientException, Watch, Watcher}
+import io.fabric8.kubernetes.client.Watcher.Action
+import org.scalatest.Matchers
+import org.scalatest.concurrent.Eventually
+import scala.collection.JavaConverters._
+
+import org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.{INTERVAL, TIMEOUT}
+import org.apache.spark.internal.Logging
+
+/**
+ * This class is responsible for ensuring that the persistent volume claims are bound
+ * to the correct persistent volume and that they are both created before launching the
--- End diff --

With `StatefulSet`s, you probably don't need this.


---




[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22275
  
**[Test build #98285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98285/testReport)** for PR 22275 at commit [`6457e42`](https://github.com/apache/spark/commit/6457e420e3b8366c1373e7adb0bf56df03b9cc19).


---




[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...

2018-10-30 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/22275#discussion_r229522939
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -4923,6 +4923,28 @@ def test_timestamp_dst(self):
 self.assertPandasEqual(pdf, df_from_python.toPandas())
 self.assertPandasEqual(pdf, df_from_pandas.toPandas())
 
+    def test_toPandas_batch_order(self):
+
+        # Collects Arrow RecordBatches out of order in driver JVM then re-orders in Python
+        def run_test(num_records, num_parts, max_records):
+            df = self.spark.range(num_records, numPartitions=num_parts).toDF("a")
+            with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": max_records}):
+                pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
+                self.assertPandasEqual(pdf, pdf_arrow)
+
+        cases = [
+            (1024, 512, 2),  # Try large num partitions for good chance of not collecting in order
+            (512, 64, 2),    # Try medium num partitions to test out of order collection
+            (64, 8, 2),      # Try small number of partitions to test out of order collection
+            (64, 64, 1),     # Test single batch per partition
+            (64, 1, 64),     # Test single partition, single batch
+            (64, 1, 8),      # Test single partition, multiple batches
+            (30, 7, 2),      # Test different sized partitions
+        ]
--- End diff --

@holdenk and @felixcheung , I didn't do a loop but chose several different levels of partition counts to make it more likely that partitions won't be collected in order. I also added some other cases with different partition/batch ratios. Let me know if you think we need more to be sure here.
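The invariant these cases probe — that sorting received batches by partition index restores the original row order regardless of arrival order — can be sketched outside Spark (toy stand-ins, not pyspark APIs; all names here are invented for illustration):

```python
import random

# Toy sketch of the out-of-order collection the cases above exercise.
# Each batch arrives from the "JVM" tagged with its partition index in an
# arbitrary arrival order; sorting by that index restores the row order.

def collect_out_of_order(partitions):
    """Yield (partition_index, batch) pairs in a random arrival order."""
    arrivals = list(enumerate(partitions))
    random.shuffle(arrivals)      # simulate partitions finishing at random times
    yield from arrivals

def reorder(arrivals):
    """Re-assemble rows in partition order, as the Python side must."""
    ordered = sorted(arrivals, key=lambda pair: pair[0])
    return [row for _, batch in ordered for row in batch]

partitions = [[0, 1], [2, 3], [4, 5], [6, 7]]
assert reorder(collect_out_of_order(partitions)) == [0, 1, 2, 3, 4, 5, 6, 7]
```

Raising the partition count (as in the `(1024, 512, 2)` case) just makes a fully in-order arrival vanishingly unlikely, so the reordering path is actually exercised.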


---




[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22760
  
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4642/



---




[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

2018-10-30 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/22275
  
Apologies for the delay in circling back to this. I reorganized a little to simplify and expanded the comments to hopefully better describe the code.

A quick summary of the changes: I changed the `ArrowStreamSerializer` to not hold any state, since that seemed to complicate things. So instead of saving the batch-order indices in the serializer, they are loaded on the last iteration of `load_stream`, and this logic was put in a dedicated serializer, `ArrowCollectSerializer`, so that it is clear where it is used. I also consolidated all the batch-ordering calls within `_collectAsArrow` so it is easier to follow the whole process.
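A rough, simplified stand-in for a stateless `load_stream` that yields the batch-order indices as its final element (the stream layout and names here are invented for illustration, not the actual pyspark serializer):

```python
# Simplified stand-in for the stateless load_stream described above: the
# stream carries a batch count, the data batches in arrival order, and then
# one final payload giving each batch's correct position, yielded last.

def load_stream(stream):
    """Yield each batch, then yield the list of batch-order indices last."""
    num_batches = next(stream)
    for _ in range(num_batches):
        yield next(stream)        # a data batch, in arrival order
    yield next(stream)            # final element: batch-order indices

def collect_in_order(stream):
    *batches, batch_order = list(load_stream(stream))
    # batch_order[i] is the arrival position of the batch that belongs at i
    return [batches[i] for i in batch_order]

# Batches for positions 0..2 arrived in the order 2, 0, 1:
raw = iter([3, "batch2", "batch0", "batch1", [1, 2, 0]])
assert collect_in_order(raw) == ["batch0", "batch1", "batch2"]
```

Keeping the ordering payload in the stream itself, rather than as serializer state, means the serializer stays a pure function of its input and the consumer decides what to do with the final element.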


---




[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22760
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22760
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98283/
Test PASSed.


---




[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...

2018-10-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22760
  
**[Test build #98283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98283/testReport)** for PR 22760 at commit [`cbbc557`](https://github.com/apache/spark/commit/cbbc557b1bf6b3f36ac805ccf3a2d5a4642ec1a8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---



