[GitHub] spark issue #22898: [SPARK-25746][SQL][followup] do not add unnecessary If e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22898 **[Test build #98292 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98292/testReport)** for PR 22898 at commit [`8266443`](https://github.com/apache/spark/commit/82664439318b72d8446230515abb882b89767bb9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22898: [SPARK-25746][SQL][followup] do not add unnecessary If e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22898 Merged build finished. Test PASSed.
[GitHub] spark issue #22898: [SPARK-25746][SQL][followup] do not add unnecessary If e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22898 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4647/ Test PASSed.
[GitHub] spark issue #22898: [SPARK-25746][SQL][followup] do not add unnecessary If e...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22898 cc @viirya
[GitHub] spark pull request #22898: [SPARK-25746][SQL][followup] do not add unnecessa...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22898

[SPARK-25746][SQL][followup] do not add unnecessary If expression

## What changes were proposed in this pull request?

A followup of https://github.com/apache/spark/pull/22749. When we construct the new serializer in `ExpressionEncoder.tuple`, we don't need to add an `if(isnull ...)` check for each field. The fields are either simple expressions that propagate null correctly (e.g. `GetStructField(GetColumnByOrdinal(0, schema), index)`), or complex expressions that already have the isnull check.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22898.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22898

commit 82664439318b72d8446230515abb882b89767bb9
Author: Wenchen Fan
Date: 2018-10-31T05:44:44Z

    do not add unnecessary If expression
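The reasoning above (a field extractor that already propagates null makes an outer `if(isnull ...)` wrapper redundant) can be illustrated with a toy sketch. This is plain Python, not Catalyst code, and all names are hypothetical:

```python
# Toy model of null propagation, not Spark/Catalyst code.

def get_struct_field(row, index):
    # Like GetStructField: a null input already yields a null output,
    # so null propagates without any extra guard.
    return None if row is None else row[index]

def wrapped_in_if(row, index):
    # The pattern the PR removes: an explicit "if(isnull ...)" check
    # around an expression that already propagates null.
    return None if row is None else get_struct_field(row, index)

# Both forms behave identically, so the outer check adds nothing.
for row in [None, (1, "a")]:
    assert wrapped_in_if(row, 0) == get_struct_field(row, 0)
```

Since the two forms agree for null and non-null inputs alike, dropping the wrapper changes no behavior; it only shrinks the generated expression tree.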
[GitHub] spark issue #22895: [SPARK-25886][SQL][Minor] Improve error message of `Fail...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22895 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4646/ Test PASSed.
[GitHub] spark issue #22895: [SPARK-25886][SQL][Minor] Improve error message of `Fail...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22895 **[Test build #98291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98291/testReport)** for PR 22895 at commit [`24244cc`](https://github.com/apache/spark/commit/24244ccd09cde9146173cc967ce783fd80aecd37).
[GitHub] spark issue #22895: [SPARK-25886][SQL][Minor] Improve error message of `Fail...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22895 Merged build finished. Test PASSed.
[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22847 Merged build finished. Test PASSed.
[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98288/ Test PASSed.
[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22847 **[Test build #98288 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98288/testReport)** for PR 22847 at commit [`ba392c8`](https://github.com/apache/spark/commit/ba392c89f6c96b3fa64d22a564ac1742b7d0ff54).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22895#discussion_r229564121

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala ---
```
@@ -100,9 +100,14 @@ case class AvroDataToCatalyst(
       case NonFatal(e) => parseMode match {
         case PermissiveMode => nullResultRow
         case FailFastMode =>
-          throw new SparkException("Malformed records are detected in record parsing. " +
+          val msg = if (e.getMessage != null) {
+            e.getMessage + "\n"
+          } else {
+            ""
+          }
+          throw new SparkException(msg + "Malformed records are detected in record parsing. " +
             s"Current parse Mode: ${FailFastMode.name}. To process malformed records as null " +
-            "result, try setting the option 'mode' as 'PERMISSIVE'.", e.getCause)
+            "result, try setting the option 'mode' as 'PERMISSIVE'.", e)
```
--- End diff --

Agree. When I posted the output, I had the same thought. Haha.
[GitHub] spark pull request #22891: [SPARK-25881][pyspark] df.toPandas() convert deci...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22891#discussion_r229564044

--- Diff: python/pyspark/sql/dataframe.py ---
```
@@ -2064,6 +2064,7 @@ def toDF(self, *cols):
     @since(1.3)
     def toPandas(self):
         """
+        :param coerce_float: default False, if Ture, will handle decimal type to np.float64 instand of type object.
```
--- End diff --

There's no such param. Also, we don't convert it to string. Also, the Arrow optimization code path should be fixed too, BTW.
[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22895#discussion_r229563551

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala ---
```
@@ -100,9 +100,14 @@ case class AvroDataToCatalyst(
       case NonFatal(e) => parseMode match {
         case PermissiveMode => nullResultRow
         case FailFastMode =>
-          throw new SparkException("Malformed records are detected in record parsing. " +
+          val msg = if (e.getMessage != null) {
+            e.getMessage + "\n"
+          } else {
+            ""
+          }
+          throw new SparkException(msg + "Malformed records are detected in record parsing. " +
             s"Current parse Mode: ${FailFastMode.name}. To process malformed records as null " +
-            "result, try setting the option 'mode' as 'PERMISSIVE'.", e.getCause)
+            "result, try setting the option 'mode' as 'PERMISSIVE'.", e)
```
--- End diff --

How about we just remove the `msg` change? If I am not mistaken, usually it's just attached as the cause alone.
[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22895#discussion_r229563086

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala ---
```
@@ -100,9 +100,14 @@ case class AvroDataToCatalyst(
       case NonFatal(e) => parseMode match {
         case PermissiveMode => nullResultRow
         case FailFastMode =>
-          throw new SparkException("Malformed records are detected in record parsing. " +
+          val msg = if (e.getMessage != null) {
+            e.getMessage + "\n"
+          } else {
+            ""
+          }
+          throw new SparkException(msg + "Malformed records are detected in record parsing. " +
             s"Current parse Mode: ${FailFastMode.name}. To process malformed records as null " +
-            "result, try setting the option 'mode' as 'PERMISSIVE'.", e.getCause)
+            "result, try setting the option 'mode' as 'PERMISSIVE'.", e)
```
--- End diff --

I see. It would be like:

```
org.apache.spark.SparkException: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('1' (code 49)): was expecting a colon to separate field name and value
 at [Source: (InputStreamReader); line: 1, column: 7]
Malformed records are detected in record parsing. Parse Mode: FAILFAST. To process malformed records as null result, try setting the option 'mode' as 'PERMISSIVE'.
	at org.apache.spark.sql.catalyst.util.FailureSafeParser.parse(FailureSafeParser.scala:80)
	at org.apache.spark.sql.catalyst.expressions.JsonToStructs.nullSafeEval(jsonExpressions.scala:594)
	...
Caused by: org.apache.spark.sql.catalyst.util.BadRecordException: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('1' (code 49)): was expecting a colon to separate field name and value
 at [Source: (InputStreamReader); line: 1, column: 7]
	...
```
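The message construction under discussion can be sketched outside Spark as follows; this is a minimal Python rendering of the Scala logic in the diff, and the helper name is hypothetical:

```python
# Hypothetical helper mirroring the diff: prepend the underlying cause's
# message (when present) to the FAILFAST error text.

def build_failfast_message(cause_message):
    prefix = cause_message + "\n" if cause_message is not None else ""
    return (prefix
            + "Malformed records are detected in record parsing. "
            + "Current parse Mode: FAILFAST. To process malformed records as null "
            + "result, try setting the option 'mode' as 'PERMISSIVE'.")

# With no cause message, the generic text stands alone.
assert build_failfast_message(None).startswith("Malformed records are detected")
# With a cause message, the root problem appears on the first line.
first = build_failfast_message("Unexpected character ('1' (code 49))").splitlines()[0]
assert first == "Unexpected character ('1' (code 49))"
```

The null check matters because `Throwable.getMessage` can legitimately return null, in which case only the generic text is shown.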
[GitHub] spark issue #22891: [SPARK-25881][pyspark] df.toPandas() convert decimal to ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22891 I don't think we should map decimals to float. It will lose precision and it's a breaking change.
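The precision concern can be demonstrated concretely; a small Python example, independent of Spark, showing that a double-precision float (what `np.float64` uses) cannot represent a wide decimal exactly:

```python
from decimal import Decimal

# IEEE 754 doubles keep only ~15-17 significant decimal digits,
# so a wider decimal value cannot round-trip through float.
d = Decimal("1234567890123456789.12")
f = float(d)
assert Decimal(str(f)) != d  # the conversion lost precision
```

This is why keeping decimals as Python `Decimal` objects (dtype `object` in pandas) preserves exactness, at the cost of slower vectorized operations.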
[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22895#discussion_r229562284

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala ---
```
@@ -100,9 +100,14 @@ case class AvroDataToCatalyst(
       case NonFatal(e) => parseMode match {
         case PermissiveMode => nullResultRow
         case FailFastMode =>
-          throw new SparkException("Malformed records are detected in record parsing. " +
+          val msg = if (e.getMessage != null) {
+            e.getMessage + "\n"
+          } else {
+            ""
+          }
+          throw new SparkException(msg + "Malformed records are detected in record parsing. " +
             s"Current parse Mode: ${FailFastMode.name}. To process malformed records as null " +
-            "result, try setting the option 'mode' as 'PERMISSIVE'.", e.getCause)
+            "result, try setting the option 'mode' as 'PERMISSIVE'.", e)
```
--- End diff --

Adding the cause looks fine. I was actually wondering whether adding `msg` is necessary.
[GitHub] spark issue #22896: [SPARKR]found some extra whitespace in the R tests
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22896 @shaneknapp, I already merged this into the master branch (https://github.com/apache/spark/commit/243ce319a06f20365d5b08d479642d75748645d9) :-). AppVeyor tests passed (https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/19932574).
[GitHub] spark pull request #22870: [SPARK-25862][SQL] Remove rangeBetween APIs intro...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22870
[GitHub] spark issue #22883: [SPARK-25837] [Core] Fix potential slowdown in AppStatus...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22883 Seems reasonable. Ping @vanzin
[GitHub] spark issue #22870: [SPARK-25862][SQL] Remove rangeBetween APIs introduced i...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22870 LGTM. Thanks! Merged to master.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21860 **[Test build #98290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98290/testReport)** for PR 21860 at commit [`5deb39b`](https://github.com/apache/spark/commit/5deb39b838f8156eae5e9cdc7371c6129e66ac6b).
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21860 retest this please
[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22482 **[Test build #98289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98289/testReport)** for PR 22482 at commit [`c03c946`](https://github.com/apache/spark/commit/c03c946fa0ef9baedf4c5ab9575fff46402b9fba).
[GitHub] spark issue #22896: [SPARKR]found some extra whitespace in the R tests
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/22896 Actually, can we merge this? It's causing spurious lintr errors.
[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22895#discussion_r229553580

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala ---
```
@@ -100,9 +100,14 @@ case class AvroDataToCatalyst(
       case NonFatal(e) => parseMode match {
         case PermissiveMode => nullResultRow
         case FailFastMode =>
-          throw new SparkException("Malformed records are detected in record parsing. " +
+          val msg = if (e.getMessage != null) {
+            e.getMessage + "\n"
+          } else {
+            ""
+          }
```
--- End diff --

Yes, but as there is such code as `e.getMessage + "\n"`, personally I prefer not to inline it.
[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22895#discussion_r229553381

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala ---
```
@@ -100,9 +100,14 @@ case class AvroDataToCatalyst(
       case NonFatal(e) => parseMode match {
         case PermissiveMode => nullResultRow
         case FailFastMode =>
-          throw new SparkException("Malformed records are detected in record parsing. " +
+          val msg = if (e.getMessage != null) {
+            e.getMessage + "\n"
+          } else {
+            ""
+          }
+          throw new SparkException(msg + "Malformed records are detected in record parsing. " +
             s"Current parse Mode: ${FailFastMode.name}. To process malformed records as null " +
-            "result, try setting the option 'mode' as 'PERMISSIVE'.", e.getCause)
+            "result, try setting the option 'mode' as 'PERMISSIVE'.", e)
```
--- End diff --

After the change, the backtrace will contain:

```
Caused by: org.apache.spark.sql.catalyst.util.BadRecordException: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('1' (code 49)): was expecting a colon to separate field name and value
 at [Source: (InputStreamReader); line: 1, column: 7]
	at org.apache.spark.sql.catalyst.json.JacksonParser.parse(JacksonParser.scala:414)
	at org.apache.spark.sql.catalyst.expressions.JsonToStructs$$anonfun$parser$1.apply(jsonExpressions.scala:581)
	at org.apache.spark.sql.catalyst.expressions.JsonToStructs$$anonfun$parser$1.apply(jsonExpressions.scala:581)
	at org.apache.spark.sql.catalyst.util.FailureSafeParser.parse(FailureSafeParser.scala:66)
	... 20 more
```
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Merged build finished. Test PASSed.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98285/ Test PASSed.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98285 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98285/testReport)** for PR 22275 at commit [`6457e42`](https://github.com/apache/spark/commit/6457e420e3b8366c1373e7adb0bf56df03b9cc19).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22844 Thank you, @heary-cao, @yucai, @wangyum, and @HyukjinKwon.
[GitHub] spark issue #22864: [SPARK-25861][Minor][WEBUI] Remove unused refreshInterva...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/22864 Hi @gengliangwang, yes, the parameter has been unused from the beginning.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98287/ Test FAILed.
[GitHub] spark issue #22896: [SPARKR]found some extra whitespace in the R tests
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/22896 thanks @HyukjinKwon!
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Merged build finished. Test FAILed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21860 **[Test build #98287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98287/testReport)** for PR 21860 at commit [`8f5eb2b`](https://github.com/apache/spark/commit/8f5eb2b5da332ab3ba2228155cf0c48534119315).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98284/ Test PASSed.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Merged build finished. Test PASSed.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98284 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98284/testReport)** for PR 22275 at commit [`7d19977`](https://github.com/apache/spark/commit/7d1997738c8c76f15068d306c858d6127733db7d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #22896: [SPARKR]found some extra whitespace in the R test...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22896
[GitHub] spark issue #22896: [SPARKR]found some extra whitespace in the R tests
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22896 @shaneknapp, I just got this in since that's orthogonal to this PR :-).
[GitHub] spark issue #22896: [SPARKR]found some extra whitespace in the R tests
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22896 Merged to master.
[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22844
[GitHub] spark issue #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22844 Merged to master.
[GitHub] spark issue #22879: [SPARK-25872][SQL][TEST] Add an optimizer tracker for TP...
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22879 `tpcdsQueries` and `tpcdsQueriesV2_7` duplicate those in `TPCDSQueryBenchmark`; should we maintain them together?
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98281/ Test PASSed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22608 Merged build finished. Test PASSed.
[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22608 **[Test build #98281 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98281/testReport)** for PR 22608 at commit [`0de8c87`](https://github.com/apache/spark/commit/0de8c87f971d9dccb681678faffe92231a1f0c38).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `trait UnevaluableAggregate extends DeclarativeAggregate`
   * `case class Average(child: Expression) extends DeclarativeAggregate with ImplicitCastInputTypes`
   * `case class Count(children: Seq[Expression]) extends DeclarativeAggregate`
   * `abstract class UnevaluableBooleanAggBase(arg: Expression)`
   * `case class EveryAgg(arg: Expression) extends UnevaluableBooleanAggBase(arg)`
   * `case class AnyAgg(arg: Expression) extends UnevaluableBooleanAggBase(arg)`
   * `case class SomeAgg(arg: Expression) extends UnevaluableBooleanAggBase(arg)`
   * `case class UnresolvedCatalystToExternalMap(`
[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22847 @cloud-fan @dongjoon-hyun @kiszk I just added a negative check; maybe we need another PR to figure out a better value later if it is not easy to decide now.
[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22847 **[Test build #98288 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98288/testReport)** for PR 22847 at commit [`ba392c8`](https://github.com/apache/spark/commit/ba392c89f6c96b3fa64d22a564ac1742b7d0ff54).
[GitHub] spark pull request #22713: [SPARK-25691][SQL] Use semantic equality in Alias...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22713
[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22895#discussion_r229539289

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala ---
```
@@ -100,9 +100,14 @@ case class AvroDataToCatalyst(
       case NonFatal(e) => parseMode match {
         case PermissiveMode => nullResultRow
         case FailFastMode =>
-          throw new SparkException("Malformed records are detected in record parsing. " +
+          val msg = if (e.getMessage != null) {
+            e.getMessage + "\n"
+          } else {
+            ""
+          }
```
--- End diff --

I think it's okay to inline this if-else.
[GitHub] spark pull request #22883: [SPARK-25837] [Core] Fix potential slowdown in Ap...
Github user patrickbrownsync commented on a diff in the pull request: https://github.com/apache/spark/pull/22883#discussion_r229539016

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
```
@@ -1105,6 +1095,15 @@ private[spark] class AppStatusListener(
         cleanupCachedQuantiles(key)
       }
+
+      // Delete tasks for all stages in one pass, as deleting them for each stage individually is slow
```
--- End diff --

Sure. Take a look at the implementation of InMemoryView in spark/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java, line 179, specifically the implementation of `iterator` on line 193. Here is an excerpt:

```
Collections.sort(sorted, (e1, e2) -> modifier * compare(e1, e2, getter));
Stream<T> stream = sorted.stream();
if (first != null) {
  stream = stream.filter(e -> modifier * compare(e, getter, first) >= 0);
}
if (last != null) {
  stream = stream.filter(e -> modifier * compare(e, getter, last) <= 0);
}
```

and the original, in-loop deletion code:

```
val tasks = kvstore.view(classOf[TaskDataWrapper])
  .index("stage")
  .first(key)
  .last(key)
  .asScala

tasks.foreach { t =>
  kvstore.delete(t.getClass(), t.taskId)
}
```

So you can see that if we do this in each loop iteration, we sort the whole collection of TaskDataWrapper objects currently in the store, then go through and check each item against the key set (the stage). Assuming we have a large number of stages and tasks, this is an O(n^2) operation, which is what happens in my production application and in the repro code. If we do this in one pass for all stages, we only sort and iterate the list of tasks one time. This same pattern appears fairly frequently with the KVStoreView interface and the InMemoryView implementation. Since I am new to contributing to Spark I did not undertake a massive refactor, but I would suggest that this interface and implementation be looked at and re-designed with efficiency in mind.
The current implementation favors flexibility in terms of how the dataset is sorted and filtered, but enforcing a single sort order via something like a SortedSet would hopefully make it clear when the operation being performed was efficiently searching inside the collection, and when you were using an inefficient access pattern. I hope that explains the reasoning, if you have any more questions let me know. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
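The cost difference described above can be sketched outside of Spark. The snippet below is a minimal, hypothetical model (the `Task` record and method names are invented for illustration, not Spark's actual API): per-stage deletion re-sorts and re-scans the whole store once per stage key, while the one-pass variant scans the store a single time for all stages.

```java
import java.util.*;

public class BatchDeleteSketch {
    // Hypothetical in-memory "store" entry: (stageId, taskId).
    record Task(int stage, long id) {}

    // Per-stage deletion: one full sort + scan of the store for every stage key,
    // modeling kvstore.view(...).index("stage").first(key).last(key) inside a loop.
    // Returns the total number of elements visited across all iterations.
    static int deletePerStage(List<Task> store, Set<Integer> stages) {
        int visited = 0;
        for (int stage : stages) {
            List<Task> sorted = new ArrayList<>(store);
            sorted.sort(Comparator.comparingInt(Task::stage)); // whole-store sort per stage
            visited += sorted.size();
            store.removeIf(t -> t.stage() == stage);
        }
        return visited;
    }

    // One-pass deletion: a single scan removes tasks for all stages at once.
    static int deleteOnePass(List<Task> store, Set<Integer> stages) {
        int visited = store.size();
        store.removeIf(t -> stages.contains(t.stage()));
        return visited;
    }
}
```

With 10 stages of 10 tasks each, the per-stage loop visits 100 + 90 + ... + 10 = 550 elements, while the one-pass variant visits 100, which is the quadratic-vs-linear behavior the comment describes.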
[GitHub] spark pull request #22895: [SPARK-25886][SQL][Minor] Improve error message o...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22895#discussion_r229539055 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala --- @@ -100,9 +100,14 @@ case class AvroDataToCatalyst( case NonFatal(e) => parseMode match { case PermissiveMode => nullResultRow case FailFastMode => - throw new SparkException("Malformed records are detected in record parsing. " + + val msg = if (e.getMessage != null) { +e.getMessage + "\n" + } else { +"" + } + throw new SparkException(msg + "Malformed records are detected in record parsing. " + s"Current parse Mode: ${FailFastMode.name}. To process malformed records as null " + -"result, try setting the option 'mode' as 'PERMISSIVE'.", e.getCause) +"result, try setting the option 'mode' as 'PERMISSIVE'.", e) --- End diff -- @gengliangwang, looks okay. By the way, what does the error message look like before/after?
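The patch above guards against `Throwable.getMessage` returning null before prepending it to the report. A standalone Java sketch of the same idea (the class and method names are invented for illustration, and `FailFastMode.name` is replaced by a literal):

```java
public class MalformedRecordError {
    // Null-safe prefix, mirroring the Scala change above: only prepend the
    // cause's message when it is non-null, so the report never starts with "null".
    static String buildMessage(Throwable cause) {
        String prefix = (cause.getMessage() != null) ? cause.getMessage() + "\n" : "";
        return prefix + "Malformed records are detected in record parsing. "
            + "Current parse Mode: FAILFAST. To process malformed records as null "
            + "result, try setting the option 'mode' as 'PERMISSIVE'.";
    }
}
```

An exception constructed without a message (whose `getMessage()` is null) then yields the generic report alone, instead of a leading "null" line.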
[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22713 thanks, merging to master!
[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22847#discussion_r229538148 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -812,6 +812,17 @@ object SQLConf { .intConf .createWithDefault(65535) + val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold") +.internal() +.doc("The threshold of source code length without comment of a single Java function by " + + "codegen to be split. When the generated Java function source code exceeds this threshold" + + ", it will be split into multiple small functions. We cannot know how many bytecode will " + + "be generated, so use the code length as metric. When running on HotSpot, a function's " + + "bytecode should not go beyond 8KB, otherwise it will not be JITted; it also should not " + + "be too small, otherwise there will be many function calls.") +.intConf +.createWithDefault(1024) --- End diff -- let's add a value check to make sure the value is positive. We can figure out lower and upper bounds later.
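The suggested positivity check would presumably be expressed on the conf builder (e.g. something like `.checkValue(_ > 0, ...)`); a minimal standalone Java sketch of the same guard, with an invented class and method name for illustration:

```java
public class ThresholdConfig {
    // Hypothetical standalone guard for a positive codegen split threshold,
    // analogous to adding a value check on the conf builder in the diff above.
    static int checkPositive(int value) {
        if (value <= 0) {
            throw new IllegalArgumentException(
                "spark.sql.codegen.methodSplitThreshold must be positive, got " + value);
        }
        return value;
    }
}
```

Rejecting non-positive values at configuration time keeps the failure close to the misconfiguration instead of surfacing later as broken code splitting.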
[GitHub] spark pull request #22788: [SPARK-25769][SQL]escape nested columns by backti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22788#discussion_r229538040 --- Diff: sql/core/src/test/resources/sql-tests/results/columnresolution-negative.sql.out --- @@ -161,7 +161,7 @@ SELECT db1.t1.i1 FROM t1, mydb2.t1 struct<> -- !query 18 output org.apache.spark.sql.AnalysisException -cannot resolve '`db1.t1.i1`' given input columns: [mydb2.t1.i1, mydb2.t1.i1]; line 1 pos 7 +cannot resolve '`db1`.`t1`.`i1`' given input columns: [mydb2.t1.i1, mydb2.t1.i1]; line 1 pos 7 --- End diff -- LGTM
[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22857 LGTM except the end-to-end test
[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r229537395 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -2585,4 +2585,45 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { checkAnswer(swappedDf.filter($"key"($"map") > "a"), Row(2, Map(2 -> "b"))) } + + test("SPARK-25860: Replace Literal(null, _) with FalseLiteral whenever possible") { + +def checkPlanIsEmptyLocalScan(df: DataFrame): Unit = df.queryExecution.executedPlan match { + case s: LocalTableScanExec => assert(s.rows.isEmpty) + case p => fail(s"$p is not LocalTableScanExec") +} + +val df1 = Seq((1, true), (2, false)).toDF("l", "b") +val df2 = Seq(2, 3).toDF("l") + +val q1 = df1.where("IF(l > 10, false, b AND null)") +checkAnswer(q1, Seq.empty) +checkPlanIsEmptyLocalScan(q1) + +val q2 = df1.where("CASE WHEN l < 10 THEN null WHEN l > 40 THEN false ELSE null END") +checkAnswer(q2, Seq.empty) +checkPlanIsEmptyLocalScan(q2) + +val q3 = df1.join(df2, when(df1("l") > df2("l"), lit(null)).otherwise(df1("b") && lit(null))) +checkAnswer(q3, Seq.empty) +checkPlanIsEmptyLocalScan(q3) + +val q4 = df1.where("IF(IF(b, null, false), true, null)") +checkAnswer(q4, Seq.empty) +checkPlanIsEmptyLocalScan(q4) + +val q5 = df1.selectExpr("IF(l > 1 AND null, 5, 1) AS out") +checkAnswer(q5, Row(1) :: Row(1) :: Nil) +q5.queryExecution.executedPlan.foreach { p => + assert(p.expressions.forall(e => e.find(_.isInstanceOf[If]).isEmpty)) --- End diff -- This test can pass without the optimization. The `ConvertToLocalRelation` rule will eliminate the `Project`. Can we use a table as input data? e.g. ``` withTable("t1", "t2") { Seq((1, true), (2, false)).toDF("l", "b").write.saveAsTable("t1") Seq(2, 3).toDF("l").write.saveAsTable("t2") val df1 = spark.table("t1") val df2 = spark.table("t2") ... 
} ```
[GitHub] spark issue #22864: [SPARK-25861][Minor][WEBUI] Remove unused refreshInterva...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22864 Looks like the parameter has been unused since day one? https://github.com/apache/spark/pull/290/files#diff-b8adb646ef90f616c34eb5c98d1ebd16R140
[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r229537117 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -2585,4 +2585,45 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { checkAnswer(swappedDf.filter($"key"($"map") > "a"), Row(2, Map(2 -> "b"))) } + + test("SPARK-25860: Replace Literal(null, _) with FalseLiteral whenever possible") { --- End diff -- it's weird to put an optimizer end-to-end test in `DataFrameSuite`. Can we create a `ReplaceNullWithFalseEndToEndSuite`?
[GitHub] spark pull request #22883: [SPARK-25837] [Core] Fix potential slowdown in Ap...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22883#discussion_r229535706 --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala --- @@ -1105,6 +1095,15 @@ private[spark] class AppStatusListener( cleanupCachedQuantiles(key) } + +// Delete tasks for all stages in one pass, as deleting them for each stage individually is slow --- End diff -- Hi @patrickbrownsync , can you explain more about why deleting them in the loop above is slower? I don't quite understand it.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21860 **[Test build #98287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98287/testReport)** for PR 21860 at commit [`8f5eb2b`](https://github.com/apache/spark/commit/8f5eb2b5da332ab3ba2228155cf0c48534119315).
[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22897 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4645/ Test PASSed.
[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22897 Merged build finished. Test PASSed.
[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22897 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4645/
[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...
Github user gjhkael commented on the issue: https://github.com/apache/spark/pull/22887 > can you explain more about why you make the change? Some Hadoop configurations are set in spark-defaults.conf because we want them to be global, but in some cases a user needs to override a configuration and the override does not take effect, because the SparkContext's conf fills the hadoopConf again before the Hadoop conf is broadcast. > Did you try `spark.SessionState.newHadoopConf()`? We have this problem in Spark SQL; we are not using the DataFrame API.
[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22897 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4645/
[GitHub] spark issue #22881: [SPARK-25855][CORE] Don't use erasure coding for event l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22881 Merged build finished. Test PASSed.
[GitHub] spark issue #22881: [SPARK-25855][CORE] Don't use erasure coding for event l...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22881 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98280/ Test PASSed.
[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22897 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98286/ Test PASSed.
[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22897 Merged build finished. Test PASSed.
[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22897 **[Test build #98286 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98286/testReport)** for PR 22897 at commit [`91a4741`](https://github.com/apache/spark/commit/91a47418d7ee8c4d564a79fae6ffa3bc384b09cc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22881: [SPARK-25855][CORE] Don't use erasure coding for event l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22881 **[Test build #98280 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98280/testReport)** for PR 22881 at commit [`0a54c7d`](https://github.com/apache/spark/commit/0a54c7d0ad4f088f7b8adef6cfa6fa773fecd463). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Average(child: Expression) extends DeclarativeAggregate with ImplicitCastInputTypes ` * `case class Count(children: Seq[Expression]) extends DeclarativeAggregate `
[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22887 Hi @gjhkael , can you explain more about why you make the change? Did you try `spark.SessionState.newHadoopConf()`?
[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229530694 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -49,34 +49,82 @@ import org.apache.spark.sql.types._ * Due to this reason, we no longer rely on [[ReadContext]] to pass requested schema from [[init()]] * to [[prepareForRead()]], but use a private `var` for simplicity. */ -private[parquet] class ParquetReadSupport(val convertTz: Option[TimeZone]) +private[parquet] class ParquetReadSupport(val convertTz: Option[TimeZone], + usingVectorizedReader: Boolean) extends ReadSupport[UnsafeRow] with Logging { private var catalystRequestedSchema: StructType = _ def this() { // We need a zero-arg constructor for SpecificParquetRecordReaderBase. But that is only // used in the vectorized reader, where we get the convertTz value directly, and the value here // is ignored. -this(None) +this(None, usingVectorizedReader = true) } /** * Called on executor side before [[prepareForRead()]] and instantiating actual Parquet record * readers. Responsible for figuring out Parquet requested schema used for column pruning. 
*/ override def init(context: InitContext): ReadContext = { +val conf = context.getConfiguration catalystRequestedSchema = { - val conf = context.getConfiguration val schemaString = conf.get(ParquetReadSupport.SPARK_ROW_REQUESTED_SCHEMA) assert(schemaString != null, "Parquet requested schema not set.") StructType.fromString(schemaString) } -val caseSensitive = context.getConfiguration.getBoolean(SQLConf.CASE_SENSITIVE.key, +val schemaPruningEnabled = conf.getBoolean(SQLConf.NESTED_SCHEMA_PRUNING_ENABLED.key, + SQLConf.NESTED_SCHEMA_PRUNING_ENABLED.defaultValue.get) +val caseSensitive = conf.getBoolean(SQLConf.CASE_SENSITIVE.key, SQLConf.CASE_SENSITIVE.defaultValue.get) -val parquetRequestedSchema = ParquetReadSupport.clipParquetSchema( - context.getFileSchema, catalystRequestedSchema, caseSensitive) - +val parquetFileSchema = context.getFileSchema +val parquetClippedSchema = ParquetReadSupport.clipParquetSchema(parquetFileSchema, + catalystRequestedSchema, caseSensitive) + +// As part of schema clipping, we add fields in catalystRequestedSchema which are missing +// from parquetFileSchema to parquetClippedSchema. However, nested schema pruning requires +// we ignore unrequested field data when reading from a Parquet file. Therefore we pass two +// schema to ParquetRecordMaterializer: the schema of the file data we want to read +// (parquetRequestedSchema), and the schema of the rows we want to return +// (catalystRequestedSchema). The reader is responsible for reconciling the differences between +// the two. +// +// Aside from checking whether schema pruning is enabled (schemaPruningEnabled), there +// is an additional complication to constructing parquetRequestedSchema. The manner in which +// Spark's two Parquet readers reconcile the differences between parquetRequestedSchema and +// catalystRequestedSchema differ. Spark's vectorized reader does not (currently) support +// reading Parquet files with complex types in their schema. 
Further, it assumes that +parquetRequestedSchema includes all fields requested in catalystRequestedSchema. It includes +logic in its read path to skip fields in parquetRequestedSchema which are not present in the +file. +// +// Spark's parquet-mr based reader supports reading Parquet files of any kind of complex +schema, and it supports nested schema pruning as well. Unlike the vectorized reader, the +parquet-mr reader requires that parquetRequestedSchema include only those fields present in +the underlying parquetFileSchema. Therefore, in the case where we use the parquet-mr reader +we intersect the parquetClippedSchema with the parquetFileSchema to construct the +parquetRequestedSchema set in the ReadContext. --- End diff -- Ok. I see.
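The "intersect the parquetClippedSchema with the parquetFileSchema" step above can be illustrated with a deliberately simplified Java model: here a "schema" is just an ordered list of top-level field names (real Parquet/Catalyst schemas are nested and typed, and the class and method names below are invented for illustration).

```java
import java.util.*;

public class SchemaIntersection {
    // Toy model: keep only fields of the clipped schema that actually exist in
    // the file, preserving the clipped schema's field order -- a simplification
    // of the requested-schema construction for the parquet-mr read path.
    static List<String> intersect(List<String> clippedSchema, Set<String> fileFields) {
        List<String> out = new ArrayList<>();
        for (String field : clippedSchema) {
            if (fileFields.contains(field)) {
                out.add(field);
            }
        }
        return out;
    }
}
```

The point of the intersection is the asymmetry the comment describes: fields the query requests but the file lacks are dropped from the requested schema (the reader fills them with nulls separately), rather than being passed down to parquet-mr, which would reject fields absent from the file schema.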
[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229530287 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -93,13 +141,14 @@ private[parquet] class ParquetReadSupport(val convertTz: Option[TimeZone]) log.debug(s"Preparing for read Parquet file with message type: $fileSchema") val parquetRequestedSchema = readContext.getRequestedSchema -logInfo { - s"""Going to read the following fields from the Parquet file: - | - |Parquet form: +log.info { + s"""Going to read the following fields from the Parquet file with the following schema: + |Parquet file schema: + |$fileSchema + |Parquet read schema: --- End diff -- I think it is useful for debugging this patch, but it may not be useful for end users, and it will increase log size. Making it debug level sounds good to me. But let's wait for others' opinions too.
[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22897 **[Test build #98286 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98286/testReport)** for PR 22897 at commit [`91a4741`](https://github.com/apache/spark/commit/91a47418d7ee8c4d564a79fae6ffa3bc384b09cc).
[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229530592 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet] class ParquetRowConverter( override def start(): Unit = { var i = 0 -while (i < currentRow.numFields) { +while (i < fieldConverters.length) { fieldConverters(i).updater.start() currentRow.setNullAt(i) --- End diff -- Yeah, I think it should be fine to call `setNullAt` on a non-corresponding field, right?
[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r229529772 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] { flattenConcats(concat) } } + +/** + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations. + * + * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, it transforms predicates + * in all [[If]] expressions as well as branch conditions in all [[CaseWhen]] expressions. + * + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`. + * + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`; + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually + * `Filter(FalseLiteral)`. + * + * As this rule is not limited to conditions in [[Filter]] and [[Join]], arbitrary plans can + * benefit from it. For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))` + * can be simplified into `Project(Literal(2))`. + * + * As a result, many unnecessary computations can be removed in the query optimization phase. + */ +object ReplaceNullWithFalse extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { +case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond)) +case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond))) +case p: LogicalPlan => p transformExpressions { + case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred)) + case cw @ CaseWhen(branches, _) => +val newBranches = branches.map { case (cond, value) => + replaceNullWithFalse(cond) -> value +} +cw.copy(branches = newBranches) +} + } + + /** + * Recursively replaces `Literal(null, _)` with `FalseLiteral`. 
+ * + * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit + * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`. + */ + private def replaceNullWithFalse(e: Expression): Expression = e match { +case cw: CaseWhen if cw.dataType == BooleanType => + val newBranches = cw.branches.map { case (cond, value) => +replaceNullWithFalse(cond) -> replaceNullWithFalse(value) + } + val newElseValue = cw.elseValue.map(replaceNullWithFalse) + CaseWhen(newBranches, newElseValue) +case i @ If(pred, trueVal, falseVal) if i.dataType == BooleanType => + If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), replaceNullWithFalse(falseVal)) --- End diff -- Ah, I see. `replaceNullWithFalse` should only work in boolean-type cases. Then I think we are fine with it.
[GitHub] spark issue #22805: [SPARK-25809][K8S][TEST] New K8S integration testing bac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22805 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98279/ Test PASSed.
[GitHub] spark issue #22805: [SPARK-25809][K8S][TEST] New K8S integration testing bac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22805 Merged build finished. Test PASSed.
[GitHub] spark issue #22805: [SPARK-25809][K8S][TEST] New K8S integration testing bac...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22805 **[Test build #98279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98279/testReport)** for PR 22805 at commit [`41af220`](https://github.com/apache/spark/commit/41af2204db8f85f4ebdad4538a26f24e269a45ee). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r229528767 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] { flattenConcats(concat) } } + +/** + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations. + * + * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, it transforms predicates + * in all [[If]] expressions as well as branch conditions in all [[CaseWhen]] expressions. + * + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`. + * + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`; + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually + * `Filter(FalseLiteral)`. + * + * As this rule is not limited to conditions in [[Filter]] and [[Join]], arbitrary plans can + * benefit from it. For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))` + * can be simplified into `Project(Literal(2))`. + * + * As a result, many unnecessary computations can be removed in the query optimization phase. + */ +object ReplaceNullWithFalse extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { +case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond)) +case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond))) +case p: LogicalPlan => p transformExpressions { + case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred)) + case cw @ CaseWhen(branches, _) => +val newBranches = branches.map { case (cond, value) => + replaceNullWithFalse(cond) -> value +} +cw.copy(branches = newBranches) +} + } + + /** + * Recursively replaces `Literal(null, _)` with `FalseLiteral`. 
+ * + * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit + * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`. + */ + private def replaceNullWithFalse(e: Expression): Expression = e match { +case cw: CaseWhen if cw.dataType == BooleanType => + val newBranches = cw.branches.map { case (cond, value) => +replaceNullWithFalse(cond) -> replaceNullWithFalse(value) + } + val newElseValue = cw.elseValue.map(replaceNullWithFalse) + CaseWhen(newBranches, newElseValue) +case i @ If(pred, trueVal, falseVal) if i.dataType == BooleanType => + If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), replaceNullWithFalse(falseVal)) --- End diff -- The general rule for `LogicalPlan` at `apply` looks at the `predicate` of `If` no matter whether its `dataType` is `BooleanType` or not. But in `replaceNullWithFalse`, the rule for `If` only works if its `dataType` is `BooleanType`. `"replace null in predicates of If inside another If"` is such a case: the `If` inside another `If` is of `BooleanType`. If the inner `If` is not of `BooleanType`, this rule doesn't work. And I think it should be OK to replace the null with false when it is not of boolean type.
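Why replacing `Literal(null, _)` with `FalseLiteral` is safe in these positions comes down to SQL's three-valued logic: a Filter (or a branch condition) keeps a row only when the predicate evaluates to exactly TRUE, so UNKNOWN (null) and FALSE are interchangeable there. A minimal Java model of this reasoning, with invented names, using `Boolean` references where `null` stands for UNKNOWN:

```java
public class NullToFalse {
    // SQL three-valued AND over nullable booleans (null = UNKNOWN):
    // FALSE dominates; otherwise any UNKNOWN operand makes the result UNKNOWN.
    static Boolean and3(Boolean a, Boolean b) {
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) return false;
        if (a == null || b == null) return null;
        return true;
    }

    // A filter keeps a row only when the predicate is exactly TRUE,
    // so UNKNOWN (null) and FALSE behave identically in this position.
    static boolean keepsRow(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }
}
```

This also shows why the rewrite must stop at the expression kinds listed in the rule's doc: under an arbitrary expression (say `IsNull` or a non-boolean branch value), null and false are no longer interchangeable.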
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Merged build finished. Test PASSed.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4644/ Test PASSed.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4643/ Test PASSed.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Merged build finished. Test PASSed.
[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22760 Merged build finished. Test PASSed.
[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22760 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4642/ Test PASSed.
[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22760 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4642/
[GitHub] spark pull request #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integr...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/22608#discussion_r229523838

--- Diff: resource-managers/kubernetes/integration-tests/kerberos-yml/kerberos-set.yml ---
@@ -0,0 +1,49 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  name: kerberos
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      name: hdfs-kerberos
+      kerberosService: kerberos
+      job: kerberostest
+  template:
+    metadata:
+      annotations:
+        pod.beta.kubernetes.io/hostname: kerberos
+      labels:
+        name: hdfs-kerberos
+        kerberosService: kerberos
+        job: kerberostest
+    spec:
+      containers:
+      - command: ["sh"]
+        args: ["/start-kdc.sh"]
+        name: kerberos
+        imagePullPolicy: IfNotPresent
+        volumeMounts:
+        - mountPath: /var/keytabs
+          name: kerb-keytab
+      restartPolicy: Always
+      volumes:
+      - name: kerb-keytab
+        persistentVolumeClaim:
--- End diff --

With a `StatefulSet`, you don't need to explicitly manage PVCs. You can use `spec.volumeClaimTemplates`. The StatefulSet controller automatically creates the PVCs (or binds to the existing ones it created before).
[GitHub] spark pull request #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integr...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/22608#discussion_r229523966

--- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/kerberos/KerberosPVWatcherCache.scala ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.k8s.integrationtest.kerberos
+
+import io.fabric8.kubernetes.api.model.{PersistentVolume, PersistentVolumeClaim}
+import io.fabric8.kubernetes.client.{KubernetesClientException, Watch, Watcher}
+import io.fabric8.kubernetes.client.Watcher.Action
+import org.scalatest.Matchers
+import org.scalatest.concurrent.Eventually
+import scala.collection.JavaConverters._
+
+import org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.{INTERVAL, TIMEOUT}
+import org.apache.spark.internal.Logging
+
+/**
+ * This class is responsible for ensuring that the persistent volume claims are bound
+ * to the correct persistent volume and that they are both created before launching the
--- End diff --

With `StatefulSet`s, you probably don't need this.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98285/testReport)** for PR 22275 at commit [`6457e42`](https://github.com/apache/spark/commit/6457e420e3b8366c1373e7adb0bf56df03b9cc19).
[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/22275#discussion_r229522939

--- Diff: python/pyspark/sql/tests.py ---
@@ -4923,6 +4923,28 @@ def test_timestamp_dst(self):
         self.assertPandasEqual(pdf, df_from_python.toPandas())
         self.assertPandasEqual(pdf, df_from_pandas.toPandas())
 
+    def test_toPandas_batch_order(self):
+
+        # Collects Arrow RecordBatches out of order in driver JVM then re-orders in Python
+        def run_test(num_records, num_parts, max_records):
+            df = self.spark.range(num_records, numPartitions=num_parts).toDF("a")
+            with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": max_records}):
+                pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
+                self.assertPandasEqual(pdf, pdf_arrow)
+
+        cases = [
+            (1024, 512, 2),  # Try large num partitions for good chance of not collecting in order
+            (512, 64, 2),    # Try medium num partitions to test out of order collection
+            (64, 8, 2),      # Try small number of partitions to test out of order collection
+            (64, 64, 1),     # Test single batch per partition
+            (64, 1, 64),     # Test single partition, single batch
+            (64, 1, 8),      # Test single partition, multiple batches
+            (30, 7, 2),      # Test different sized partitions
+        ]
--- End diff --

@holdenk and @felixcheung , I didn't do a loop but chose some different levels of partition numbers to be a bit more sure that partitions won't end up in order. I also added some other cases of different partition/batch ratios. Let me know if you think we need more to be sure here.
[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22760 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4642/
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22275 Apologies for the delay in circling back to this. I reorganized a little to simplify and expanded the comments to hopefully better describe the code. A quick summary of the changes: I changed the ArrowStreamSerializer to not have any state - that seemed to complicate things. So instead of saving the batch order indices, they are loaded on the last iteration of `load_stream`, and this was put in a special serializer `ArrowCollectSerializer` so that it is clear where it is used. I also consolidated all the batch ordering calls within `_collectAsArrow` so it is easier to follow the whole process.
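The scheme described above, where batches arrive at the driver as partitions finish and a final message carries the order indices, can be sketched in plain Python. This is a minimal standalone illustration of the idea, not the actual PySpark code; the function name and data format here are hypothetical:

```python
# Hypothetical sketch of out-of-order batch collection (SPARK-25274 idea):
# the JVM streams Arrow record batches in arrival order, then sends the
# batch-order indices last; the Python side restores the original order.
def reorder_batches(arrived_batches, order_indices):
    """Restore original partition order.

    arrived_batches: batches in the order they arrived at the driver.
    order_indices: for each position in the original order, the index
    into arrived_batches where that batch landed.
    """
    return [arrived_batches[i] for i in order_indices]

# Partitions 0..2 happened to arrive in the order 2, 0, 1, so the
# indices stream says: original position 0 arrived at index 1, etc.
arrived = ["batch-for-part-2", "batch-for-part-0", "batch-for-part-1"]
indices = [1, 2, 0]
print(reorder_batches(arrived, indices))
# ['batch-for-part-0', 'batch-for-part-1', 'batch-for-part-2']
```

Keeping the reordering in one place (as the PR does with `_collectAsArrow`) means the serializer itself can stay stateless, which is the simplification BryanCutler describes.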
[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22760 Merged build finished. Test PASSed.
[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22760 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98283/ Test PASSed.
[GitHub] spark issue #22760: [SPARK-25751][K8S][TEST] Unit Testing for Kerberos Suppo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22760 **[Test build #98283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98283/testReport)** for PR 22760 at commit [`cbbc557`](https://github.com/apache/spark/commit/cbbc557b1bf6b3f36ac805ccf3a2d5a4642ec1a8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.