[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23124 **[Test build #99310 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99310/testReport)** for PR 23124 at commit [`abd0ec5`](https://github.com/apache/spark/commit/abd0ec543a944fa02320337f4fab7fff6ffe9667).
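For context on the semantics in the thread title, a hedged sketch of what "last wins" deduplication means, assuming a running `spark` session (illustrative only, inferred from the title rather than from the patch itself):

```scala
// Inferred from the PR title, not from the patch: when a map literal
// contains duplicated keys, the later value for a key should win.
val df = spark.sql("SELECT map(1, 'a', 1, 'b') AS m")
df.show()
// expected under last-wins deduplication: Map(1 -> b)
```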
[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23124 Merged build finished. Test PASSed.
[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23124 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5393/
[GitHub] spark pull request #23145: [MINOR][Docs] "a R interpreter" -> "an R interpre...
Github user kjmrknsn commented on a diff in the pull request: https://github.com/apache/spark/pull/23145#discussion_r236549454 --- Diff: docs/index.md --- @@ -67,7 +67,7 @@ Example applications are also provided in Python. For example, ./bin/spark-submit examples/src/main/python/pi.py 10 Spark also provides an experimental [R API](sparkr.html) since 1.4 (only DataFrames APIs included). --- End diff -- Should we remove the "experimental" adjective?
[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...
Github user gjhkael commented on the issue: https://github.com/apache/spark/pull/22887 @vanzin Thanks for your review. I added a new commit to let the user's "set" command take effect. Let me know if you have an easier way. Thanks.
[GitHub] spark issue #23144: [SPARK-26172][ML][WIP] Unify String Params' case-insensi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23144 Merged build finished. Test FAILed.
[GitHub] spark issue #23144: [SPARK-26172][ML][WIP] Unify String Params' case-insensi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23144 **[Test build #99307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99307/testReport)** for PR 23144 at commit [`27429fb`](https://github.com/apache/spark/commit/27429fb5ae95d9ee68dc0f0a769fd3412c54ebfc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23144: [SPARK-26172][ML][WIP] Unify String Params' case-insensi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23144 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99307/
[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23124 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99304/
[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23124 Merged build finished. Test FAILed.
[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23124 **[Test build #99304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99304/testReport)** for PR 23124 at commit [`0c77c41`](https://github.com/apache/spark/commit/0c77c4173606edb09d4d2c7a721dee48c976233f). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23145: [MINOR][Docs] "a R interpreter" -> "an R interpre...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/23145#discussion_r236546043 --- Diff: docs/index.md --- @@ -67,7 +67,7 @@ Example applications are also provided in Python. For example, ./bin/spark-submit examples/src/main/python/pi.py 10 Spark also provides an experimental [R API](sparkr.html) since 1.4 (only DataFrames APIs included). --- End diff -- tbh, I'm not sure this should be called "an experimental [R API]"
[GitHub] spark issue #23144: [SPARK-26172][ML][WIP] Unify String Params' case-insensi...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/23144 I am not sure about `$$` or `%%`; we can replace them with other names. I want to resolve the confusion around case-insensitivity, and I wonder whether a new flag can do this. If we want to keep the getter's return value identical to the value passed to the setter, we will need two versions of the getter: one to return the original value, the other to return the lower-/upper-cased value. Another issue is that there is no `StringParam` trait, so we have to modify the `Param` trait directly.
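A minimal sketch of the kind of flag being debated, using a hypothetical standalone `StringParam` (Spark ML has no such trait, as the comment notes; names and signatures are illustrative, not the actual ML Param API):

```scala
import java.util.Locale

// Hypothetical: a string param that validates and normalizes values
// case-insensitively. With caseSensitive = false, a getter returning the
// normalized (lower-cased) value would differ from the value passed to the
// setter -- exactly the getter/setter mismatch discussed above.
class StringParam(
    val name: String,
    val supported: Set[String],
    val caseSensitive: Boolean = false) {

  private def normalize(v: String): String =
    if (caseSensitive) v else v.toLowerCase(Locale.ROOT)

  def validate(v: String): Boolean =
    supported.map(normalize).contains(normalize(v))
}
```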
[GitHub] spark issue #23144: [SPARK-26172][ML][WIP] Unify String Params' case-insensi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23144 Merged build finished. Test PASSed.
[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23128 **[Test build #99308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99308/testReport)** for PR 23128 at commit [`8689acb`](https://github.com/apache/spark/commit/8689acbc6c1f106d7b2d906bcd627147f649e6ff).
[GitHub] spark issue #23144: [SPARK-26172][ML][WIP] Unify String Params' case-insensi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23144 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5390/
[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23083 **[Test build #99309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99309/testReport)** for PR 23083 at commit [`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).
[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23128 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5391/
[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23128 Merged build finished. Test PASSed.
[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23083 Merged build finished. Test PASSed.
[GitHub] spark issue #23144: [SPARK-26172][ML][WIP] Unify String Params' case-insensi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23144 **[Test build #99307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99307/testReport)** for PR 23144 at commit [`27429fb`](https://github.com/apache/spark/commit/27429fb5ae95d9ee68dc0f0a769fd3412c54ebfc).
[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23083 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5392/
[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23083 retest this please
[GitHub] spark issue #23128: [SPARK-26142][SQL] Support passing shuffle metrics to ex...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23128 retest this please
[GitHub] spark issue #23122: [MINOR][ML] add missing params to Instr
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23122 Merged build finished. Test PASSed.
[GitHub] spark issue #23122: [MINOR][ML] add missing params to Instr
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23122 **[Test build #99306 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99306/testReport)** for PR 23122 at commit [`b74a766`](https://github.com/apache/spark/commit/b74a76639ce9db1c6d002a2361e7be598cbd7dc7).
[GitHub] spark issue #23122: [MINOR][ML] add missing params to Instr
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23122 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5389/
[GitHub] spark pull request #23122: [MINOR][ML] add missing params to Instr
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/23122#discussion_r236537309 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -671,7 +671,7 @@ class ALS(@Since("1.4.0") override val uid: String) extends Estimator[ALSModel] instr.logDataset(dataset) instr.logParams(this, rank, numUserBlocks, numItemBlocks, implicitPrefs, alpha, userCol, itemCol, ratingCol, predictionCol, maxIter, regParam, nonnegative, checkpointInterval, - seed, intermediateStorageLevel, finalStorageLevel) + seed, intermediateStorageLevel, finalStorageLevel, coldStartStrategy) --- End diff -- Yes, we could leave coldStartStrategy alone. BTW, I made a quick scan and found that some algorithms do not log the columns.
[GitHub] spark pull request #23106: [SPARK-26141] Enable custom metrics implementatio...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23106
[GitHub] spark issue #23106: [SPARK-26141] Enable custom metrics implementation in sh...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/23106 Merging in master. Thanks @squito.
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23152 **[Test build #99305 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99305/testReport)** for PR 23152 at commit [`1c1edcc`](https://github.com/apache/spark/commit/1c1edccc43ac9cdb6da53ad93ac998b06bf8b6dd).
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23152 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5388/
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23152 Merged build finished. Test PASSed.
[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/23152 [SPARK-26181][SQL] the `hasMinMaxStats` method of `ColumnStatsMap` is not correct ## What changes were proposed in this pull request? Currently `hasMinMaxStats` returns the same result as `hasCountStats`, which is clearly not as expected. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark minmaxstats Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23152.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23152 commit 1c1edccc43ac9cdb6da53ad93ac998b06bf8b6dd Author: Daoyuan Wang Date: 2018-11-27T06:18:34Z fix ColumnStatsMap.hasMinMaxStats
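A minimal sketch of the bug and fix described above, using a simplified stand-in for `ColumnStatsMap` (field names are illustrative; the real class lives in Spark's filter estimation code):

```scala
case class ColumnStat(
    distinctCount: Option[BigInt],
    min: Option[Any],
    max: Option[Any],
    nullCount: Option[BigInt])

case class ColumnStatsMap(stats: Map[String, ColumnStat]) {
  def hasCountStats(attr: String): Boolean =
    stats.get(attr).exists(s => s.distinctCount.isDefined || s.nullCount.isDefined)

  // The buggy version answered the same question as hasCountStats; the fix
  // actually requires min and max to be present for the attribute.
  def hasMinMaxStats(attr: String): Boolean =
    stats.get(attr).exists(s => s.min.isDefined && s.max.isDefined)
}
```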
[GitHub] spark issue #23148: [SPARK-26177] Automated formatting for Scala code
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23148 Merged build finished. Test PASSed.
[GitHub] spark issue #23148: [SPARK-26177] Automated formatting for Scala code
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99299/
[GitHub] spark issue #23148: [SPARK-26177] Automated formatting for Scala code
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23148 **[Test build #99299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99299/testReport)** for PR 23148 at commit [`92dd105`](https://github.com/apache/spark/commit/92dd105b4f2d3be2d2059f5a983dab01339e81a9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule Repla...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23139
[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/23139 Thanks. Merged into master.
[GitHub] spark pull request #23130: [SPARK-26161][SQL] Ignore empty files in load
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23130#discussion_r236515927 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -388,7 +388,7 @@ case class FileSourceScanExec( logInfo(s"Planning with ${bucketSpec.numBuckets} buckets") val filesGroupedToBuckets = selectedPartitions.flatMap { p => -p.files.map { f => +p.files.filter(_.getLen > 0).map { f => --- End diff -- yes, and the same change is also in `createNonBucketedReadRDD`
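A minimal sketch of the filter applied in both read paths, assuming Hadoop `FileStatus` entries as in the quoted diff:

```scala
import org.apache.hadoop.fs.FileStatus

// Skip zero-length files before planning read tasks so that empty files
// never contribute splits; per the comment above, the same filter appears
// in both the bucketed and non-bucketed read paths.
def nonEmptyFiles(files: Seq[FileStatus]): Seq[FileStatus] =
  files.filter(_.getLen > 0)
```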
[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23124 **[Test build #99304 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99304/testReport)** for PR 23124 at commit [`0c77c41`](https://github.com/apache/spark/commit/0c77c4173606edb09d4d2c7a721dee48c976233f).
[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23124 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5387/
[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23124 Merged build finished. Test PASSed.
[GitHub] spark pull request #23143: [SPARK-24762][SQL][Followup] Enable Option of Pro...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23143
[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23138 Merged build finished. Test PASSed.
[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23138 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99300/
[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23138 **[Test build #99300 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99300/testReport)** for PR 23138 at commit [`5ef0035`](https://github.com/apache/spark/commit/5ef0035cb64ede974cf0dbdbd095f6ee207c6e76). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23151 Merged build finished. Test PASSed.
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir functi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23151 **[Test build #99303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99303/testReport)** for PR 23151 at commit [`6402d2d`](https://github.com/apache/spark/commit/6402d2d35fd1588a730441755b3aa47c11b08f1c).
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23151 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5386/
[GitHub] spark issue #23143: [SPARK-24762][SQL][Followup] Enable Option of Product en...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23143 thanks, merging to master!
[GitHub] spark issue #23106: [SPARK-26141] Enable custom metrics implementation in sh...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23106 Merged build finished. Test PASSed.
[GitHub] spark issue #23106: [SPARK-26141] Enable custom metrics implementation in sh...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23106 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99298/
[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236514039 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2; + +import org.apache.spark.annotation.Evolving; +import org.apache.spark.sql.sources.v2.reader.Scan; +import org.apache.spark.sql.sources.v2.reader.ScanBuilder; +import org.apache.spark.sql.types.StructType; + +/** + * An interface representing a logical structured data set of a data source. For example, the + * implementation can be a directory on the file system, a topic of Kafka, or a table in the + * catalog, etc. + * + * This interface can mixin the following interfaces to support different operations: + * + * {@link SupportsBatchRead}: this table can be read in batch queries. + * + */ +@Evolving +public interface Table { + + /** + * Returns the schema of this table. + */ + StructType schema(); + + /** + * Returns a {@link ScanBuilder} which can be used to build a {@link Scan} later. Spark will call + * this method for each data scanning query. + * + * The builder can take some query specific information to do operators pushdown, and keep these + * information in the created {@link Scan}. + */ + ScanBuilder newScanBuilder(DataSourceOptions options); --- End diff -- I agree with it. Since `CaseInsensitiveStringMap` is not in the code base yet, shall we do it in a followup?
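A minimal sketch of the builder pattern described in the quoted javadoc (types and method names are illustrative; the real DSv2 interfaces split pushdown across mix-in traits):

```scala
import org.apache.spark.sql.types.StructType

trait Scan {
  def readSchema(): StructType
}

// The builder captures query-specific information (pruned columns, pushed
// filters, ...) and bakes it into the Scan it eventually creates.
trait ScanBuilder {
  def pruneColumns(requiredSchema: StructType): ScanBuilder
  def build(): Scan
}
```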
[GitHub] spark issue #23106: [SPARK-26141] Enable custom metrics implementation in sh...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23106 **[Test build #99298 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99298/testReport)** for PR 23106 at commit [`4e9b756`](https://github.com/apache/spark/commit/4e9b7565a4e78d52cc042a2768043b9db0b21ff6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236513622 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2; --- End diff -- why does this `Table` API need to be in catalyst? It's not even a plan. We can define a table plan interface in catalyst, and implement it in the SQL module with this `Table` API.
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22991 Merged build finished. Test PASSed.
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22991 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99301/
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22991 **[Test build #99301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99301/testReport)** for PR 22991 at commit [`74cc277`](https://github.com/apache/spark/commit/74cc277dc5668ad59efd19fbf47d4cfa824ba9bf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236512196 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true) // each python worker gets an equal part of the allocation. the worker pool will grow to the // number of concurrent tasks, which is determined by the number of cores in this executor. - private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY) + private val memoryMb = if (Utils.isWindows) { --- End diff -- I mean, even if it succeeds in allocating in Yarn and the Python worker has no control over that, what's the point?
[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23127 there are still 2 golden file test failures because of the plan change...
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23151 Merged build finished. Test FAILed.
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23151 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99302/
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir functi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23151 **[Test build #99302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99302/testReport)** for PR 23151 at commit [`d29d4ec`](https://github.com/apache/spark/commit/d29d4ece73dd17ee3ba5e1e85e2d35096f524810). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir functi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23151 **[Test build #99302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99302/testReport)** for PR 23151 at commit [`d29d4ec`](https://github.com/apache/spark/commit/d29d4ece73dd17ee3ba5e1e85e2d35096f524810).
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23151 Merged build finished. Test PASSed.
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23151 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5385/
[GitHub] spark issue #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23151 Can one of the admins verify this patch?
[GitHub] spark pull request #23151: [SPARK-26180][CORE][TEST] Add a withCreateTempDir...
GitHub user heary-cao opened a pull request: https://github.com/apache/spark/pull/23151 [SPARK-26180][CORE][TEST] Add a withCreateTempDir function to the SparkCore test case ## What changes were proposed in this pull request? Currently, the common `withTempDir` function is used in Spark SQL test cases to handle `val dir = Utils.createTempDir()` and `Utils.deleteRecursively(dir)`. Unfortunately, the `withTempDir` function cannot be used in Spark Core test cases. This PR adds a common `withCreateTempDir` function to clean up Spark Core test cases. Thanks. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/heary-cao/spark withCreateTempDir Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23151.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23151 commit d29d4ece73dd17ee3ba5e1e85e2d35096f524810 Author: caoxuewen Date: 2018-11-27T03:00:08Z Add a withCreateTempDir function to the SparkCore test case
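A minimal sketch of the helper pattern the PR adds; the `Utils` calls are the ones named in the description above, while the enclosing trait is illustrative:

```scala
import java.io.File

import org.apache.spark.util.Utils

trait TempDirSupport {
  // Create a temp dir, run the test body, and always clean up afterwards,
  // replacing the repeated createTempDir/deleteRecursively boilerplate.
  protected def withCreateTempDir(f: File => Unit): Unit = {
    val dir = Utils.createTempDir()
    try f(dir) finally Utils.deleteRecursively(dir)
  }
}
```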
[GitHub] spark issue #23019: [SPARK-26025][k8s] Speed up docker image build on dev re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23019 Merged build finished. Test PASSed.
[GitHub] spark issue #23019: [SPARK-26025][k8s] Speed up docker image build on dev re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23019 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99293/
[GitHub] spark issue #22995: [SPARK-25998] [CORE] Change TorrentBroadcast to hold wea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22995 **[Test build #4443 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4443/testReport)** for PR 22995 at commit [`09ae762`](https://github.com/apache/spark/commit/09ae762962098e58be7ba8f777f9dffde2f81d81). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23019: [SPARK-26025][k8s] Speed up docker image build on dev re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23019 **[Test build #99293 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99293/testReport)** for PR 23019 at commit [`4439543`](https://github.com/apache/spark/commit/4439543cc8e320542cd8f6826662c541d18e0a72). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22991 Merged build finished. Test PASSed.
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22991 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5384/
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22991 **[Test build #99301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99301/testReport)** for PR 22991 at commit [`74cc277`](https://github.com/apache/spark/commit/74cc277dc5668ad59efd19fbf47d4cfa824ba9bf).
[GitHub] spark issue #23149: [SPARK-25451][HOTFIX] Call stage.attemptNumber instead o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23149 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99292/
[GitHub] spark issue #23149: [SPARK-25451][HOTFIX] Call stage.attemptNumber instead o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23149 Merged build finished. Test PASSed.
[GitHub] spark issue #23149: [SPARK-25451][HOTFIX] Call stage.attemptNumber instead o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23149 **[Test build #99292 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99292/testReport)** for PR 23149 at commit [`9c84f18`](https://github.com/apache/spark/commit/9c84f181a460b7b22afca942f38ffb3eee6d545c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23139 Merged build finished. Test PASSed.
[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23139 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99294/
[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236500170 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2; + +import org.apache.spark.annotation.Evolving; +import org.apache.spark.sql.sources.v2.reader.Scan; +import org.apache.spark.sql.sources.v2.reader.ScanBuilder; +import org.apache.spark.sql.types.StructType; + +/** + * An interface representing a logical structured data set of a data source. For example, the + * implementation can be a directory on the file system, a topic of Kafka, or a table in the + * catalog, etc. + * + * This interface can mixin the following interfaces to support different operations: + * + * {@link SupportsBatchRead}: this table can be read in batch queries. + * + */ +@Evolving +public interface Table { + + /** + * Returns the schema of this table. + */ + StructType schema(); + + /** + * Returns a {@link ScanBuilder} which can be used to build a {@link Scan} later. Spark will call + * this method for each data scanning query. + * + * The builder can take some query specific information to do operators pushdown, and keep these + * information in the created {@link Scan}. + */ + ScanBuilder newScanBuilder(DataSourceOptions options); --- End diff -- Makes sense to me - `DataSourceOptions` was carrying along identifiers that really belong to a table identifier and that should be interpreted at the catalog level, not the data read level. In other words, the implementation of this `Table` should already know _what_ locations to look up (e.g. "files comprising dataset D"); now it's a matter of _how_ (e.g. pushdown, filter predicates).
[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23139 **[Test build #99294 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99294/testReport)** for PR 23139 at commit [`8b0401c`](https://github.com/apache/spark/commit/8b0401c4440136e33d4580a3f8da80164de3d4b4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236494667 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true) // each python worker gets an equal part of the allocation. the worker pool will grow to the // number of concurrent tasks, which is determined by the number of cores in this executor. - private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY) + private val memoryMb = if (Utils.isWindows) { --- End diff -- I mean, what I am disabling here is setting the Python memory limit on Windows; it looks like the Yarn side can still allocate more, but that is not tested. I'm thinking we should ideally disable the whole feature on Windows, but I couldn't test that either because I don't have a Windows Yarn cluster (I only have one Windows machine with a dev environment). I was trying to at least document that it doesn't work, because it behaves differently from other OSes that lack the `resource` module (as things stand, PySpark apps that use Python workers do not work at all, and we don't necessarily document that this works on Windows). I think it's more correct to say it does not work because it's not tested (or at least simply say it's not guaranteed on Windows). The reasons I prefer to check this on the JVM side instead: 1. It's quite usual to check things on the JVM side when possible and keep the worker simple. It could couple the worker to the JVM, but they are already strongly coupled. 2. If the Yarn code can also be tested, and it doesn't work, it should be disabled on the JVM side too. If we find a workaround on Windows, we can fix the Python worker and enable it. 3. If it's checked on the JVM side ahead of time, it removes an inconsistent state where `memoryMb` is enabled on the JVM side but Python skips it.
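A minimal sketch of the JVM-side guard being discussed, extracted into a standalone function (the real change lives in `BasePythonRunner`; `resource` is the CPython module, unavailable on Windows, that the worker uses to enforce the limit):

```scala
// Decide the Python worker memory limit on the JVM side: on Windows the
// limit cannot be enforced (no `resource` module), so report no limit
// instead of passing a value the worker would silently ignore.
def effectivePysparkMemoryMb(configured: Option[Long], isWindows: Boolean): Option[Long] =
  if (isWindows) None else configured
```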
[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236492408 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2; --- End diff -- Everything in catalyst is considered private (although it has public visibility for debugging), and it's best to stay that way.
[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236491385 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2; + +import org.apache.spark.annotation.Evolving; +import org.apache.spark.sql.sources.v2.reader.Scan; +import org.apache.spark.sql.sources.v2.reader.ScanBuilder; +import org.apache.spark.sql.types.StructType; + +/** + * An interface representing a logical structured data set of a data source. For example, the + * implementation can be a directory on the file system, a topic of Kafka, or a table in the + * catalog, etc. + * + * This interface can mixin the following interfaces to support different operations: + * + * {@link SupportsBatchRead}: this table can be read in batch queries. + * + */ +@Evolving +public interface Table { + + /** + * Returns the schema of this table. + */ + StructType schema(); + + /** + * Returns a {@link ScanBuilder} which can be used to build a {@link Scan} later. Spark will call + * this method for each data scanning query. + * + * The builder can take some query specific information to do operators pushdown, and keep these + * information in the created {@link Scan}. + */ + ScanBuilder newScanBuilder(DataSourceOptions options); --- End diff -- `DataSourceOptions` isn't simply a map for two main reasons that I can tell: first, it forces options to be case insensitive, and second, it exposes helper methods to identify tables, like `tableName`, `databaseName`, and `paths`. In the new abstraction, the second use of `DataSourceOptions` is no longer needed. The table is already instantiated by the time that this is called. We should reconsider `DataSourceOptions`. The `tableName` methods aren't needed and we also no longer need to forward properties from the session config because the way tables are configured has changed (catalogs handle that). I think we should remove this class and instead use the more direct implementation, `CaseInsensitiveStringMap` from #21306. The behavior of that class is obvious from its name and it would be shared between the v2 APIs, both catalog and data source.
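A minimal sketch of the case-insensitive behavior being proposed (illustrative only; the real `CaseInsensitiveStringMap` in #21306 is a Java class implementing `java.util.Map<String, String>`):

```scala
import java.util.Locale

// Keys are normalized to lower case on both insertion and lookup, so
// options like "Path" and "path" resolve to the same entry.
class CaseInsensitiveStringMap(entries: Map[String, String]) {
  private val delegate: Map[String, String] =
    entries.map { case (k, v) => (k.toLowerCase(Locale.ROOT), v) }

  def get(key: String): Option[String] =
    delegate.get(key.toLowerCase(Locale.ROOT))
}

// Usage:
//   val opts = new CaseInsensitiveStringMap(Map("Path" -> "/data/t1"))
//   opts.get("PATH")  // Some("/data/t1")
```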
[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236490800 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2; --- End diff -- Moving this to the Catalyst package would set a precedent for user-overridable behavior to live in the catalyst project. I'm not aware of anything in the Catalyst package being considered as public API right now. Are we allowed to start such a convention at this juncture?
[GitHub] spark pull request #23117: [WIP][SPARK-7721][INFRA] Run and generate test co...
Github user shaneknapp commented on a diff in the pull request: https://github.com/apache/spark/pull/23117#discussion_r236488706 --- Diff: dev/run-tests.py --- @@ -434,6 +434,63 @@ def run_python_tests(test_modules, parallelism): run_cmd(command) +def run_python_tests_with_coverage(test_modules, parallelism): +set_title_and_block("Running PySpark tests with coverage report", "BLOCK_PYSPARK_UNIT_TESTS") + +command = [os.path.join(SPARK_HOME, "python", "run-tests-with-coverage")] +if test_modules != [modules.root]: +command.append("--modules=%s" % ','.join(m.name for m in test_modules)) +command.append("--parallelism=%i" % parallelism) +run_cmd(command) +post_python_tests_results() + + +def post_python_tests_results(): +if "SPARK_TEST_KEY" not in os.environ: +print("[error] 'SPARK_TEST_KEY' environment variable was not set. Unable to post" + "PySpark coverage results.") +sys.exit(1) --- End diff -- sure, i can do that tomorrow (currently heading out for the day).
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236488396 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true) // each python worker gets an equal part of the allocation. the worker pool will grow to the // number of concurrent tasks, which is determined by the number of cores in this executor. - private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY) + private val memoryMb = if (Utils.isWindows) { --- End diff -- @vanzin, I mean the first change I did at https://github.com/apache/spark/pull/23055/commits/2d3315a7dab429abc4d9ef5ed7f8f5484e8421f1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23136: [SPARK-25515][K8s] Adds a config option to keep e...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/23136#discussion_r236488306 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala --- @@ -86,11 +88,14 @@ private[spark] class ExecutorPodsAllocator( s" cluster after $podCreationTimeout milliseconds despite the fact that a" + " previous allocation attempt tried to create it. The executor may have been" + " deleted but the application missed the deletion event.") -Utils.tryLogNonFatalError { - kubernetesClient -.pods() -.withLabel(SPARK_EXECUTOR_ID_LABEL, execId.toString) -.delete() + +if (shouldDeleteExecutors) { + Utils.tryLogNonFatalError { +kubernetesClient + .pods() + .withLabel(SPARK_EXECUTOR_ID_LABEL, execId.toString) --- End diff -- I just ran multiple jobs in parallel: ``` NAME READY STATUS RESTARTS AGE spark-pi-1543281035622-driver 1/1 Running 0 2m spark-pi-1543281035622-exec-1 1/1 Running 0 1m spark-pi-1543281098418-driver 0/1 Completed 0 1m spark-pi-1543281107591-driver 0/1 Completed 0 57s ``` The long-running one was not terminated... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r236487464 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2; --- End diff -- #21306 (TableCatalog support) adds this class as `org.apache.spark.sql.catalog.v2.Table` in the `spark-catalyst` module. I think it needs to be in the catalyst module and should probably be in the `o.a.s.sql.catalog.v2` package as well. The important one is moving this to the catalyst module. The analyzer is in catalyst and all of the v2 logical plans and analysis rules will be in catalyst as well, because we are standardizing behavior. The standard validation rules should be in catalyst, not in a source-specific or hive-specific package in the sql-core or hive modules. Because the logical plans and validation rules are in the catalyst package, the `TableCatalog` API needs to be there as well. For example, when a [catalog table identifier](https://github.com/apache/spark/pull/21978) is resolved for a read query, one of the results is a `TableCatalog` instance for the catalog portion of the identifier. That catalog is used to load the v2 table, which is then wrapped in a v2 relation for further analysis. Similarly, the write path should also validate that the catalog exists during analysis by loading it, and would then pass the catalog in a v2 logical plan for `CreateTable` or `CreateTableAsSelect`. I also think that it makes sense to use the `org.apache.spark.sql.catalog.v2` package for `Table` because `Table` is more closely tied to the `TableCatalog` API than to the data source API. The link to DSv2 is that `Table` carries `newScanBuilder`, but the rest of the methods exposed by `Table` are for catalog functions, like inspecting a table's partitioning or table properties. Moving this class would make adding `TableCatalog` less intrusive. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
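As a rough, self-contained illustration of the resolution flow described above (simplified, hypothetical stand-ins, not Spark's real `Table`, `TableCatalog`, or analyzer rules):

```scala
// Illustrative stand-ins only, not Spark's actual v2 interfaces or analysis rules.
case class Identifier(catalog: String, table: String)

trait Table {
  def name: String
  def schema: Seq[(String, String)] // (column name, type) as a stand-in for StructType
}

trait TableCatalog {
  def loadTable(name: String): Table
}

// Stand-in for the v2 relation the analyzer wraps a loaded table in.
case class DataSourceV2Relation(table: Table)

object ResolveTableExample extends App {
  // Hypothetical registry of configured catalogs.
  val catalogs: Map[String, TableCatalog] = Map(
    "prod" -> new TableCatalog {
      def loadTable(tableName: String): Table = new Table {
        def name: String = tableName
        def schema: Seq[(String, String)] = Seq("id" -> "bigint", "data" -> "string")
      }
    }
  )

  // During analysis: resolve the catalog part of the identifier, load the table,
  // and wrap it in a relation for the remaining rules to work on.
  def resolve(ident: Identifier): DataSourceV2Relation = {
    val catalog = catalogs.getOrElse(ident.catalog, sys.error(s"Unknown catalog: ${ident.catalog}"))
    DataSourceV2Relation(catalog.loadTable(ident.table))
  }

  println(resolve(Identifier("prod", "events")).table.schema)
}
```

Because the `resolve` step runs during analysis, the module that hosts the analyzer also needs the catalog API on its classpath, which is the argument for placing these interfaces in the catalyst module.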
[GitHub] spark pull request #23147: [SPARK-26140] followup: rename ShuffleMetricsRepo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23147 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236487153 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true) // each python worker gets an equal part of the allocation. the worker pool will grow to the // number of concurrent tasks, which is determined by the number of cores in this executor. - private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY) + private val memoryMb = if (Utils.isWindows) { --- End diff -- > The point Ryan and I are making is that there's no point in having this check in the Java code, because the Python code already handles it. It doesn't work on Windows, and the updated Python code handles that. My point is we should remove the Python condition. Also, I wonder whether this whole feature, introduced at https://github.com/apache/spark/pull/21977, works at all, since it looks like it was never tested. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
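For context, the guard under discussion amounts to deciding on the JVM side whether to hand a memory limit to the Python worker at all. Below is a standalone sketch of that idea; the object name and the exact env var name are illustrative assumptions, not the actual `PythonRunner` code:

```scala
import java.util.Locale

// Sketch of a JVM-side guard: only pass a worker memory limit when the platform
// can enforce it. On Windows, Python's `resource` module (and thus setrlimit)
// is unavailable, so the limit would be ignored anyway.
object WorkerMemoryEnv {
  private def isWindows: Boolean =
    System.getProperty("os.name", "").toLowerCase(Locale.ROOT).startsWith("windows")

  /** Env vars to hand to the Python worker; the limit is omitted on Windows. */
  def forWorker(executorMemoryMb: Option[Long]): Map[String, String] =
    executorMemoryMb match {
      case Some(mb) if !isWindows => Map("PYSPARK_EXECUTOR_MEMORY_MB" -> mb.toString)
      case _ => Map.empty
    }
}
```

The alternative position in the thread is to always pass the value and let the worker-side Python code ignore it when it cannot enforce a limit.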
[GitHub] spark issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23117 > does this add time to running the tests? In my local tests, the time difference looked like a slight increase. I want to see how it works in Jenkins. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23117: [WIP][SPARK-7721][INFRA] Run and generate test co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23117#discussion_r236486298 --- Diff: dev/run-tests.py --- @@ -434,6 +434,63 @@ def run_python_tests(test_modules, parallelism): run_cmd(command) +def run_python_tests_with_coverage(test_modules, parallelism): +set_title_and_block("Running PySpark tests with coverage report", "BLOCK_PYSPARK_UNIT_TESTS") + +command = [os.path.join(SPARK_HOME, "python", "run-tests-with-coverage")] +if test_modules != [modules.root]: +command.append("--modules=%s" % ','.join(m.name for m in test_modules)) +command.append("--parallelism=%i" % parallelism) +run_cmd(command) +post_python_tests_results() + + +def post_python_tests_results(): +if "SPARK_TEST_KEY" not in os.environ: +print("[error] 'SPARK_TEST_KEY' environment variable was not set. Unable to post" + "PySpark coverage results.") +sys.exit(1) --- End diff -- @shaneknapp can you add another environment variable that indicates the PR builder and spark-master-test-sbt-hadoop-2.7, where we're going to run Python coverage? I can check it and explicitly enable coverage only in that condition. True, if the condition below (which I checked before at #17669): ```python os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE", "") == "hadoop2.7" and os.environ.get("SPARK_BRANCH", "") == "master" and os.environ.get("AMPLAB_JENKINS", "") == "true" and os.environ.get("AMPLAB_JENKINS_BUILD_TOOL", "") == "sbt") ``` is `True` in a Jenkins build or another user's environment, it might cause some problems (even though that looks quite unlikely). Similarly, if `AMPLAB_JENKINS` is set in the environment of users who run the tests locally, it wouldn't work anyway. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23136: [SPARK-25515][K8s] Adds a config option to keep e...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/23136#discussion_r236485835 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala --- @@ -86,11 +88,14 @@ private[spark] class ExecutorPodsAllocator( s" cluster after $podCreationTimeout milliseconds despite the fact that a" + " previous allocation attempt tried to create it. The executor may have been" + " deleted but the application missed the deletion event.") -Utils.tryLogNonFatalError { - kubernetesClient -.pods() -.withLabel(SPARK_EXECUTOR_ID_LABEL, execId.toString) -.delete() + +if (shouldDeleteExecutors) { + Utils.tryLogNonFatalError { +kubernetesClient + .pods() + .withLabel(SPARK_EXECUTOR_ID_LABEL, execId.toString) --- End diff -- Correct. Here is a sample output: ```Labels: spark-app-selector=spark-application-1543242814683 spark-exec-id=2 spark-role=executor ``` Weird that I didn't see one job interfering with the other before. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
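A toy model of why the label set matters when deleting pods (purely illustrative, and not a claim about how the allocator's Kubernetes client is actually scoped): executor IDs repeat across applications, so a selector on `spark-exec-id` alone can match another job's executors, while adding `spark-app-selector` narrows the match to one application.

```scala
// Toy model only: pods carry the labels shown in the sample output above.
case class Pod(name: String, labels: Map[String, String])

object LabelSelectorExample extends App {
  val pods = Seq(
    Pod("job-a-exec-2", Map("spark-app-selector" -> "spark-application-A", "spark-exec-id" -> "2")),
    Pod("job-b-exec-2", Map("spark-app-selector" -> "spark-application-B", "spark-exec-id" -> "2"))
  )

  // A pod matches when every requested label is present with the requested value.
  def select(selector: Map[String, String]): Seq[Pod] =
    pods.filter(pod => selector.forall { case (k, v) => pod.labels.get(k).contains(v) })

  // Executor IDs repeat across applications, so exec-id alone matches both jobs...
  assert(select(Map("spark-exec-id" -> "2")).size == 2)
  // ...while adding the application selector narrows the match to one job's pod.
  assert(select(Map("spark-app-selector" -> "spark-application-A", "spark-exec-id" -> "2")).size == 1)
}
```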
[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23150 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99296/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23150 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23150 **[Test build #99296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99296/testReport)** for PR 23150 at commit [`f287b77`](https://github.com/apache/spark/commit/f287b77d94de9e9f466c0ff2c2370f22a46b48f7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r236485186 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true) // each python worker gets an equal part of the allocation. the worker pool will grow to the // number of concurrent tasks, which is determined by the number of cores in this executor. - private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY) + private val memoryMb = if (Utils.isWindows) { --- End diff -- I don't understand what you're getting at here. The point Ryan and I are making is that there's no point in having this check in the Java code, because the Python code already handles it. It doesn't work on Windows, and the updated Python code handles that. The only shortcoming here is passing an env variable to Python that won't be used if you're on Windows. That's really super trivial and not a big deal at all. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org