[GitHub] spark pull request #21570: [SPARK-24564][TEST] Add test suite for RecordBina...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21570#discussion_r198710399 --- Diff: core/src/test/java/org/apache/spark/memory/TestMemoryConsumer.java --- @@ -43,6 +47,12 @@ void free(long size) { used -= size; taskMemoryManager.releaseExecutionMemory(size, this); } + + @VisibleForTesting --- End diff -- it's already in the test package, we don't need this tag. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21597: [SPARK-24603][SQL] Fix findTightestCommonType reference ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21597 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92405/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21597: [SPARK-24603][SQL] Fix findTightestCommonType reference ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21597 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21597: [SPARK-24603][SQL] Fix findTightestCommonType reference ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21597 **[Test build #92405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92405/testReport)** for PR 21597 at commit [`9a65366`](https://github.com/apache/spark/commit/9a65366a0c9d9e7e57ecdaa0d437af01cbc0d006). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21546 Hey @BryanCutler, btw, mind if I ask you to move the benchmarks into the PR description?
[GitHub] spark pull request #21474: [SPARK-24297][CORE] Fetch-to-disk by default for ...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21474#discussion_r198709276 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -429,7 +429,11 @@ package object config { "external shuffle service, this feature can only be worked when external shuffle" + "service is newer than Spark 2.2.") .bytesConf(ByteUnit.BYTE) - .createWithDefault(Long.MaxValue) + // fetch-to-mem is guaranteed to fail if the message is bigger than 2 GB, so we might + // as well use fetch-to-disk in that case. The message includes some metadata in addition + // to the block data itself (in particular UploadBlock has a lot of metadata), so we leave + // extra room. + .createWithDefault(Int.MaxValue - 500) --- End diff -- Actually I prefer 512 to 500 :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
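The arithmetic behind the proposed default can be sketched as follows. This is a minimal illustration, not Spark's code: `should_fetch_to_disk` is a hypothetical helper, the strict `>` comparison is an assumption, and the 512-byte headroom is the reviewer's suggestion (the diff itself uses 500).

```python
# Illustrative model of the threshold discussed in the review; not Spark's implementation.
INT_MAX = 2**31 - 1  # same value as Scala's Int.MaxValue (2147483647)

# Headroom reserved for message metadata. 512 is the reviewer's preference;
# the diff under review uses 500.
HEADROOM_BYTES = 512

DEFAULT_THRESHOLD = INT_MAX - HEADROOM_BYTES


def should_fetch_to_disk(block_size_bytes: int, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """Hypothetical helper: True when a block is too large to fetch into memory.

    The strict '>' comparison is an assumption made for this sketch.
    """
    return block_size_bytes > threshold


if __name__ == "__main__":
    print(DEFAULT_THRESHOLD)
    print(should_fetch_to_disk(1024))
    print(should_fetch_to_disk(INT_MAX))
```

Either headroom value keeps the default just under the 2 GB limit named in the code comment, so the choice between 500 and 512 is one of style, not of behavior for typical block sizes.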
[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21589 Are you maybe able to manually test this on other cluster managers like standalone or YARN too?
[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r198708461 --- Diff: python/pyspark/context.py --- @@ -406,6 +406,22 @@ def defaultMinPartitions(self): """ return self._jsc.sc().defaultMinPartitions() +@property +def numCores(self): +""" +Total number of CPU cores of all executors registered in the cluster at the moment. +The number reflects current status of the cluster and can change in the future. +""" --- End diff -- Let's add version information here too. It should state the version in which it was added.
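One way the requested version information could look is a Sphinx `versionadded` directive in the docstring. The sketch below is an assumption-laden illustration: `ExampleContext` is a hypothetical stand-in for `SparkContext`, the core counts are passed in directly instead of being read from the JVM, and `2.4.0` is only a placeholder version, since the thread does not say which release this would land in.

```python
class ExampleContext:
    """Hypothetical stand-in for SparkContext, used only to show the docstring shape."""

    def __init__(self, cores_per_executor):
        # In this sketch the per-executor core counts are supplied by the caller.
        self._cores_per_executor = list(cores_per_executor)

    @property
    def numCores(self):
        """
        Total number of CPU cores of all executors registered in the cluster at the moment.
        The number reflects current status of the cluster and can change in the future.

        .. versionadded:: 2.4.0
        """
        # Placeholder version above; the actual release is not stated in the review.
        return sum(self._cores_per_executor)


if __name__ == "__main__":
    ctx = ExampleContext([4, 8])
    print(ctx.numCores)
    print(ExampleContext.numCores.__doc__)
```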
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21542 from the error log it seems we need to include the test tag module in the pom.xml somewhere. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21533 **[Test build #92410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92410/testReport)** for PR 21533 at commit [`eb46ccf`](https://github.com/apache/spark/commit/eb46ccfec084c2439a26eee38015381f091fe164). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21533 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21533 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/534/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r198705671 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1519,7 +1519,12 @@ class SparkContext(config: SparkConf) extends Logging { def addFile(path: String, recursive: Boolean): Unit = { val uri = new Path(path).toUri val schemeCorrectedPath = uri.getScheme match { - case null | "local" => new File(path).getCanonicalFile.toURI.toString + case null => new File(path).getCanonicalFile.toURI.toString + case "local" => +logWarning("We do not support add a local file here because file with local scheme is " + + "already existed on every node, there is no need to call addFile to add it again. " + + "(See more discussion about this in SPARK-24195.)") --- End diff -- Got it, rephrase done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
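The scheme handling in the diff above can be modeled roughly as follows. This is a simplified Python sketch of the Scala `match`, not the actual `SparkContext.addFile` logic: `scheme_corrected_path` is a hypothetical name, `urllib.parse` stands in for Hadoop's `Path.toUri`, the warning text is paraphrased, and returning the path unchanged in the `local` case is an assumption because the diff is cut off after the `logWarning` call.

```python
import logging
import os
from pathlib import Path
from urllib.parse import urlparse

logger = logging.getLogger("add_file_sketch")


def scheme_corrected_path(path: str) -> str:
    """Hypothetical model of the scheme match shown in the diff."""
    scheme = urlparse(path).scheme or None
    if scheme is None:
        # No scheme: treat as a local filesystem path and build a file: URI
        # from its canonical absolute form.
        return Path(os.path.realpath(path)).as_uri()
    if scheme == "local":
        # The file is expected to exist on every node already, so adding it is redundant.
        logger.warning(
            "Files with the local scheme already exist on every node; "
            "there is no need to add them again (see SPARK-24195)."
        )
        return path  # assumption: the truncated diff does not show the returned value
    # Any other scheme (hdfs, http, ...) is passed through unchanged.
    return path


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(scheme_corrected_path("local:/opt/data.txt"))
    print(scheme_corrected_path("hdfs://namenode/data.txt"))
    print(scheme_corrected_path("data.txt"))
```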
[GitHub] spark issue #21653: [SPARK-13343] speculative tasks that didn't commit shoul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21653 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21653: [SPARK-13343] speculative tasks that didn't commit shoul...
Github user hthuynh2 commented on the issue: https://github.com/apache/spark/pull/21653 cc @tgravescs --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21653: [SPARK-13343] speculative tasks that didn't commi...
GitHub user hthuynh2 opened a pull request: https://github.com/apache/spark/pull/21653 [SPARK-13343] speculative tasks that didn't commit shouldn't be marked as success
**Description** Currently, speculative tasks that didn't commit can show up as successes or failures (depending on the timing of the commit). This is a bit confusing because such a task didn't really succeed, in the sense that it didn't write anything. I think these tasks should be marked as KILLED or something that makes it more obvious to the user exactly what happened. If a task happens to hit the timing where it gets a commit denied exception, it shows up as failed and counts against your task failures. It shouldn't count against task failures since that failure really doesn't matter. MapReduce handles these situations, so perhaps we can look there for a model. https://user-images.githubusercontent.com/15680678/42013170-99db48c2-7a61-11e8-8c7b-ef94c84e36ea.png
**How can this issue happen?** When both attempts of a task finish before the driver sends the command to kill one of them, both of them send the status update FINISHED to the driver. The driver calls TaskSchedulerImpl to handle one successful task at a time. When it handles the first successful task, it sends the command to kill the other copy of the task; however, because that task has already finished, the executor ignores the command. After handling the first attempt, the driver processes the second one. Although all actions on the result of this task are skipped, this copy of the task is still marked as SUCCESS. As a result, even though this issue does not affect the result of the job, it might confuse users because both attempts appear to be successful.
**How does this PR fix the issue?** The simple way to fix this issue is that when taskSetManager handles a successful task, it checks whether any other attempt has succeeded. If this is the case, it calls handleFailedTask with state==KILLED and reason==TaskKilled("another attempt succeeded") to handle this task as being killed.
**How was this patch tested?** I tested this manually by running applications that previously caused the issue a few times, and observed that the issue does not happen again. Also, I added a unit test in TaskSetManagerSuite to test that if we call handleSuccessfulTask to handle status updates for 2 copies of a task, only the one that is handled first will be marked as SUCCESS.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/hthuynh2/spark SPARK_13343 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21653.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21653 commit 8f7d98177816e11659cf79a2b28f96bd4b7173d5 Author: Hieu Huynh <hieu.huynh@...> Date: 2018-06-28T04:19:14Z Fixed issue and added unit test
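The fix described in this pull request can be illustrated with a small model. This is a minimal Python sketch of the idea only, not the Scala code in `TaskSetManager`: the class `TaskSetSketch`, its fields, and the snake_case method names are hypothetical, and real handling involves considerably more state than shown here.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TaskSetSketch:
    """Hypothetical, heavily simplified model of a task set with speculative attempts."""

    # Task indexes for which one attempt has already been accepted as the winner.
    successful: Set[int] = field(default_factory=set)
    # Final state and kill reason recorded per attempt id.
    states: Dict[str, str] = field(default_factory=dict)
    reasons: Dict[str, str] = field(default_factory=dict)

    def handle_successful_task(self, task_index: int, attempt_id: str) -> None:
        if task_index in self.successful:
            # Another attempt of the same task was handled first, so this one
            # is recorded as killed instead of as a second success.
            self.handle_failed_task(attempt_id, "KILLED", "another attempt succeeded")
            return
        self.successful.add(task_index)
        self.states[attempt_id] = "SUCCESS"

    def handle_failed_task(self, attempt_id: str, state: str, reason: str) -> None:
        self.states[attempt_id] = state
        self.reasons[attempt_id] = reason


if __name__ == "__main__":
    task_set = TaskSetSketch()
    # Both attempts of task 0 report FINISHED before the kill command takes effect.
    task_set.handle_successful_task(0, "0.0")
    task_set.handle_successful_task(0, "0.1")
    print(task_set.states)
    print(task_set.reasons)
```

In this model, whichever attempt is handled first is the only one marked SUCCESS, which mirrors the behavior the added unit test is described as checking.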
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/533/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/533/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92409/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 **[Test build #92409 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92409/testReport)** for PR 21652 at commit [`7602dbc`](https://github.com/apache/spark/commit/7602dbc40779bc1972f5387eb2524e093b2c7a5e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/533/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92408/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 **[Test build #92408 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92408/testReport)** for PR 21652 at commit [`f0d59cc`](https://github.com/apache/spark/commit/f0d59cc2f6cd966e28e9dfe37922ecba69445c83). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/532/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/532/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/532/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 **[Test build #92409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92409/testReport)** for PR 21652 at commit [`7602dbc`](https://github.com/apache/spark/commit/7602dbc40779bc1972f5387eb2524e093b2c7a5e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8s] Add integration tests for secrets
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21652 @foxish @liyinan926 pls review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21652: [SPARK-24551][K8s] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 **[Test build #92408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92408/testReport)** for PR 21652 at commit [`f0d59cc`](https://github.com/apache/spark/commit/f0d59cc2f6cd966e28e9dfe37922ecba69445c83). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21652: [SPARK-24551][K8s] Add integration tests for secr...
GitHub user skonto opened a pull request: https://github.com/apache/spark/pull/21652 [SPARK-24551][K8s] Add integration tests for secrets ## What changes were proposed in this pull request? - Adds integration tests for env and mount secrets. ## How was this patch tested? Manually by checking that secrets were added to the containers and by tuning the tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/skonto/spark add-secret-its Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21652.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21652 commit 9814eefe7f5a02e24b4750d8bf522e0e711db28f Author: Stavros Kontopoulos Date: 2018-06-28T03:49:32Z add secret tests --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624] Support mixture of Python UDF and Scalar P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624] Support mixture of Python UDF and Scalar P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92400/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624] Support mixture of Python UDF and Scalar P...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #92400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92400/testReport)** for PR 21650 at commit [`6b47b69`](https://github.com/apache/spark/commit/6b47b69305257e9ee9f5135968913a4f92731ef5). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21589 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21589 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92402/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21589 **[Test build #92402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92402/testReport)** for PR 21589 at commit [`1405daf`](https://github.com/apache/spark/commit/1405daf18f9ae907f36c64e426bf65a3a9e567e4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20739: [SPARK-23603][SQL]When the length of the json is ...
Github user cxzl25 closed the pull request at: https://github.com/apache/spark/pull/20739 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20738: [SPARK-23603][SQL]When the length of the json is ...
Github user cxzl25 closed the pull request at: https://github.com/apache/spark/pull/20738 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21451 **[Test build #92407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92407/testReport)** for PR 21451 at commit [`fa1928a`](https://github.com/apache/spark/commit/fa1928aa48655ca2fb036759260cfa71324ed37c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21451 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21451 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/531/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21167: [SPARK-24100][PYSPARK]Add the CompressionCodec to...
Github user WzRaCai closed the pull request at: https://github.com/apache/spark/pull/21167 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21440 **[Test build #92406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92406/testReport)** for PR 21440 at commit [`4b53667`](https://github.com/apache/spark/commit/4b5366794acc7ef792ecf1a06e9697db79268a67). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21440 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/530/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21440 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624] Support mixture of Python UDF and Scalar P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624] Support mixture of Python UDF and Scalar P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92401/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21650: [SPARK-24624] Support mixture of Python UDF and Scalar P...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21650 **[Test build #92401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92401/testReport)** for PR 21650 at commit [`be3b99c`](https://github.com/apache/spark/commit/be3b99c951c3df77eace0a6a124f8f9a94ac804c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21611: [SPARK-24569][SQL] Aggregator with output type Option sh...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21611 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21611: [SPARK-24569][SQL] Aggregator with output type Option sh...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21611 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92403/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21611: [SPARK-24569][SQL] Aggregator with output type Option sh...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21611 **[Test build #92403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92403/testReport)** for PR 21611 at commit [`f04efa4`](https://github.com/apache/spark/commit/f04efa484e7b5dfbe709f65845bea58e53611604). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21440 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21440 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92399/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21440: [SPARK-24307][CORE] Support reading remote cached partit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21440 **[Test build #92399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92399/testReport)** for PR 21440 at commit [`6c57e4d`](https://github.com/apache/spark/commit/6c57e4d35d76d5f2b618a24bd56d83899eea567e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21597: [SPARK-24603][SQL] Fix findTightestCommonType ref...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21597
[GitHub] spark issue #21597: [SPARK-24603][SQL] Fix findTightestCommonType reference ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21597 Merged to master, branch-2.3 and branch-2.2.
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21557 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92404/ Test PASSed.
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21557 Merged build finished. Test PASSed.
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21557 **[Test build #92404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92404/testReport)** for PR 21557 at commit [`7ca733b`](https://github.com/apache/spark/commit/7ca733beeb18808e145dc2786f9c2c6c1ec40031). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r198686913
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala ---
@@ -25,8 +25,13 @@ import org.apache.spark.util.{Benchmark, Utils}
 /**
  * The benchmarks aims to measure performance of JSON parsing when encoding is set and isn't.
- * To run this:
- *  spark-submit --class --jars
+ * To run:
+ *  mvn clean package -pl sql/core -DskipTests
+ *  ./dev/make-distribution.sh --name local-dist
+ *  cd dist/
+ *  ./bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks \
+ *    ../sql/core/target/spark-sql_2.11-2.4.0-SNAPSHOT-tests.jar > /tmp/output.txt
--- End diff --
Let's take out the other lines such as `make-distribution.sh` and `cd dist/` too, since they vary depending on how Spark is built.
[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21451 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92398/ Test PASSed.
[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21451 Merged build finished. Test PASSed.
[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21451 **[Test build #92398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92398/testReport)** for PR 21451 at commit [`1cc0f3f`](https://github.com/apache/spark/commit/1cc0f3ffa2b563c54771a38c4dd9f2598b29f0db). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class UploadBlockStream extends BlockTransferMessage `
[GitHub] spark issue #21597: [SPARK-24603][SQL] Fix findTightestCommonType reference ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21597 **[Test build #92405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92405/testReport)** for PR 21597 at commit [`9a65366`](https://github.com/apache/spark/commit/9a65366a0c9d9e7e57ecdaa0d437af01cbc0d006).
[GitHub] spark issue #21597: [SPARK-24603][SQL] Fix findTightestCommonType reference ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21597 ok to test
[GitHub] spark pull request #21625: [SPARK-24206][SQL][FOLLOW-UP] Update DataSourceRe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21625
[GitHub] spark pull request #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPru...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21631
[GitHub] spark issue #21625: [SPARK-24206][SQL][FOLLOW-UP] Update DataSourceReadBench...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21625 LGTM too. Merged to master.
[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21631 Merged to master.
[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21631 LGTM. @MaxGekk please take the follow-up action. I will help and check whether it's needed.
[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation
Github user tedyu commented on the issue: https://github.com/apache/spark/pull/21651 cc @tdas
[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21533 I think we could either: 1) ignore files with the "local" scheme and let the user decide how to fetch them, as the current fix does; or 2) copy the 'local' scheme files to `SparkFiles#getRootDirectory` on both the driver and the executors. The change would be in `Utils#fetchFile`. @jiangxb1987 @vanzin what's your opinion?
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21557 **[Test build #92404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92404/testReport)** for PR 21557 at commit [`7ca733b`](https://github.com/apache/spark/commit/7ca733beeb18808e145dc2786f9c2c6c1ec40031).
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21557 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/529/ Test PASSed.
[GitHub] spark issue #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to Bisectin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21557 Merged build finished. Test PASSed.
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r198684121
--- Diff: project/MimaExcludes.scala ---
@@ -89,7 +89,13 @@ object MimaExcludes {
     ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasValidationIndicatorCol.validationIndicatorCol"),
     ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasValidationIndicatorCol.getValidationIndicatorCol"),
     ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasValidationIndicatorCol.org$apache$spark$ml$param$shared$HasValidationIndicatorCol$_setter_$validationIndicatorCol_="),
-    ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasValidationIndicatorCol.validationIndicatorCol")
+    ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasValidationIndicatorCol.validationIndicatorCol"),
+
+    // [SPARK-23429][CORE] Add executor memory metrics to heartbeat and expose in executors REST API
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate.apply"),
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate.copy"),
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate.this"),
+    ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate$")
--- End diff --
Will move up.
[GitHub] spark pull request #21557: [SPARK-24439][ML][PYTHON]Add distanceMeasure to B...
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21557#discussion_r198684081
--- Diff: python/pyspark/ml/clustering.py ---
@@ -622,10 +621,10 @@ def __init__(self, featuresCol="features", predictionCol="prediction", maxIter=2
     @keyword_only
     @since("2.0.0")
     def setParams(self, featuresCol="features", predictionCol="prediction", maxIter=20,
-                  seed=None, k=4, minDivisibleClusterSize=1.0):
+                  seed=None, k=4, minDivisibleClusterSize=1.0, distanceMeasure="euclidean"):
         """
         setParams(self, featuresCol="features", predictionCol="prediction", maxIter=20, \
-                  seed=None, k=4, minDivisibleClusterSize=1.0)
+                  seed=None, k=4, minDivisibleClusterSize=1.0, distanceMeasure="euclidean")
         Sets params for BisectingKMeans.
--- End diff --
@BryanCutler Thank you very much for your review. I will make the change.
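The signature change in this diff adds one keyword argument with a default, so existing callers keep working. A minimal standalone sketch of that pattern follows. It does not use pyspark: the class `BisectingKMeansSketch`, its `getOrDefault` helper, and the set of accepted values are assumptions made for illustration (only "euclidean" appears in the diff).

```python
class BisectingKMeansSketch:
    """Hypothetical stand-in for an estimator; this is not the pyspark class."""

    # Assumed set of accepted values; only "euclidean" appears in the diff.
    SUPPORTED_DISTANCE_MEASURES = ("euclidean", "cosine")

    def __init__(self):
        self._params = {}

    def setParams(self, *, featuresCol="features", predictionCol="prediction",
                  maxIter=20, seed=None, k=4, minDivisibleClusterSize=1.0,
                  distanceMeasure="euclidean"):
        """Set all params by keyword; the new argument defaults to "euclidean"."""
        if distanceMeasure not in self.SUPPORTED_DISTANCE_MEASURES:
            raise ValueError("unsupported distanceMeasure: %r" % (distanceMeasure,))
        self._params.update(
            featuresCol=featuresCol,
            predictionCol=predictionCol,
            maxIter=maxIter,
            seed=seed,
            k=k,
            minDivisibleClusterSize=minDivisibleClusterSize,
            distanceMeasure=distanceMeasure,
        )
        return self

    def getOrDefault(self, name):
        """Return the stored value for a param name."""
        return self._params[name]
```

Callers that never pass `distanceMeasure` get the previous behaviour, which is presumably why the default is placed last in the signature.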
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r198683846
--- Diff: core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala ---
@@ -251,6 +261,217 @@ class EventLoggingListenerSuite extends SparkFunSuite with LocalSparkContext wit
     }
   }
+  /**
+   * Test executor metrics update logging functionality. This checks that a
+   * SparkListenerExecutorMetricsUpdate event is added to the Spark history
+   * log if one of the executor metrics is larger than any previously
+   * recorded value for the metric, per executor per stage. The task metrics
--- End diff --
Whoops, that was left over from when it was ExecutorMetricsUpdated.
[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed.
[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92395/ Test PASSed.
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r198683408
--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -98,14 +102,48 @@ class ExecutorSummary private[spark](
     val removeReason: Option[String],
     val executorLogs: Map[String, String],
     val memoryMetrics: Option[MemoryMetrics],
-    val blacklistedInStages: Set[Int])
+    val blacklistedInStages: Set[Int],
+    @JsonSerialize(using = classOf[PeakMemoryMetricsSerializer])
+    @JsonDeserialize(using = classOf[PeakMemoryMetricsDeserializer])
+    val peakMemoryMetrics: Option[Array[Long]])
 class MemoryMetrics private[spark](
     val usedOnHeapStorageMemory: Long,
     val usedOffHeapStorageMemory: Long,
     val totalOnHeapStorageMemory: Long,
     val totalOffHeapStorageMemory: Long)
+/** deserialzer for peakMemoryMetrics: convert to array ordered by metric name */
+class PeakMemoryMetricsDeserializer private[spark] extends JsonDeserializer[Option[Array[Long]]] {
--- End diff --
This is odd, but I can't seem to comment on your earlier comment. Regarding having a serializer/deserializer, I also don't have strong feelings -- it makes it more readable, but also takes up more space in the history log. Regarding this comment, thanks, I hadn't realized the placement meant that it marked the constructor. It's meant for the class, and I'll move it.
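The trade-off mentioned in this comment (named fields are easier to read but take more space than a bare array) can be illustrated with a small standalone sketch. It is not the Jackson serializer from the pull request: the metric names in `METRIC_NAMES` and both function names are hypothetical, and missing metrics are assumed to default to 0.

```python
import json

# Hypothetical metric names, kept in a fixed (sorted) order so that the
# array form and the named form can be converted into each other.
METRIC_NAMES = sorted(["JVMHeapMemory", "JVMOffHeapMemory", "OnHeapExecutionMemory"])


def serialize_peaks(peaks):
    """Write the peak array as a JSON object keyed by metric name (null if absent)."""
    if peaks is None:
        return json.dumps(None)
    if len(peaks) != len(METRIC_NAMES):
        raise ValueError("expected %d values, got %d" % (len(METRIC_NAMES), len(peaks)))
    return json.dumps(dict(zip(METRIC_NAMES, peaks)))


def deserialize_peaks(text):
    """Read the JSON object back into an array ordered by metric name."""
    obj = json.loads(text)
    if obj is None:
        return None
    # Assumption: a metric missing from the object is treated as 0.
    return [obj.get(name, 0) for name in METRIC_NAMES]
```

The named form repeats every metric name in each record, which is the extra log space the comment refers to; the array form is smaller but only meaningful to a reader who knows the ordering.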
[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #92395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92395/testReport)** for PR 21546 at commit [`fe3319b`](https://github.com/apache/spark/commit/fe3319bd7ab290e30f6075a81acd0b17818ad546). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class BatchOrderSerializer(Serializer):`
[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation
Github user ConcurrencyPractitioner commented on the issue: https://github.com/apache/spark/pull/21651 I am uncertain about some of the ways we should transfer the data stored in OffsetSeqs to external storage (e.g., KafkaSink, which I mentioned before).
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r198682917
--- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala ---
@@ -264,6 +282,11 @@ private[spark] trait SparkListenerInterface {
    */
   def onExecutorMetricsUpdate(executorMetricsUpdate: SparkListenerExecutorMetricsUpdate): Unit
+  /**
+   * Called when the driver reads stage executor metrics from the history log.
--- End diff --
Updated.
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r198682980
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
@@ -669,6 +686,29 @@ private[spark] class AppStatusListener(
         }
       }
     }
+    event.executorUpdates.foreach { updates: Array[Long] =>
+      // check if there is a new peak value for any of the executor level memory metrics
+      liveExecutors.get(event.execId).foreach { exec: LiveExecutor =>
+        if (exec.peakExecutorMetrics.compareAndUpdate(updates)) {
+          maybeUpdate(exec, now)
+        }
+      }
+    }
+  }
+
+  override def onStageExecutorMetrics(executorMetrics: SparkListenerStageExecutorMetrics): Unit = {
+    val now = System.nanoTime()
+
+    // check if there is a new peak value for any of the executor level memory metrics
--- End diff --
Unfortunately, yes. I've added some comments.
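The diff calls `compareAndUpdate` and only triggers an update when it returns true. The body of that method is not shown in this thread, so the following is a standalone sketch of what such a peak tracker could look like, assuming element-wise maxima over a fixed-length array; the class name `PeakMetrics` and the zero initial values are assumptions.

```python
class PeakMetrics:
    """Tracks the largest value seen so far for each metric position."""

    def __init__(self, size):
        # Assumption: metrics are non-negative, so 0 is a safe starting peak.
        self.values = [0] * size

    def compare_and_update(self, updates):
        """Raise stored peaks where `updates` is larger; return True if any peak changed."""
        if len(updates) != len(self.values):
            raise ValueError("expected %d values, got %d" % (len(self.values), len(updates)))
        updated = False
        for i, value in enumerate(updates):
            if value > self.values[i]:
                self.values[i] = value
                updated = True
        return updated
```

Returning a boolean lets the caller skip the write path entirely when a heartbeat carries no new peak, which matches the `if (...) maybeUpdate(exec, now)` shape in the diff.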
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r198682809
--- Diff: core/src/main/scala/org/apache/spark/metrics/MetricGetter.scala ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.metrics
+
+import java.lang.management.{BufferPoolMXBean, ManagementFactory}
+import javax.management.ObjectName
+
+import org.apache.spark.memory.MemoryManager
+
+sealed trait MetricGetter {
--- End diff --
Added.
[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21533#discussion_r198682844
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1519,7 +1519,12 @@ class SparkContext(config: SparkConf) extends Logging {
   def addFile(path: String, recursive: Boolean): Unit = {
     val uri = new Path(path).toUri
     val schemeCorrectedPath = uri.getScheme match {
-      case null | "local" => new File(path).getCanonicalFile.toURI.toString
+      case null => new File(path).getCanonicalFile.toURI.toString
+      case "local" =>
+        logWarning("We do not support add a local file here because file with local scheme is " +
+          "already existed on every node, there is no need to call addFile to add it again. " +
+          "(See more discussion about this in SPARK-24195.)")
--- End diff --
Can we please rephrase to "File with 'local' scheme is not supported to add to file server, since it is already available on every node."?
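The change in this diff separates two cases that were previously handled together: a path with no scheme is turned into a canonical file URI, while a `local` path only produces a warning. A standalone sketch of that dispatch follows. The function name `scheme_corrected_path`, the logger name, and the choice to return `None` for the skipped case are assumptions; the diff is truncated before showing what the `local` branch returns.

```python
import logging
import os
from pathlib import Path
from urllib.parse import urlparse

logger = logging.getLogger("add_file_sketch")  # hypothetical logger name


def scheme_corrected_path(path):
    """Return the URI to register for `path`, or None if the file is skipped."""
    scheme = urlparse(path).scheme
    if scheme == "":
        # No scheme: treat as a file on this machine and canonicalize it.
        return Path(os.path.realpath(path)).as_uri()
    if scheme == "local":
        # Assumption: skipping is signalled with None; wording follows the
        # message suggested in the review comment.
        logger.warning("File with 'local' scheme is not supported to add to file "
                       "server, since it is already available on every node.")
        return None
    # Any other scheme (hdfs, http, ...) is passed through unchanged.
    return path
```

Note that `urlparse` reports a Windows drive letter such as `C:` as a scheme, so this sketch is only meant for POSIX-style paths.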
[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation
Github user ConcurrencyPractitioner commented on the issue: https://github.com/apache/spark/pull/21651 cc @koeninger
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r198682884
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -169,6 +181,28 @@ private[spark] class EventLoggingListener(
   // Events that trigger a flush
   override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
+    if (shouldLogExecutorMetricsUpdates) {
+      // clear out any previous attempts, that did not have a stage completed event
+      val prevAttemptId = event.stageInfo.attemptNumber() - 1
+      for (attemptId <- 0 to prevAttemptId) {
+        liveStageExecutorMetrics.remove((event.stageInfo.stageId, attemptId))
+      }
+
+      // log the peak executor metrics for the stage, for each live executor,
+      // whether or not the executor is running tasks for the stage
+      val executorMap = liveStageExecutorMetrics.remove(
+        (event.stageInfo.stageId, event.stageInfo.attemptNumber()))
+      executorMap.foreach {
+        executorEntry => {
+          for ((executorId, peakExecutorMetrics) <- executorEntry) {
--- End diff --
Yes, the naming is confusing. Changed to the 1st option.
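The bookkeeping in this hunk has two steps: drop entries for earlier attempts of the stage, then remove the entry for the completed attempt and log one record per executor. A standalone sketch of those two steps, using a plain dict keyed by `(stage_id, attempt_id)`; the function name and the `log_event` callback are hypothetical and stand in for whatever the listener writes to the history log.

```python
def on_stage_completed(live, stage_id, attempt_number, log_event):
    """Clean up per-stage peak metrics and log them for the completed attempt.

    `live` maps (stage_id, attempt_id) -> {executor_id: peak_metrics}.
    """
    # Clear out earlier attempts that never saw a stage-completed event.
    # range(attempt_number) covers attempt ids 0 .. attempt_number - 1.
    for attempt_id in range(attempt_number):
        live.pop((stage_id, attempt_id), None)

    # Remove the completed attempt and log the peaks for each executor.
    executor_map = live.pop((stage_id, attempt_number), None)
    if executor_map is not None:
        for executor_id, peaks in executor_map.items():
            log_event(stage_id, attempt_number, executor_id, peaks)
```

Naming the intermediate value for what it holds (a map from executor id to peaks) avoids the kind of confusion the reviewer raised about `executorEntry`.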
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r198682779
--- Diff: core/src/main/scala/org/apache/spark/Heartbeater.scala ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.concurrent.TimeUnit
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.memory.MemoryManager
+import org.apache.spark.metrics.MetricGetter
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * Creates a heartbeat thread which will call the specified reportHeartbeat function at
+ * intervals of intervalMs.
+ *
+ * @param memoryManager the memory manager for execution and storage memory.
+ * @param reportHeartbeat the heartbeat reporting function to call.
+ * @param name the thread name for the heartbeater.
+ * @param intervalMs the interval between heartbeats.
+ */
+private[spark] class Heartbeater(
+    memoryManager: MemoryManager,
+    reportHeartbeat: () => Unit,
+    name: String,
+    intervalMs: Long) extends Logging {
+  // Executor for the heartbeat task
+  private val heartbeater = ThreadUtils.newDaemonSingleThreadScheduledExecutor(name)
+
+  /** Schedules a task to report a heartbeat. */
+  private[spark] def start(): Unit = {
--- End diff --
Removed.
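The class comment in this diff describes a thread that calls `reportHeartbeat` at a fixed interval. A standalone sketch of that idea with a daemon thread and a stop event follows. It is not a port of the Scala class: the `memoryManager` parameter is left out, the interval is in seconds rather than milliseconds, and swallowing exceptions from the callback is an assumption.

```python
import logging
import threading

logger = logging.getLogger("heartbeater_sketch")  # hypothetical logger name


class Heartbeater:
    """Calls `report_heartbeat` roughly every `interval_s` seconds on a daemon thread."""

    def __init__(self, report_heartbeat, name, interval_s):
        self._report_heartbeat = report_heartbeat
        self._interval_s = interval_s
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, name=name, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout (send a heartbeat) and True once
        # stop() has set the event (leave the loop).
        while not self._stop_event.wait(self._interval_s):
            try:
                self._report_heartbeat()
            except Exception:
                # Assumption: one failed heartbeat should not kill the thread.
                logger.exception("heartbeat failed")

    def start(self):
        """Start the heartbeat thread."""
        self._thread.start()

    def stop(self):
        """Signal the thread to stop and wait for it to finish."""
        self._stop_event.set()
        self._thread.join()
```

Waiting on the event instead of sleeping means `stop()` takes effect immediately rather than after the current interval has elapsed.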
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r198682787
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1922,6 +1928,12 @@ class SparkContext(config: SparkConf) extends Logging {
     Utils.tryLogNonFatalError {
       _eventLogger.foreach(_.stop())
     }
+    if(_heartbeater != null) {
--- End diff --
Added.
[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21631 @HyukjinKwon BTW, can you check this? @MaxGekk I think it would be better to file a new JIRA for the point you're looking into.
[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21651 Can one of the admins verify this patch?