[GitHub] spark issue #23206: [SPARK-26249][SQL] Add ability to inject a rule in order...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23206 cc @viirya @maropu --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23232: [SPARK-26233][SQL][BACKPORT-2.4] CheckOverflow when enco...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23232 I merged all three PRs (2.4/2.3/2.2). Please close the PRs. :)
[GitHub] spark pull request #23240: [SPARK-26281][WebUI] Duration column of task tabl...
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/23240

[SPARK-26281][WebUI] Duration column of task table should be executor run time instead of real duration

## What changes were proposed in this pull request?

In PR https://github.com/apache/spark/pull/23081, the duration column was changed to executor run time. The behavior is consistent with the summary metrics table and previous Spark versions. However, after PR https://github.com/apache/spark/pull/21688, the issue can be reproduced again.

## How was this patch tested?

Before the change, we can see:
1. The minimum duration in the aggregation table doesn't match the task table below.
2. The sorting order is wrong.

![image](https://user-images.githubusercontent.com/1097932/49533048-f7eecb80-f8f8-11e8-9256-2eb524e81be0.png)

After the change, the issues are fixed:

![image](https://user-images.githubusercontent.com/1097932/49533069-06d57e00-f8f9-11e8-872b-402e3014f557.png)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark fixDuration

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23240.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23240

commit 612c4c7242f6289d3a1e424a69951be25cd126af
Author: Gengliang Wang
Date: 2018-12-05T17:44:55Z

    fix
[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/23240 @shahidki31 @pgandhi999 @srowen
[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23240 Merged build finished. Test PASSed.
[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23240 **[Test build #99739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99739/testReport)** for PR 23240 at commit [`612c4c7`](https://github.com/apache/spark/commit/612c4c7242f6289d3a1e424a69951be25cd126af).
[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23240 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5779/ Test PASSed.
[GitHub] spark pull request #23223: [SPARK-26269][YARN]Yarnallocator should have same...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/23223#discussion_r239173889

--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala ---
@@ -417,4 +426,59 @@ class YarnAllocatorSuite extends SparkFunSuite with Matchers with BeforeAndAfter
     clock.advance(50 * 1000L)
     handler.getNumExecutorsFailed should be (0)
   }
+
+  test("SPARK-26296: YarnAllocator should have same blacklist behaviour with YARN") {
+    val rmClientSpy = spy(rmClient)
+    val maxExecutors = 11
+
+    val handler = createAllocator(
+      maxExecutors,
+      rmClientSpy,
+      Map(
+        "spark.yarn.blacklist.executor.launch.blacklisting.enabled" -> "true",
+        "spark.blacklist.application.maxFailedExecutorsPerNode" -> "0"))
+    handler.updateResourceRequests()
+
+    val hosts = (0 until maxExecutors).map(i => s"host$i")
+    val ids = (0 to maxExecutors).map(i => ContainerId.newContainerId(appAttemptId, i))
+    val containers = createContainers(hosts, ids)
+    handler.handleAllocatedContainers(containers.slice(0, 9))
+    val cs0 = ContainerStatus.newInstance(containers(0).getId, ContainerState.COMPLETE,
+      "success", ContainerExitStatus.SUCCESS)
+    val cs1 = ContainerStatus.newInstance(containers(1).getId, ContainerState.COMPLETE,
+      "preempted", ContainerExitStatus.PREEMPTED)
+    val cs2 = ContainerStatus.newInstance(containers(2).getId, ContainerState.COMPLETE,
+      "killed_exceeded_vmem", ContainerExitStatus.KILLED_EXCEEDED_VMEM)
+    val cs3 = ContainerStatus.newInstance(containers(3).getId, ContainerState.COMPLETE,
+      "killed_exceeded_pmem", ContainerExitStatus.KILLED_EXCEEDED_PMEM)
+    val cs4 = ContainerStatus.newInstance(containers(4).getId, ContainerState.COMPLETE,
+      "killed_by_resourcemanager", ContainerExitStatus.KILLED_BY_RESOURCEMANAGER)
+    val cs5 = ContainerStatus.newInstance(containers(5).getId, ContainerState.COMPLETE,
+      "killed_by_appmaster", ContainerExitStatus.KILLED_BY_APPMASTER)
+    val cs6 = ContainerStatus.newInstance(containers(6).getId, ContainerState.COMPLETE,
+      "killed_after_app_completion", ContainerExitStatus.KILLED_AFTER_APP_COMPLETION)
+    val cs7 = ContainerStatus.newInstance(containers(7).getId, ContainerState.COMPLETE,
+      "aborted", ContainerExitStatus.ABORTED)
+    val cs8 = ContainerStatus.newInstance(containers(8).getId, ContainerState.COMPLETE,
+      "disk_failed", ContainerExitStatus.DISKS_FAILED)
--- End diff --

just a suggestion, you can avoid some repetition here:

```scala
val nonBlacklistedStatuses = Seq(ContainerExitStatus.SUCCESS, ..., ContainerExitStatus.DISKS_FAILED)
val containerStatuses = nonBlacklistedStatuses.zipWithIndex.map { case (state, idx) =>
  ContainerStatus.newInstance(containers(idx).getId, ContainerState.COMPLETE, "diagnostics", state)
}
```
[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23240 Hi @gengliangwang, it seems this was already handled in PR https://github.com/apache/spark/pull/23160
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r239177994

--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,199 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark-
 See [Application Submission Guide](submitting-applications.html) for more details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increase security in a cluster. For a detailed
+description of these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against a Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**
+- **JAAS login configuration**
+
+### Delegation token
+
+This way the application can be configured via Spark parameters and may not need a JAAS login
+configuration (Spark can use Kafka's dynamic JAAS configuration feature). For further information
+about delegation tokens, see [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+The process is initiated by Spark's Kafka delegation token provider. When `spark.kafka.bootstrap.servers` is set,
+Spark considers the following login options, in order of preference:
+- **JAAS login configuration**
+- **Keytab file**, such as,
+
+      ./bin/spark-submit \
+        --keytab <KEYTAB_FILE> \
+        --principal <PRINCIPAL> \
+        --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
+        ...
+
+- **Kerberos credential cache**, such as,
+
+      ./bin/spark-submit \
+        --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
+        ...
+
+The Kafka delegation token provider can be turned off by setting `spark.security.credentials.kafka.enabled` to `false` (default: `true`).
+
+Spark can be configured to use the following authentication protocols to obtain a token (it must match the
+Kafka broker configuration):
+- **SASL SSL (default)**
+- **SSL**
+- **SASL PLAINTEXT (for testing)**
+
+After obtaining the delegation token successfully, Spark distributes it across nodes and renews it accordingly.
+Delegation tokens use the `SCRAM` login module for authentication and because of that the appropriate
+`sasl.mechanism` has to be configured on the source/sink:
+
+{% highlight scala %}
+
+// Setting on Kafka Source for Streaming Queries
--- End diff --

I think having just one example should be enough. Is `SCRAM-SHA-512` the only possible value? I think you mentioned different values before. If this needs to match the broker's configuration, that needs to be mentioned. Separately, it would be nice to think about having an external config for this so people don't need to hardcode this kind of thing in their code...
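For readers following the review thread: the doc text under discussion says the `sasl.mechanism` has to be set on the source/sink to match the broker. The following is only an illustrative, non-runnable configuration sketch (the broker address, topic name, and `SCRAM-SHA-512` value are placeholder assumptions, and `kafka.`-prefixed options are passed through to the Kafka client by the Spark Kafka source); it is not the exact example from the PR:

```scala
// Configuration sketch only: all option values below are placeholders and
// must match the actual Kafka broker security configuration.
// Assumes `spark` is an active SparkSession with the spark-sql-kafka
// package on the classpath.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9093")  // placeholder broker address
  .option("kafka.security.protocol", "SASL_SSL")      // default protocol for delegation tokens
  .option("kafka.sasl.mechanism", "SCRAM-SHA-512")    // must match broker; see vanzin's comment
  .option("subscribe", "topic1")                      // placeholder topic
  .load()
```

As vanzin notes, hardcoding the mechanism in application code is fragile; an external config would let it track the broker setup without a rebuild.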
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99722/ Test PASSed.
[GitHub] spark issue #23228: [MINOR][DOC]The condition description of serialized shuf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23228 **[Test build #4453 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4453/testReport)** for PR 23228 at commit [`d5dadbf`](https://github.com/apache/spark/commit/d5dadbf30d5429c36ec3d5c2845a71c2717fd6f3). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23223: [SPARK-26269][YARN]Yarnallocator should have same...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/23223#discussion_r239174670

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -612,11 +612,14 @@ private[yarn] class YarnAllocator(
         val message = "Container killed by YARN for exceeding physical memory limits. " +
           s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key}."
         (true, message)
+      case exit_status if NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(exit_status) =>
--- End diff --

also after this gets rearranged, I'd leave a comment in here pointing to the code in hadoop you linked to on the jira.
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/23218 https://issues.apache.org/jira/browse/SPARK-26282