[GitHub] spark issue #23206: [SPARK-26249][SQL] Add ability to inject a rule in order...

2018-12-05 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23206
  
cc @viirya @maropu 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23232: [SPARK-26233][SQL][BACKPORT-2.4] CheckOverflow when enco...

2018-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23232
  
I merged all three PRs (2.4/2.3/2.2). Please close the PRs. :)


---




[GitHub] spark pull request #23240: [SPARK-26281][WebUI] Duration column of task tabl...

2018-12-05 Thread gengliangwang
GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/23240

[SPARK-26281][WebUI] Duration column of task table should be executor run 
time instead of real duration

## What changes were proposed in this pull request?

In PR https://github.com/apache/spark/pull/23081/ , the duration column was 
changed to executor run time. That behavior is consistent with the summary 
metrics table and with previous Spark versions.

However, after PR https://github.com/apache/spark/pull/21688, the issue can 
be reproduced again.

## How was this patch tested?

Before the change, we can see:

1. The minimum duration in the aggregation table doesn't match the task 
table below.
2. The sorting order is wrong.

![image](https://user-images.githubusercontent.com/1097932/49533048-f7eecb80-f8f8-11e8-9256-2eb524e81be0.png)

After the change, the issues are fixed:

![image](https://user-images.githubusercontent.com/1097932/49533069-06d57e00-f8f9-11e8-872b-402e3014f557.png)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gengliangwang/spark fixDuration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23240.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23240


commit 612c4c7242f6289d3a1e424a69951be25cd126af
Author: Gengliang Wang 
Date:   2018-12-05T17:44:55Z

fix




---




[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...

2018-12-05 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/23240
  
@shahidki31 @pgandhi999 @srowen 


---




[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23240
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23240
  
**[Test build #99739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99739/testReport)** for PR 23240 at commit [`612c4c7`](https://github.com/apache/spark/commit/612c4c7242f6289d3a1e424a69951be25cd126af).


---




[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23240
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5779/


---




[GitHub] spark pull request #23223: [SPARK-26269][YARN]Yarnallocator should have same...

2018-12-05 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/23223#discussion_r239173889
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala
 ---
@@ -417,4 +426,59 @@ class YarnAllocatorSuite extends SparkFunSuite with 
Matchers with BeforeAndAfter
 clock.advance(50 * 1000L)
 handler.getNumExecutorsFailed should be (0)
   }
+
+  test("SPARK-26296: YarnAllocator should have same blacklist behaviour 
with YARN") {
+val rmClientSpy = spy(rmClient)
+val maxExecutors = 11
+
+val handler = createAllocator(
+  maxExecutors,
+  rmClientSpy,
+  Map(
+"spark.yarn.blacklist.executor.launch.blacklisting.enabled" -> 
"true",
+"spark.blacklist.application.maxFailedExecutorsPerNode" -> "0"))
+handler.updateResourceRequests()
+
+val hosts = (0 until maxExecutors).map(i => s"host$i")
+val ids = (0 to maxExecutors).map(i => 
ContainerId.newContainerId(appAttemptId, i))
+val containers = createContainers(hosts, ids)
+handler.handleAllocatedContainers(containers.slice(0, 9))
+val cs0 = ContainerStatus.newInstance(containers(0).getId, 
ContainerState.COMPLETE,
+  "success", ContainerExitStatus.SUCCESS)
+val cs1 = ContainerStatus.newInstance(containers(1).getId, 
ContainerState.COMPLETE,
+  "preempted", ContainerExitStatus.PREEMPTED)
+val cs2 = ContainerStatus.newInstance(containers(2).getId, 
ContainerState.COMPLETE,
+  "killed_exceeded_vmem", ContainerExitStatus.KILLED_EXCEEDED_VMEM)
+val cs3 = ContainerStatus.newInstance(containers(3).getId, 
ContainerState.COMPLETE,
+  "killed_exceeded_pmem", ContainerExitStatus.KILLED_EXCEEDED_PMEM)
+val cs4 = ContainerStatus.newInstance(containers(4).getId, 
ContainerState.COMPLETE,
+  "killed_by_resourcemanager", 
ContainerExitStatus.KILLED_BY_RESOURCEMANAGER)
+val cs5 = ContainerStatus.newInstance(containers(5).getId, 
ContainerState.COMPLETE,
+  "killed_by_appmaster", ContainerExitStatus.KILLED_BY_APPMASTER)
+val cs6 = ContainerStatus.newInstance(containers(6).getId, 
ContainerState.COMPLETE,
+  "killed_after_app_completion", 
ContainerExitStatus.KILLED_AFTER_APP_COMPLETION)
+val cs7 = ContainerStatus.newInstance(containers(7).getId, 
ContainerState.COMPLETE,
+  "aborted", ContainerExitStatus.ABORTED)
+val cs8 = ContainerStatus.newInstance(containers(8).getId, 
ContainerState.COMPLETE,
+  "disk_failed", ContainerExitStatus.DISKS_FAILED)
--- End diff --

just a suggestion, you can avoid some repetition here

```scala
val nonBlacklistedStatuses = Seq(ContainerExitStatus.SUCCESS, ...,
  ContainerExitStatus.DISKS_FAILED)
// Pair each exit status with the container at the same index.
val containerStatuses = nonBlacklistedStatuses.zipWithIndex.map { case (state, idx) =>
  ContainerStatus.newInstance(containers(idx).getId,
    ContainerState.COMPLETE, "diagnostics", state)
}
```
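
For reference, the same idea can be tried without a Hadoop dependency. This is only a sketch: `FakeContainer`, `FakeStatus` and `statusesFor` are hypothetical stand-ins for the YARN types and are not part of the PR, and the exit statuses are plain strings listed in the order the diff uses them (cs0 to cs8).

```scala
// Stand-in types for illustration only; the real test uses Hadoop's
// ContainerStatus, ContainerState and ContainerExitStatus.
object ExitStatusSketch {
  final case class FakeContainer(id: Int)
  final case class FakeStatus(
      containerId: Int,
      state: String,
      diagnostics: String,
      exitStatus: String)

  // Exit statuses named in the diff above, in the same order (cs0..cs8).
  val nonBlacklistedStatuses: Seq[String] = Seq(
    "SUCCESS", "PREEMPTED", "KILLED_EXCEEDED_VMEM", "KILLED_EXCEEDED_PMEM",
    "KILLED_BY_RESOURCEMANAGER", "KILLED_BY_APPMASTER",
    "KILLED_AFTER_APP_COMPLETION", "ABORTED", "DISKS_FAILED")

  // Build one completed-container status per exit status, using the
  // container at the matching index.
  def statusesFor(containers: Seq[FakeContainer]): Seq[FakeStatus] =
    nonBlacklistedStatuses.zipWithIndex.map { case (exitStatus, idx) =>
      FakeStatus(containers(idx).id, "COMPLETE", "diagnostics", exitStatus)
    }
}
```

One list replaces nine near-identical `val` definitions, so adding or removing a status is a one-line change.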


---




[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...

2018-12-05 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23240
  
Hi @gengliangwang, it seems this was already handled in PR 
https://github.com/apache/spark/pull/23160


---




[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...

2018-12-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23195#discussion_r239177994
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,199 @@ For experimenting on `spark-shell`, you can also use 
`--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more 
details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a 
cluster. For detailed
+description about these possibilities, see [Kafka security 
docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**
+- **JAAS login configuration**
+
+### Delegation token
+
+This way the application can be configured via Spark parameters and may 
not need JAAS login
+configuration (Spark can use Kafka's dynamic JAAS configuration feature). 
For further information
+about delegation tokens, see [Kafka delegation token 
docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+The process is initiated by Spark's Kafka delegation token provider. When 
`spark.kafka.bootstrap.servers`,
+Spark considers the following log in options, in order of preference:
+- **JAAS login configuration**
+- **Keytab file**, such as,
+
+  ./bin/spark-submit \
+  --keytab  \
+  --principal  \
+  --conf spark.kafka.bootstrap.servers= \
+  ...
+
+- **Kerberos credential cache**, such as,
+
+  ./bin/spark-submit \
+  --conf spark.kafka.bootstrap.servers= \
+  ...
+
+The Kafka delegation token provider can be turned off by setting 
`spark.security.credentials.kafka.enabled` to `false` (default: `true`).
+
+Spark can be configured to use the following authentication protocols to 
obtain token (it must match with
+Kafka broker configuration):
+- **SASL SSL (default)**
+- **SSL**
+- **SASL PLAINTEXT (for testing)**
+
+After obtaining delegation token successfully, Spark distributes it across 
nodes and renews it accordingly.
+Delegation token uses `SCRAM` login module for authentication and because 
of that the appropriate
+`sasl.mechanism` has to be configured on source/sink:
+
+
+
+{% highlight scala %}
+
+// Setting on Kafka Source for Streaming Queries
--- End diff --

I think having just one example should be enough.

Is `SCRAM-SHA-512` the only possible value? I think you mentioned different 
values before. If this needs to match the broker's configuration, that needs to 
be mentioned.

Separately, it would be nice to think about having an external config for 
this so people don't need to hardcode this kind of thing in their code...
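
A rough sketch of what I mean by an external config, with the mechanism resolved in one place instead of being repeated in every query. The config key `spark.kafka.sasl.token.mechanism`, the `SCRAM-SHA-512` default and the helper `sourceOptions` are assumptions for illustration, not something this PR defines; `kafka.bootstrap.servers` and `kafka.sasl.mechanism` are the usual `kafka.`-prefixed source options.

```scala
object KafkaSaslOptions {
  // Hypothetical config key and default, chosen for illustration only.
  val MechanismKey = "spark.kafka.sasl.token.mechanism"
  val DefaultMechanism = "SCRAM-SHA-512"

  // Build the Kafka source/sink options, taking the SASL mechanism from
  // configuration so that user code never hardcodes it.
  def sourceOptions(
      conf: Map[String, String],
      bootstrapServers: String): Map[String, String] =
    Map(
      "kafka.bootstrap.servers" -> bootstrapServers,
      "kafka.sasl.mechanism" -> conf.getOrElse(MechanismKey, DefaultMechanism))
}
```

The resulting map could then be passed to a reader or writer via `.options(...)`, and the value would still have to match the broker's configuration.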


---




[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23207
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99722/


---




[GitHub] spark issue #23228: [MINOR][DOC]The condition description of serialized shuf...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23228
  
**[Test build #4453 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4453/testReport)** for PR 23228 at commit [`d5dadbf`](https://github.com/apache/spark/commit/d5dadbf30d5429c36ec3d5c2845a71c2717fd6f3).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #23223: [SPARK-26269][YARN]Yarnallocator should have same...

2018-12-05 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/23223#discussion_r239174670
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 ---
@@ -612,11 +612,14 @@ private[yarn] class YarnAllocator(
 val message = "Container killed by YARN for exceeding physical 
memory limits. " +
   s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key}."
 (true, message)
+  case exit_status if 
NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(exit_status) =>
--- End diff --

Also, after this gets rearranged, I'd leave a comment here pointing to 
the Hadoop code you linked to on the JIRA.


---




[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-05 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/23218
  
https://issues.apache.org/jira/browse/SPARK-26282


---



