[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/22192 @vanzin @NiharS I am uncomfortable with that change as well - which is why I wanted the initialization to be pushed into a separate thread (and then join) - if we really need to set the context classloader. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
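The idea above — scoping a context-classloader change by performing plugin initialization on a separate thread and then joining it — can be sketched with Python's thread-local state. This is an illustrative analogy, not Spark's actual API; the names `context` and `init_plugin` are invented for the sketch.

```python
import threading

# Stand-in for the per-thread context classloader (illustrative only).
context = threading.local()

def init_plugin():
    # Mutating thread-local state on a dedicated thread keeps the change
    # scoped to that thread instead of leaking into the caller's thread.
    context.loader = "plugin-classloader"

t = threading.Thread(target=init_plugin)
t.start()
t.join()  # initialization is pushed to a separate thread, then joined

# The main thread's view of the thread-local is untouched.
assert not hasattr(context, "loader")
```

The join makes initialization synchronous from the caller's perspective while still isolating any thread-scoped mutation, which is the trade-off being discussed.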
[GitHub] spark pull request #22318: [SPARK-25150][SQL] Rewrite condition when dedupli...
Github user peter-toth commented on a diff in the pull request: https://github.com/apache/spark/pull/22318#discussion_r215499599 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeMap.scala --- @@ -23,12 +23,14 @@ package org.apache.spark.sql.catalyst.expressions * of the name, or the expected nullability). */ object AttributeMap { + val empty = new AttributeMap(Map.empty) --- End diff -- @gatorsmile this was the initial version. Then @mgaido91 and @maropu suggested using a val (instead of a def), either here or in ResolveReferences, to spare some empty() calls.
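The val-versus-def allocation point can be sketched in Python: a Scala `val` is evaluated once, like a module-level singleton, while a `def` allocates a fresh object on every call. `AttributeMapLike` is a toy stand-in, not Catalyst's real class.

```python
class AttributeMapLike:
    """Toy stand-in for Catalyst's AttributeMap; not the real class."""
    def __init__(self, data):
        self._data = dict(data)

# Scala's `val empty` is evaluated once; the closest Python analogue is a
# module-level singleton shared by every caller.
EMPTY = AttributeMapLike({})

def empty():
    # A `def` re-evaluates its body, allocating a fresh map on every call.
    return AttributeMapLike({})

assert EMPTY is EMPTY          # val: one shared instance
assert empty() is not empty()  # def: a new allocation each time
```

Sharing one immutable empty instance is safe precisely because it can never be mutated, which is what makes the `val` optimization attractive.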
[GitHub] spark issue #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure to cal...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22337 Sure, I just focused on files under `external`. Let me address other files, too.
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22344 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2887/ Test PASSed.
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22344 Merged build finished. Test PASSed.
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22344 **[Test build #95737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95737/testReport)** for PR 22344 at commit [`8d49c1a`](https://github.com/apache/spark/commit/8d49c1afdbd6c0219d6cc182e53311201f73489f).
[GitHub] spark pull request #22344: [SPARK-25352][SQL] Perform ordered global limit w...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/22344 [SPARK-25352][SQL] Perform ordered global limit when limit number is bigger than topKSortFallbackThreshold ## What changes were proposed in this pull request? We have an optimization on global limit that evenly distributes limit rows across all partitions. This optimization doesn't work for ordered results. A query ending with sort + limit is, in most cases, performed by `TakeOrderedAndProjectExec`. But if the limit number is bigger than `SQLConf.TOP_K_SORT_FALLBACK_THRESHOLD`, a global limit is used instead, and in that case we need to perform an ordered global limit. ## How was this patch tested? Unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 SPARK-25352 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22344.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22344 commit 8d49c1afdbd6c0219d6cc182e53311201f73489f Author: Liang-Chi Hsieh Date: 2018-09-06T04:43:15Z Do ordered global limit when limit number is bigger than topKSortFallbackThreshold.
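The two execution paths described in the PR can be sketched as follows: small limits take a top-K selection (the `TakeOrderedAndProjectExec` path), while limits above the threshold fall back to a full sort followed by a global limit, which must still respect the ordering. The threshold value and function names here are illustrative, not Spark's implementation.

```python
import heapq

# Illustrative threshold standing in for SQLConf.TOP_K_SORT_FALLBACK_THRESHOLD.
TOP_K_SORT_FALLBACK_THRESHOLD = 3

def ordered_limit(rows, k):
    """Return the k smallest rows, in sorted order.

    Small limits take a top-K selection; limits above the threshold fall
    back to a full sort followed by a global limit, which must still
    respect the ordering (the bug this PR addresses).
    """
    if k <= TOP_K_SORT_FALLBACK_THRESHOLD:
        return heapq.nsmallest(k, rows)  # top-K path
    return sorted(rows)[:k]              # ordered global limit path

print(ordered_limit([5, 1, 4, 2, 3], 2))  # [1, 2]
print(ordered_limit([5, 1, 4, 2, 3], 4))  # [1, 2, 3, 4]
```

Either path returns the same ordered prefix; the point of the fix is that the fallback path must not distribute the limit across partitions without regard to the sort order.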
[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18142 BTW, this changed in Spark 2.3.0. How did we handle this before?
[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22226
[GitHub] spark issue #22226: [SPARK-25252][SQL] Support arrays of any types by to_jso...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22226 Merged to master.
[GitHub] spark issue #22340: [SPARK-25337][SQL][TEST] `runSparkSubmit` should provide...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22340 It's great to have this in Spark 2.4!
[GitHub] spark issue #22340: [SPARK-25337][SQL][TEST] `runSparkSubmit` should provide...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22340 Thank you for merging, @srowen .
[GitHub] spark issue #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure to cal...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22337 Could you check the other suites like `ExternalAppendOnlyUnsafeRowArraySuite` and `TakeOrderedAndProjectSuite`, too?
[GitHub] spark pull request #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22337#discussion_r215491592 --- Diff: external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisStreamSuite.scala --- @@ -83,6 +83,7 @@ abstract class KinesisStreamTests(aggregateTestData: Boolean) extends KinesisFun testUtils.deleteStream() testUtils.deleteDynamoDBTable(appName) } --- End diff -- ditto.
[GitHub] spark pull request #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22337#discussion_r215491496 --- Diff: external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisInputDStreamBuilderSuite.scala --- @@ -41,6 +41,7 @@ class KinesisInputDStreamBuilderSuite extends TestSuiteBase with BeforeAndAfterE override def afterAll(): Unit = { ssc.stop() --- End diff -- Here, too. Let's use `try` to make sure `super.afterAll()` is invoked.
[GitHub] spark pull request #22340: [SPARK-25337][SQL][TEST] `runSparkSubmit` should ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22340
[GitHub] spark pull request #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22337#discussion_r215490599 --- Diff: external/flume/src/test/scala/org/apache/spark/streaming/flume/FlumePollingStreamSuite.scala --- @@ -58,6 +58,7 @@ class FlumePollingStreamSuite extends SparkFunSuite with BeforeAndAfterAll with _sc.stop() _sc = null } --- End diff -- It seems that we need `try` for `_sc.stop()`.
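The cleanup pattern the reviewer is asking for — wrap the suite's own teardown in try/finally so `super.afterAll()` runs even when `stop()` throws — can be sketched in Python with the same shape (the class and method names here are illustrative, not the actual ScalaTest suites):

```python
calls = []

class BaseSuite:
    def after_all(self):
        calls.append("super.afterAll")

class FlakySuite(BaseSuite):
    def stop(self):
        calls.append("stop")
        raise RuntimeError("stop failed")

    def after_all(self):
        # try/finally guarantees the parent's cleanup still runs even
        # when the suite's own stop() throws.
        try:
            self.stop()
        finally:
            super().after_all()

try:
    FlakySuite().after_all()
except RuntimeError:
    pass

print(calls)  # ['stop', 'super.afterAll']
```

Without the try/finally, a failing `stop()` would skip the parent's `afterAll` entirely, leaking resources across suites — which is the bug SPARK-25338 targets.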
[GitHub] spark issue #22340: [SPARK-25337][SQL][TEST] `runSparkSubmit` should provide...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22340 Merged to master (which, it looks like, is still 2.4)
[GitHub] spark issue #22337: [SPARK-25338][Test][kafka][kinesis][flume] Ensure to cal...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22337 Hi, @kiszk . Oh, did you check them all? Are these all of them?
[GitHub] spark issue #22339: SPARK-17159 Significant speed up for running spark strea...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22339 Hi, @ScrapCodes . Could you do the following?
- Update the title to `[SPARK-17159][SS]...`
- Remove `Please review http://spark.apache.org/contributing.html ` from the PR description
- Share the numbers, since the PR title says `Significant speed up`
[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22302 **[Test build #95736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95736/testReport)** for PR 22302 at commit [`c3c5af2`](https://github.com/apache/spark/commit/c3c5af2b9f27e4effcc8f44f1b98f24a6d1fafa7).
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18853 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95733/ Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18853 Build finished. Test FAILed.
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18853 **[Test build #95733 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95733/testReport)** for PR 18853 at commit [`d0a2089`](https://github.com/apache/spark/commit/d0a2089b8c52a3be3977f0bfc5538758ef7d2b55). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #22340: [SPARK-25337][SQL][TEST] `runSparkSubmit` should provide...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22340 Thank you, @HyukjinKwon !
[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22302 Retest this please.
[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...
Github user fjh100456 commented on the issue: https://github.com/apache/spark/pull/22302 PR [#20120](https://github.com/apache/spark/pull/20120) was closed; [SPARK-23355](https://issues.apache.org/jira/browse/SPARK-23355) resolved it instead. Does [#21269](https://github.com/apache/spark/pull/21269/files) mean the same thing?
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r215485269 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { } val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") -intercept[RuntimeException] { +intercept[Exception] { --- End diff -- @HyukjinKwon We get a SparkException here which in turn wraps a RuntimeException. When we have ConvertToLocalRelation active, we get a RuntimeException from the driver. But when we disable it, the error is raised from the executor with a SparkException as the top-level exception. That's the reason I changed it to intercept `Exception`, so that this test can run both when the rule is active and when it's not.
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22270 Merged build finished. Test PASSed.
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22270 **[Test build #95735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95735/testReport)** for PR 22270 at commit [`507f89c`](https://github.com/apache/spark/commit/507f89c4b7c2ee7ae4ad352cd66d4c2bed4adba5).
[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22329 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95734/ Test PASSed.
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22270 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2886/ Test PASSed.
[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22329 **[Test build #95734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95734/testReport)** for PR 22329 at commit [`1f342aa`](https://github.com/apache/spark/commit/1f342aa7158bc2440f504b7cb47b692fcdcce41d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22329 Merged build finished. Test PASSed.
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r215485064 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { } val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") -intercept[RuntimeException] { +intercept[Exception] { --- End diff -- @gatorsmile Thanks.. I am checking the message now.
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r215484920 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { } val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") -intercept[RuntimeException] { +intercept[Exception] { df5.select(map_from_arrays($"k", $"v")).collect --- End diff -- @maropu We get a SparkException here which in turn wraps a RuntimeException. When we have ConvertToLocalRelation active, we get a RuntimeException from the driver. But when we disable it, the error is raised from the executor with a SparkException as the top-level exception.
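The two failure shapes being discussed — a direct RuntimeException from the driver versus the same error wrapped by the executor — can be sketched in Python; intercepting the common base class covers both, which is the rationale for `intercept[Exception]`. `SparkExceptionLike` and both functions are illustrative stand-ins, not Spark code.

```python
class SparkExceptionLike(Exception):
    """Illustrative stand-in for Spark's SparkException."""

def driver_side():
    # With ConvertToLocalRelation active, the failure surfaces directly.
    raise RuntimeError("null element in map key")

def executor_side():
    # With the rule disabled, the same failure comes back wrapped, with
    # the RuntimeException preserved as the cause.
    try:
        driver_side()
    except RuntimeError as e:
        raise SparkExceptionLike("task failed") from e

def intercepted(action):
    # Catching the common base class covers both shapes.
    try:
        action()
    except Exception as e:
        return type(e).__name__

print(intercepted(driver_side))    # RuntimeError
print(intercepted(executor_side))  # SparkExceptionLike
```

The trade-off, as the review thread notes, is that a broad intercept should be paired with a message check so the test still verifies it caught the intended error.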
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22140 @BryanCutler, for https://github.com/apache/spark/pull/22140#issuecomment-414802978, yeah, it makes less sense to me actually, but it seems at least to work for now:

```python
from pyspark.sql import Row

rowClass = Row("c1", "c2")
spark.createDataFrame([rowClass(1)]).show()
```

```
+---+
| c1|
+---+
|  1|
+---+
```

I think we should consider disallowing it in 3.0.0 given the test above.
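The validation the PR proposes — rejecting a custom Row whose value count does not match its field count — can be sketched without PySpark. `make_row_class` is a toy factory invented for this illustration; it is not PySpark's `Row` implementation.

```python
def make_row_class(*fields):
    """Toy Row factory (illustrative; not PySpark's implementation) that
    rejects calls whose value count does not match the field count."""
    def row(*values):
        if len(values) != len(fields):
            raise ValueError(
                "expected %d values for fields %s, got %d"
                % (len(fields), fields, len(values)))
        return dict(zip(fields, values))
    return row

row_class = make_row_class("c1", "c2")
print(row_class(1, 2))  # {'c1': 1, 'c2': 2}
try:
    row_class(1)        # an under-filled row, like the example above
except ValueError as e:
    print("rejected:", e)
```

Under this stricter scheme, both extra and missing values fail fast at construction time instead of producing a partially populated row downstream.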
[GitHub] spark issue #22271: [SPARK-25268][GraphX]run Parallel Personalized PageRank ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22271 **[Test build #4332 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4332/testReport)** for PR 22271 at commit [`7651652`](https://github.com/apache/spark/commit/76516529e31a5507afc0a1de31b30104a4d7c6ad). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request #22140: [SPARK-25072][PySpark] Forbid extra value for cus...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22140#discussion_r215483530 --- Diff: python/pyspark/sql/tests.py --- @@ -269,6 +269,10 @@ def test_struct_field_type_name(self): struct_field = StructField("a", IntegerType()) self.assertRaises(TypeError, struct_field.typeName) +def test_invalid_create_row(slef): +rowClass = Row("c1", "c2") --- End diff -- nit: `rowClass` -> `row_class`
[GitHub] spark issue #22340: [SPARK-25337][SQL][TEST] `runSparkSubmit` should provide...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22340 Seems okay to me too
[GitHub] spark pull request #22287: [SPARK-25135][SQL] FileFormatWriter should respec...
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/22287
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22320 @gengliangwang We need to backport this PR to branch-2.3.
[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22329 Merged build finished. Test PASSed.
[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22329 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2885/ Test PASSed.
[GitHub] spark issue #22307: [SPARK-25301][SQL] When a view uses an UDF from a non de...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22307 @vinod, see the discussion made in https://github.com/apache/spark/pull/18142. Shall we close this? cc @cloud-fan as well.
[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18142 I agree with this change too for clarification.
[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22329 **[Test build #95734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95734/testReport)** for PR 22329 at commit [`1f342aa`](https://github.com/apache/spark/commit/1f342aa7158bc2440f504b7cb47b692fcdcce41d).
[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18142 @HyukjinKwon Thanks for the note! I think this behavior is better, I'm adding a `release_note` tag to the JIRA ticket, so that we don't forget to mention it in release notes.
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22320
[GitHub] spark issue #22320: [SPARK-25313][SQL]Fix regression in FileFormatWriter out...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22320 thanks, merging to master!
[GitHub] spark pull request #22320: [SPARK-25313][SQL]Fix regression in FileFormatWri...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22320#discussion_r215479502 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -82,7 +83,7 @@ case class CreateHiveTableAsSelectCommand( query, overwrite = true, ifPartitionNotExists = false, - outputColumns = outputColumns).run(sparkSession, child) + outputColumnNames = outputColumnNames).run(sparkSession, child) --- End diff -- I feel it's better to specify parameters by name if the previous parameter is already specified by name, e.g. `ifPartitionNotExists = false`
[GitHub] spark issue #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory operat...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/21968 cc @cloud-fan @hvanhovell @maropu
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/21860 cc @cloud-fan @hvanhovell @maropu
[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22302 btw, have the behaviour differences between `TABLEPROPERTIES` and `OPTIONS` already been documented somewhere? https://github.com/apache/spark/pull/20120#issuecomment-354704862
[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18142 I am leaving a note (at least to myself) since it looks like this caused a behaviour change anyhow. With these statements on the Hive side:

```sql
CREATE TABLE emp AS SELECT 'user' AS name, 'address' as address;
CREATE DATABASE d100;
CREATE FUNCTION d100.udf100 AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper';
```

**Hive**

```
hive> SELECT d100.udf100(`emp`.`name`) FROM `emp`;
USER
hive> SELECT `d100.udf100`(`emp`.`name`) FROM `emp`;
USER
```

**Spark**

Before:

```
scala> spark.sql("SELECT d100.udf100(`emp`.`name`) FROM `emp`").show
+-----------------+
|d100.udf100(name)|
+-----------------+
|             USER|
+-----------------+

scala> spark.sql("SELECT `d100.udf100`(`emp`.`name`) FROM `emp`").show
+-----------------+
|d100.udf100(name)|
+-----------------+
|             USER|
+-----------------+
```

After:

```
scala> spark.sql("SELECT d100.udf100(`emp`.`name`) FROM `emp`").show
+-----------------+
|d100.udf100(name)|
+-----------------+
|             USER|
+-----------------+

scala> spark.sql("SELECT `d100.udf100`(`emp`.`name`) FROM `emp`").show
org.apache.spark.sql.AnalysisException: Undefined function: 'd100.udf100'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7
```

**MySQL**

This change causes an inconsistency with Hive, although it looks consistent compared to MySQL.

```
mysql> SELECT `d100.udf100`(`emp`.`name`) FROM `emp`;
ERROR 1305 (42000): FUNCTION hkwon.d100.udf100 does not exist
mysql> SELECT d100.udf100(`emp`.`name`) FROM `emp`;
+---------------------------+
| d100.udf100(`emp`.`name`) |
+---------------------------+
| Hello, user!              |
+---------------------------+
1 row in set (0.01 sec)
```
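The resolution rule behind the behaviour change above can be sketched as a tiny parser: a backquoted reference is one literal identifier, while an unquoted dotted reference splits into database and function name. This is an illustrative sketch, not Catalyst's actual resolution logic.

```python
def parse_function_ref(ref):
    """Split a SQL function reference into (database, function).

    Backquoted identifiers are a single name, so `d100.udf100` is an
    unqualified function literally named "d100.udf100", while unquoted
    d100.udf100 means function udf100 in database d100. (Illustrative
    parser, not Catalyst's implementation.)
    """
    if ref.startswith("`") and ref.endswith("`"):
        return (None, ref[1:-1])
    db, sep, name = ref.partition(".")
    return (db, name) if sep else (None, ref)

print(parse_function_ref("d100.udf100"))    # ('d100', 'udf100')
print(parse_function_ref("`d100.udf100`")) # (None, 'd100.udf100')
```

Under this reading, the backquoted form fails after the change because no function named "d100.udf100" exists in the current database, matching the MySQL behaviour shown above rather than Hive's.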
[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22302 nit: it seems that you removed the `##` headers. You may want to simplify the PR description; this only re-enables the test cases of your previous PR.

```
## What changes were proposed in this pull request?
...

## How was this patch tested?
...
```
[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...
Github user fjh100456 commented on the issue: https://github.com/apache/spark/pull/22302 @dongjoon-hyun @maropu I'm so sorry; I have changed the description. Is it ok now?
[GitHub] spark issue #22327: [SPARK-25330][BUILD] Revert Hadoop 2.7 to 2.7.3
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22327 I think that is reasonable personally
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215471255 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala --- @@ -60,7 +62,23 @@ private[spark] case class KubernetesConf[T <: KubernetesRoleSpecificConf]( roleSecretEnvNamesToKeyRefs: Map[String, String], roleEnvs: Map[String, String], roleVolumes: Iterable[KubernetesVolumeSpec[_ <: KubernetesVolumeSpecificConf]], -sparkFiles: Seq[String]) { +sparkFiles: Seq[String], +hadoopConfDir: Option[String]) { + + def getHadoopConfigMapName: String = s"$appResourceNamePrefix-hadoop-config" + + def getKRBConfigMapName: String = s"$appResourceNamePrefix-krb5-file" + + def getHadoopStepsOrchestrator : Option[HadoopStepsOrchestrator] = hadoopConfDir.map { +hConf => new HadoopStepsOrchestrator( + sparkConf, + appResourceNamePrefix, + hConf, + getHadoopConfigMapName, + getTokenManager.isSecurityEnabled)} + + def getTokenManager : KubernetesHadoopDelegationTokenManager = --- End diff -- The `KubernetesHadoopDelegationTokenManager` is crucial for doing token retrieval, and that needs to happen in the feature steps. As such, it must be retrieved via the `KubernetesConf`. Sadly, I don't see another way given the design of the feature steps.
[GitHub] spark issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Struc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22282 Merged build finished. Test FAILed.
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215471079 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -225,6 +225,60 @@ private[spark] object Config extends Logging { "Ensure that major Python version is either Python2 or Python3") .createWithDefault("2") + val KUBERNETES_KERBEROS_PROXY_USER = --- End diff -- Ah, this was in reference to @skonto who was interested in being able to handle the case of using a proxy user with kerberos. I decided to move this feature addition into a separate PR but left this config. I'll remove this for the sake of proper separation between PR features.
[GitHub] spark issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Struc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22282 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95731/ Test FAILed.
[GitHub] spark issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Struc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22282 **[Test build #95731 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95731/testReport)** for PR 22282 at commit [`b0f64e9`](https://github.com/apache/spark/commit/b0f64e91cb4f6306a7c0c60d4a17f1a0aacb3a51). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215470727 --- Diff: docs/security.md --- @@ -722,6 +722,62 @@ with encryption, at least. The Kerberos login will be periodically renewed using the provided credentials, and new delegation tokens for supported will be created. +## Secure Interaction with Kubernetes + +When talking to Hadoop-based services behind Kerberos, it was noted that Spark needs to obtain delegation tokens +so that non-local processes can authenticate. These delegation tokens in Kubernetes are stored in Secrets that are +shared by the Driver and its Executors. As such, there are three ways of submitting a kerberos job: + +1. Submitting with a $kinit that stores a TGT in the Local Ticket Cache: +```bash +/usr/bin/kinit -kt / +/opt/spark/bin/spark-submit \ +--deploy-mode cluster \ +--class org.apache.spark.examples.HdfsTest \ +--master k8s:// \ +--conf spark.executor.instances=1 \ +--conf spark.app.name=spark-hdfs \ +--conf spark.kubernetes.container.image=spark:latest \ +--conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \ + local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar \ + +``` +3. Submitting with a local keytab and principle +```bash +/opt/spark/bin/spark-submit \ +--deploy-mode cluster \ +--class org.apache.spark.examples.HdfsTest \ +--master k8s:// \ +--conf spark.executor.instances=1 \ +--conf spark.app.name=spark-hdfs \ +--conf spark.kubernetes.container.image=spark:latest \ +--conf spark.kubernetes.kerberos.keytab= \ +--conf spark.kubernetes.kerberos.principal= \ +--conf spark.kubernetes.kerberos.krb5location=/etc/krb5.conf \ + local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar \ + +``` + +3. Submitting with pre-populated secrets already existing within the namespace --- End diff -- Agreed, I should be more clear. 
This is a secret containing the Delegation Token that will be mounted onto the driver and executors, and pointed to via the env var `HADOOP_TOKEN_FILE_LOCATION`
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215470379 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopStepsOrchestrator.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.features.hadoopsteps + +import java.io.File + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.features.OptionRequirements +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager +import org.apache.spark.internal.Logging + +private[spark] class HadoopStepsOrchestrator( + conf: SparkConf, + kubernetesResourceNamePrefix: String, + hadoopConfDir: String, + hadoopConfigMapName: String, + isKerberosEnabled: Boolean) extends Logging { + + private val maybePrincipal = conf.get(KUBERNETES_KERBEROS_PRINCIPAL) + private val maybeKeytab = conf.get(KUBERNETES_KERBEROS_KEYTAB) +.map(k => new File(k)) + private val maybeExistingSecretName = conf.get(KUBERNETES_KERBEROS_DT_SECRET_NAME) + private val maybeExistingSecretItemKey = +conf.get(KUBERNETES_KERBEROS_DT_SECRET_ITEM_KEY) + private val maybeRenewerPrincipal = +conf.get(KUBERNETES_KERBEROS_RENEWER_PRINCIPAL) + + require(maybeKeytab.forall( _ => isKerberosEnabled ), +"You must enable Kerberos support if you are specifying a Kerberos Keytab") + + require(maybeExistingSecretName.forall( _ => isKerberosEnabled ), +"You must enable Kerberos support if you are specifying a Kerberos Secret") + + OptionRequirements.requireBothOrNeitherDefined( +maybeKeytab, +maybePrincipal, +"If a Kerberos keytab is specified you must also specify a Kerberos principal", +"If a Kerberos principal is specified you must also specify a Kerberos keytab") + + OptionRequirements.requireBothOrNeitherDefined( +maybeExistingSecretName, +maybeExistingSecretItemKey, +"If a secret storing a Kerberos Delegation Token is specified you must also" + + " specify the label where the data is stored", +"If a secret data item-key where the data of the Kerberos Delegation Token is specified" + + " you must also specify the name of the secret") + + def getHadoopSteps(kubeTokenManager: KubernetesHadoopDelegationTokenManager): --- End diff 
-- I thought that the interface was clearer and easier to understand than using specific submodules, which could bloat a single step and make it trickier, imo, to unit test.
[GitHub] spark issue #22327: [SPARK-25330][BUILD] Revert Hadoop 2.7 to 2.7.3
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22327 How about reverting it in branch-2.3 too, since we are going to release 2.3.2? We have time to fix it before releasing 2.4.0.
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215470115 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopKerberosSecretResolverStep.scala --- @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features.hadoopsteps + +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager +import org.apache.spark.internal.Logging + + /** + * This step assumes that you have already done all the heavy lifting in retrieving a --- End diff -- This specific "step" is used when the delegation token has already been stored in a pre-existing secret (that is not created on-the-fly by the Submission Client). This is a use-case we have seen desired by those running on Kubernetes Clusters where they do not wish to provide certain clients with keytabs and merely wish to point to pre-populated secrets that the user has access to (access is restricted via RBAC). I thought that secret creation logic and non-creation logic should be separated, but I can combine them into the same step. I just thought it would be more clear. 
(Also easier for unit testing.)
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215469488 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/hadoopsteps/HadoopKerberosKeytabResolverStep.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.features.hadoopsteps + +import java.io._ +import java.security.PrivilegedExceptionAction + +import scala.collection.JavaConverters._ + +import io.fabric8.kubernetes.api.model.SecretBuilder +import org.apache.commons.codec.binary.Base64 + +import org.apache.spark.{SparkConf, SparkException} +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.security.KubernetesHadoopDelegationTokenManager +import org.apache.spark.deploy.security.HadoopDelegationTokenManager +import org.apache.spark.internal.Logging + + /** + * This step does all the heavy lifting for Delegation Token logic. This step + * assumes that the job user has either specified a principal and keytab or ran + * $kinit before running spark-submit. 
With a TGT stored locally, by running + * UGI.getCurrentUser you are able to obtain the current user, alternatively + * you can run UGI.loginUserFromKeytabAndReturnUGI and by running .doAs run + * as the logged into user instead of the current user. With the Job User principal + * you then retrieve the delegation token from the NameNode and store values in + * DelegationToken. Lastly, the class puts the data into a secret. All this is + * appended to the current HadoopSpec which in turn will append to the current + * DriverSpec. + */ +private[spark] class HadoopKerberosKeytabResolverStep( +submissionSparkConf: SparkConf, +kubernetesResourceNamePrefix : String, +maybePrincipal: Option[String], +maybeKeytab: Option[File], +maybeRenewerPrincipal: Option[String], +tokenManager: KubernetesHadoopDelegationTokenManager) + extends HadoopConfigurationStep with Logging { + +override def configureHadoopSpec(hSpec: HadoopConfigSpec): HadoopConfigSpec = { + val hadoopConf = SparkHadoopUtil.get.newConfiguration(submissionSparkConf) + if (!tokenManager.isSecurityEnabled) { +throw new SparkException("Hadoop not configured with Kerberos") + } + val maybeJobUserUGI = --- End diff -- Shouldn't the `yarn` config change exist in a different PR or should I include it in here? I agree that it would be super helpful. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22192 > It seems like the line Thread.currentThread().setContextClassLoader(replClassLoader) is causing the pyspark failures What if you restore the original class loader after initializing the plugins? I was a little worried about this call, but was waiting for tests... so if it's causing problems, better to not change the way things work there.
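The restore-afterwards idea vanzin raises can be sketched roughly like this. This is a minimal sketch with a hypothetical helper name, not the executor's actual code:

```scala
// Hypothetical helper: run `body` with `loader` as the context class loader,
// then put the original loader back, even if initialization throws.
def withContextClassLoader[T](loader: ClassLoader)(body: => T): T = {
  val thread = Thread.currentThread()
  val original = thread.getContextClassLoader
  thread.setContextClassLoader(loader)
  try body finally thread.setContextClassLoader(original)
}
```

Plugin initialization would then run as something like `withContextClassLoader(replClassLoader) { initPlugins() }` (both names hypothetical), leaving the thread's original loader untouched for whatever runs afterwards.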
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22192 > I'll change the config documentation to specify YARN only, hopefully it's not a huge issue. I don't think you're really understanding what I'm saying above. I'm not saying this feature only works with YARN. I'm saying that distributing the plugins with `--jars` only works with YARN. And that if there was user-documentation for this feature - which there isn't - then it would have to explain how the jar is expected to be made available to the executors.
[GitHub] spark issue #22271: [SPARK-25268][GraphX]run Parallel Personalized PageRank ...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/22271 LGTM. I tested this locally and confirmed it fixes the serialization issue. Thank you, @shahidki31! Merging with master after fresh tests finish
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user NiharS commented on the issue: https://github.com/apache/spark/pull/22192 I'll change the config documentation to specify YARN only, hopefully it's not a huge issue. It seems like the line `Thread.currentThread().setContextClassLoader(replClassLoader)` is causing the pyspark failures; they pass when I remove it. I'm looking at the test cases but I really don't see how this is affecting them... it seems that in both test cases, the DStreams monitor a directory but don't pick up the changes they're supposed to, and just time out. I checked that I can bypass this issue by changing back to having the plugins loaded on a separate thread (and setting that thread's contextClassLoader instead of the current thread) and it passes tests and continues to work. That said, this issue does seem to be indicative of some problem in pyspark streaming
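The separate-thread workaround NiharS describes (and that mridulm suggested earlier in the thread) amounts to something like the following sketch; the helper and names are illustrative assumptions, not the actual PR code:

```scala
// Hypothetical sketch: initialize plugins on a short-lived dedicated thread
// whose context class loader is set to the plugin loader, then join. The
// calling thread's context class loader is never modified.
def initOnDedicatedThread(pluginLoader: ClassLoader)(init: => Unit): Unit = {
  val t = new Thread(new Runnable { override def run(): Unit = init }, "plugin-init")
  t.setContextClassLoader(pluginLoader)
  t.start()
  t.join() // block until initialization finishes, as in mridulm's suggestion
}
```

The `join()` keeps initialization synchronous from the executor's point of view, while isolating the class-loader change to the throwaway thread.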
[GitHub] spark issue #22262: [SPARK-25175][SQL] Field resolution should fail if there...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22262 Thank you, @seancxmao. I'll review again tonight.
[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22302 @fjh100456 please do not break the PR format, and could you clean it up, too?
[GitHub] spark issue #20999: [WIP][SPARK-14922][SPARK-23866][SQL] Support partition f...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20999 Still WIP? BTW, it seems credit can now go to multiple developers: https://github.com/apache/spark/pull/22324#issuecomment-418227335
[GitHub] spark issue #22302: [SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test f...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22302 @fjh100456. I'm @dongjoon-hyun. I think you put the wrong person in the PR description. :) BTW, you don't need to mention the person. It would be enough to mention the SPARK JIRA IDs.
[GitHub] spark pull request #22324: [SPARK-25237][SQL] Remove updateBytesReadWithFile...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22324#discussion_r215461695 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -473,6 +476,27 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo } } } + + test("SPARK-25237 compute correct input metrics in FileScanRDD") { +withTempPath { p => + val path = p.getAbsolutePath + spark.range(1000).repartition(1).write.csv(path) + val bytesReads = new mutable.ArrayBuffer[Long]() + val bytesReadListener = new SparkListener() { +override def onTaskEnd(taskEnd: SparkListenerTaskEnd) { + bytesReads += taskEnd.taskMetrics.inputMetrics.bytesRead +} + } + sparkContext.addSparkListener(bytesReadListener) + try { +spark.read.csv(path).limit(1).collect() +sparkContext.listenerBus.waitUntilEmpty(1000L) +assert(bytesReads.sum === 7860) --- End diff -- Yeah, actually the file size is `3890`, but the hadoop API (`FileSystem.getAllStatistics`) reports that number (`3930`). I didn't look into the Hadoop code yet, so I don't get why. I'll dig into it later.
[GitHub] spark pull request #21525: [SPARK-24513][ML] Attribute support in UnaryTrans...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21525#discussion_r215461652 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Transformer.scala --- @@ -116,10 +116,17 @@ abstract class UnaryTransformer[IN, OUT, T <: UnaryTransformer[IN, OUT, T]] StructType(outputFields) } + /** + * Returns [[Metadata]] to be attached to the output column. + */ + protected def outputMetadata(outputSchema: StructType, dataset: Dataset[_]): Metadata = +Metadata.empty + override def transform(dataset: Dataset[_]): DataFrame = { -transformSchema(dataset.schema, logging = true) +val outputSchema = transformSchema(dataset.schema, logging = true) val transformUDF = udf(this.createTransformFunc, outputDataType) -dataset.withColumn($(outputCol), transformUDF(dataset($(inputCol +val metadata = outputMetadata(outputSchema, dataset) --- End diff -- `HashingTF` is an example that the metadata is created in `transformSchema` and attached to `outputSchema`. So my question is, do we need an extra API `outputMetadata` to do this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18853: [SPARK-21646][SQL] Add new type coercion to compatible w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18853 **[Test build #95733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95733/testReport)** for PR 18853 at commit [`d0a2089`](https://github.com/apache/spark/commit/d0a2089b8c52a3be3977f0bfc5538758ef7d2b55).
[GitHub] spark issue #22340: [SPARK-25337][SQL][TEST] `runSparkSubmit` should provide...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22340 Could you review this Scala-2.12 PR again, @cloud-fan and @gatorsmile?
[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20887 Based on the review opinion, I'll close this PR and JIRA issue for now.
[GitHub] spark pull request #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should ...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/20887
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22270 Thank you!
[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22323#discussion_r215456606 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala --- @@ -45,6 +47,10 @@ private[spark] class LocalDirsFeatureStep( new VolumeBuilder() .withName(s"spark-local-dir-${index + 1}") .withNewEmptyDir() +.withMedium(useLocalDirTmpFs match { + case true => "Memory" // Use tmpfs --- End diff -- I think we shouldn't use `case true... false`; we can instead do this: ``` .withNewEmptyDir().withMedium(if (useLocalDirTmpFs) "Memory" else null)... ```
[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22323#discussion_r215456338 --- Diff: docs/running-on-kubernetes.md --- @@ -215,6 +215,19 @@ spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai The configuration properties for mounting volumes into the executor pods use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. For a complete list of available options for each supported type of volumes, please refer to the [Spark Properties](#spark-properties) section below. +## Local Storage + +Spark uses temporary scratch space to spill data to disk during shuffles and other operations. When using Kubernetes as the resource manager the pods will be created with an [emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume mounted for each directory listed in `SPARK_LOCAL_DIRS`. If no directories are explicitly specified then a default directory is created and configured appropriately. + +`emptyDir` volumes use the ephemeral storage feature of Kubernetes and do not persist beyond the life of the pod. + +### Using RAM for local storage + +As `emptyDir` volumes use the nodes backing storage for ephemeral storage this default behaviour may not be appropriate for some compute environments. For example if you have diskless nodes with remote storage mounted over a network having lots of executors doing IO to this remote storage may actually degrade performance. + +In this case it may be desirable to set `spark.kubernetes.local.dirs.tmpfs=true` in your configuration which will cause the `emptyDir` volumes to be configured as `tmpfs` i.e. RAM backed volumes. When configured like this Sparks local storage usage will count towards your pods memory usage therefore you may wish to increase your memory requests via the normal `spark.driver.memory` and `spark.executor.memory` configuration properties. 
--- End diff -- You can't allocate space for tmpfs via `spark.{driver, executor}.memory` because that will strictly be allocated to the heap. The Java command is basically this: `/bin/java -Xmx${spark.driver.memory}`, hence strictly requiring memory overhead to get this space to be dedicated to tmpfs.
[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22323#discussion_r215456006 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala --- @@ -22,6 +22,7 @@ import java.util.UUID import io.fabric8.kubernetes.api.model.{ContainerBuilder, HasMetadata, PodBuilder, VolumeBuilder, VolumeMountBuilder} import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesDriverSpecificConf, KubernetesRoleSpecificConf, SparkPod} +import org.apache.spark.deploy.k8s.Config._ --- End diff -- Actually I think wildcard for configuration fields is probably ok. But we haven't been consistent.
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215455515 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala --- @@ -60,7 +62,23 @@ private[spark] case class KubernetesConf[T <: KubernetesRoleSpecificConf]( roleSecretEnvNamesToKeyRefs: Map[String, String], roleEnvs: Map[String, String], roleVolumes: Iterable[KubernetesVolumeSpec[_ <: KubernetesVolumeSpecificConf]], -sparkFiles: Seq[String]) { +sparkFiles: Seq[String], +hadoopConfDir: Option[String]) { + + def getHadoopConfigMapName: String = s"$appResourceNamePrefix-hadoop-config" + + def getKRBConfigMapName: String = s"$appResourceNamePrefix-krb5-file" + + def getHadoopStepsOrchestrator : Option[HadoopStepsOrchestrator] = hadoopConfDir.map { +hConf => new HadoopStepsOrchestrator( + sparkConf, + appResourceNamePrefix, + hConf, + getHadoopConfigMapName, + getTokenManager.isSecurityEnabled)} + + def getTokenManager : KubernetesHadoopDelegationTokenManager = --- End diff -- Strange to have the `KubernetesConf` object return a unit that does work - most of these are properties. This thing should basically behave like a struct. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
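A hypothetical, heavily simplified illustration of the struct-like shape mccheah is asking for: derived names become plain vals on the conf, computed once, while anything that does real work (such as a token manager) would be constructed by the caller instead. The class and field names here are illustrative, not the PR's actual `KubernetesConf`:

```scala
// Sketch of a conf that behaves like a struct: properties only, no factories.
case class SimpleKubernetesConf(appResourceNamePrefix: String) {
  // Derived names are cheap vals rather than defs that rebuild strings
  // (or, worse, defs that construct collaborator objects) on every call.
  val hadoopConfigMapName: String = s"$appResourceNamePrefix-hadoop-config"
  val krbConfigMapName: String = s"$appResourceNamePrefix-krb5-file"
}
```

Under this shape, a `KubernetesHadoopDelegationTokenManager` would be built once by whatever owns the conf and passed into the feature steps, rather than being produced by a `getTokenManager` method on the conf itself.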
[GitHub] spark issue #21525: [SPARK-24513][ML] Attribute support in UnaryTransformer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21525 Merged build finished. Test PASSed.
[GitHub] spark issue #21525: [SPARK-24513][ML] Attribute support in UnaryTransformer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21525 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95732/ Test PASSed.
[GitHub] spark issue #21525: [SPARK-24513][ML] Attribute support in UnaryTransformer
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21525 **[Test build #95732 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95732/testReport)** for PR 21525 at commit [`cfeae4d`](https://github.com/apache/spark/commit/cfeae4d5282be98d619da97f350c11339057e92f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22335: [SPARK-25091][SQL] reduce the storage memory in E...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22335#discussion_r215452975 --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala --- @@ -646,8 +646,17 @@ private[spark] class AppStatusListener( } override def onUnpersistRDD(event: SparkListenerUnpersistRDD): Unit = { -liveRDDs.remove(event.rddId) -kvstore.delete(classOf[RDDStorageInfoWrapper], event.rddId) +while (true) { --- End diff -- Thank you for letting me know about a single thread. I agree with your point. As you pointed out, when `rdd.isEmpty()` continues to return `false`, it will get into an infinite loop. Thus, I imagine that this is only meant to work in a multithreaded context.
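One hedged alternative to the unbounded `while (true)` under discussion is to bound the retries so the loop cannot spin forever if the condition never flips. This is an illustrative sketch only, not the actual `AppStatusListener` change:

```scala
// Retry a check-then-act step at most `maxAttempts` times; `step` returns
// true once its work is done. Returns whether the step ever succeeded,
// instead of risking an infinite loop when the condition never becomes true.
def boundedRetry(maxAttempts: Int)(step: () => Boolean): Boolean = {
  var attempts = 0
  var done = false
  while (!done && attempts < maxAttempts) {
    done = step()
    attempts += 1
  }
  done
}
```

A caller could then log or surface the failure case explicitly rather than hanging, which is easier to reason about when another thread is racing the removal.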
[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/22146 I think we still need some consensus on the driver container API. Personally I think it's clearer to use a configuration option to specifically select the container that is the driver, but this is an opinion I hold loosely. Would like specific weigh-in from @erikerlandson and @liyinan926. Aside from that, this looks good on my end apart from the tests we need to add and should be ready to merge pretty soon (depending also on Spark 2.4 timeline)
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r215452489 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilder.scala --- @@ -81,9 +95,12 @@ private[spark] class KubernetesDriverBuilder( .getOrElse(provideJavaStep(kubernetesConf)) val allFeatures = (baseFeatures :+ bindingsStep) ++ - secretFeature ++ envSecretFeature ++ volumesFeature + secretFeature ++ envSecretFeature ++ volumesFeature ++ podTemplateFeature -var spec = KubernetesDriverSpec.initialSpec(kubernetesConf.sparkConf.getAll.toMap) +var spec = KubernetesDriverSpec( + provideInitialPod(), + Seq.empty, --- End diff -- Ping on this
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22270 @gatorsmile Yeah, I will push the changes tonight for you to take a look.
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r215451113 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala --- @@ -59,5 +66,28 @@ private[spark] object KubernetesUtils { } } + def loadPodFromTemplate( + kubernetesClient: KubernetesClient, + templateFile: File): SparkPod = { +try { + val pod = kubernetesClient.pods().load(templateFile).get() + pod.getSpec.getContainers.asScala.toList match { --- End diff -- Ah, actually, because we're extracting both the first element and the rest, this implementation is probably cleaner.
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r215450800 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala --- @@ -74,8 +74,16 @@ private[spark] object Constants { val ENV_R_PRIMARY = "R_PRIMARY" val ENV_R_ARGS = "R_APP_ARGS" + // Pod spec templates + val EXECUTOR_POD_SPEC_TEMPLATE_FILE_NAME = "pod-spec-template.yml" + val EXECUTOR_POD_SPEC_TEMPLATE_MOUNTHPATH = "/opt/spark/pod-template" --- End diff -- Spelling: I think we want `EXECUTOR_POD_SPEC_TEMPLATE_MOUNTPATH`?
[GitHub] spark issue #18080: [Spark-20771][SQL] Make weekofyear more intuitive
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18080 Can one of the admins verify this patch?
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r215450600 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala --- @@ -59,5 +66,28 @@ private[spark] object KubernetesUtils { } } + def loadPodFromTemplate( + kubernetesClient: KubernetesClient, + templateFile: File): SparkPod = { +try { + val pod = kubernetesClient.pods().load(templateFile).get() + pod.getSpec.getContainers.asScala.toList match { --- End diff -- I think you can use `headOption` here to get an `Option` object and use `map`, `getOrElse`.
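The two styles being compared, pattern matching on the container list versus `headOption`, can be sketched side by side. This is an illustrative sketch only: `PodSketch`, `SparkPod`, `fromMatch`, and `fromHeadOption` are made-up names rather than the real Kubernetes model classes, and the actual PR handles the empty-container case differently.

```scala
object PodSketch {
  // Stand-in for the real SparkPod model.
  case class SparkPod(pod: String, container: Option[String])

  // Pattern-match style: binds both the head and the tail, which is why it
  // reads cleaner when "first and rest" are both needed.
  def fromMatch(pod: String, containers: List[String]): SparkPod =
    containers match {
      case first :: rest => SparkPod(pod, Some(first)) // `rest` stays in scope here
      case Nil           => SparkPod(pod, None)
    }

  // headOption style: shorter when only the first element matters.
  def fromHeadOption(pod: String, containers: List[String]): SparkPod =
    containers.headOption
      .map(first => SparkPod(pod, Some(first)))
      .getOrElse(SparkPod(pod, None))
}
```

Both produce the same result; the choice is purely about readability, which is the trade-off being discussed in the thread.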
[GitHub] spark issue #22333: [SPARK-25335][BUILD] Skip Zinc downloading if it's insta...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22333 Thank you, @srowen.
[GitHub] spark pull request #22179: [SPARK-25258][SPARK-23131][SPARK-25176][BUILD] Up...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22179