[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21313
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21313
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91403/
Test PASSed.


---




[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21313
  
**[Test build #91403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91403/testReport)** for PR 21313 at commit [`e05e701`](https://github.com/apache/spark/commit/e05e701f3027607fc6942a81e1a9f8d0a5cc6e5f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...

2018-06-01 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/21467#discussion_r192513260
  
--- Diff: python/pyspark/util.py ---
@@ -53,16 +53,11 @@ def _get_argspec(f):
 """
 Get argspec of a function. Supports both Python 2 and Python 3.
 """
-
-    if hasattr(f, '_argspec'):
-        # only used for pandas UDF: they wrap the user function, losing its signature
-        # workers need this signature, so UDF saves it here
-        argspec = f._argspec
-    elif sys.version_info[0] < 3:
+    # `getargspec` is deprecated since python3.0 (incompatible with function annotations).
--- End diff --

I meant: can the comment itself be moved back to the "else" block? (This is minor, though.)


---




[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21481
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3768/
Test PASSed.


---




[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21481
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21390: [SPARK-24340][Core] Clean up non-shuffle disk blo...

2018-06-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21390


---




[GitHub] spark issue #21443: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-01 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21443
  
@gatorsmile ok, I will (so, I reopened https://issues.apache.org/jira/browse/SPARK-24369)


---




[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-06-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21390
  
Thanks! Merged to master.


---




[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21313
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3767/
Test PASSed.


---




[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21313
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-06-01 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/21313
  
@felixcheung @HyukjinKwon  Any more comments?


---




[GitHub] spark issue #21481: [SPARK-24452][SQL][Core] Avoid possible overflow in int ...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21481
  
**[Test build #91404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91404/testReport)** for PR 21481 at commit [`324fd5c`](https://github.com/apache/spark/commit/324fd5ccb73c8017f5537031db21b687ac1ca27a).


---




[GitHub] spark pull request #21481: [SPARK-24452][SQL][Core] Avoid possible overflow ...

2018-06-01 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/21481

[SPARK-24452][SQL][Core] Avoid possible overflow in int add or multiple

## What changes were proposed in this pull request?

This PR fixes possible overflow in int add or multiply.

The following assignments may cause an overflow on the right-hand side; as a result, the computed value may be negative.
```
long = int * int
long = int + int
```

To avoid this problem, this PR casts from int to long on the right-hand side.
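
As an illustrative sketch of the failure mode and the fix (plain Scala; the variable names are made up, not from the patch):

```scala
val n: Int = 50000
// Int * Int is evaluated in 32-bit arithmetic and wraps *before* widening to Long:
val bad: Long = n * n          // -1794967296, not 2500000000
// Casting one operand first makes the multiplication happen in Long:
val good: Long = n.toLong * n  // 2500000000
```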

## How was this patch tested?

Existing UTs.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-24452

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21481.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21481


commit 324fd5ccb73c8017f5537031db21b687ac1ca27a
Author: Kazuaki Ishizaki 
Date:   2018-06-01T20:22:34Z

initial commit




---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3628/



---




[GitHub] spark issue #21313: [SPARK-24187][R][SQL]Add array_join function to SparkR

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21313
  
**[Test build #91403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91403/testReport)** for PR 21313 at commit [`e05e701`](https://github.com/apache/spark/commit/e05e701f3027607fc6942a81e1a9f8d0a5cc6e5f).


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3766/
Test PASSed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3628/



---




[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21346
  
**[Test build #4194 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4194/testReport)** for PR 21346 at commit [`83c3271`](https://github.com/apache/spark/commit/83c3271d2f45bbef18d865bddbc6807e9fbd2503).


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-06-01 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r192502051
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala ---
@@ -423,7 +423,9 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper with
     val input = """{"a": 1}"""
     val schema = ArrayType(StructType(StructField("a", IntegerType) :: Nil))
     val output = InternalRow(1) :: Nil
-    checkEvaluation(JsonToStructs(schema, Map.empty, Literal(input), gmtId, true), output)
+    checkEvaluation(
+      JsonToStructs(schema, Map("unpackArray" -> "true"), Literal(input), gmtId, true),
--- End diff --

Add a case for `unpackArray` set to `false`.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91402/
Test FAILed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91402/testReport)** for PR 20697 at commit [`845cba1`](https://github.com/apache/spark/commit/845cba1db95293d7962fb6029c46e006a5da46a0).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91402/testReport)** for PR 20697 at commit [`845cba1`](https://github.com/apache/spark/commit/845cba1db95293d7962fb6029c46e006a5da46a0).


---




[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...

2018-06-01 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request:

https://github.com/apache/spark/pull/21467#discussion_r192502611
  
--- Diff: python/pyspark/util.py ---
@@ -53,16 +53,11 @@ def _get_argspec(f):
 """
 Get argspec of a function. Supports both Python 2 and Python 3.
 """
-
-    if hasattr(f, '_argspec'):
-        # only used for pandas UDF: they wrap the user function, losing its signature
-        # workers need this signature, so UDF saves it here
-        argspec = f._argspec
-    elif sys.version_info[0] < 3:
+    # `getargspec` is deprecated since python3.0 (incompatible with function annotations).
--- End diff --

No, this is the purpose of this PR :) That's how we fixed a bug in [a previous PR](https://github.com/apache/spark/pull/21383), but we felt it was a hack, so now we are doing it properly.


---




[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-06-01 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r192501912
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -523,6 +523,11 @@ case class JsonToStructs(
   // can generate incorrect files if values are missing in columns declared as non-nullable.
   val nullableSchema = if (forceNullableSchema) schema.asNullable else schema
 
+  private val caseInsensitiveOptions = CaseInsensitiveMap(options)
+  private val unpackArray: Boolean = {
--- End diff --

Why do we need this? Can you add comments about it?


---




[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21479
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21479
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91396/
Test PASSed.


---




[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21479
  
**[Test build #91396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91396/testReport)** for PR 21479 at commit [`a1a4db3`](https://github.com/apache/spark/commit/a1a4db3774e7e0911e710ed1a99694add29df545).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21376: [SPARK-24250][SQL] support accessing SQLConf inside task...

2018-06-01 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21376
  
I'm still seeing a lot of build failures which seem to be related to this (accessing a conf in a task in turn accesses the LiveListenerBus). Is this something new, or related to this change? E.g.:

https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4193/testReport/org.apache.spark.sql.execution/UnsafeRowSerializerSuite/toUnsafeRow___test_helper_method/

```
sbt.ForkMain$ForkError: java.lang.IllegalStateException: LiveListenerBus is stopped.
  at org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:97)
  at org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:80)
  at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:93)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:120)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:120)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:120)
  at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:119)
  at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:286)
  at org.apache.spark.sql.test.TestSparkSession.sessionState$lzycompute(TestSQLContext.scala:42)
  at org.apache.spark.sql.test.TestSparkSession.sessionState(TestSQLContext.scala:41)
  at org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:95)
  at org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:95)
  at scala.Option.map(Option.scala:146)
  at org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:95)
  at org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:94)
  at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:126)
  at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:54)
  at org.apache.spark.sql.catalyst.expressions.UnsafeProjection$.create(Projection.scala:157)
  at org.apache.spark.sql.catalyst.expressions.UnsafeProjection$.create(Projection.scala:150)
  at org.apache.spark.sql.execution.UnsafeRowSerializerSuite.org$apache$spark$sql$execution$UnsafeRowSerializerSuite$$unsafeRowConverter(UnsafeRowSerializerSuite.scala:54)
  at org.apache.spark.sql.execution.UnsafeRowSerializerSuite.org$apache$spark$sql$execution$UnsafeRowSerializerSuite$$toUnsafeRow(UnsafeRowSerializerSuite.scala:49)
  at org.apache.spark.sql.execution.UnsafeRowSerializerSuite$$anonfun$2.apply(UnsafeRowSerializerSuite.scala:63)
  at org.apache.spark.sql.execution.UnsafeRowSerializerSuite$$anonfun$2.apply(UnsafeRowSerializerSuite.scala:60)
  ...
```


---




[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3765/
Test PASSed.


---




[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21061
  
**[Test build #91401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91401/testReport)** for PR 21061 at commit [`adc68cc`](https://github.com/apache/spark/commit/adc68cc033dec8b26be23e861eb53b466f35ad38).


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3626/



---




[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...

2018-06-01 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/21467#discussion_r192493299
  
--- Diff: python/pyspark/util.py ---
@@ -53,16 +53,11 @@ def _get_argspec(f):
 """
 Get argspec of a function. Supports both Python 2 and Python 3.
 """
-
-    if hasattr(f, '_argspec'):
-        # only used for pandas UDF: they wrap the user function, losing its signature
-        # workers need this signature, so UDF saves it here
-        argspec = f._argspec
-    elif sys.version_info[0] < 3:
+    # `getargspec` is deprecated since python3.0 (incompatible with function annotations).
--- End diff --

This change doesn't seem necessary... Let's move it back?


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3764/
Test PASSed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3626/



---




[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-06-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r192490355
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -1882,3 +1882,311 @@ case class ArrayRepeat(left: Expression, right: Expression)
   }
 
 }
+
+object ArraySetLike {
+  val kindUnion = 1
+
+  private val MAX_ARRAY_LENGTH: Int = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH
+
+  def toArrayDataInt(hs: OpenHashSet[Int]): ArrayData = {
+    val array = new Array[Int](hs.size)
+    var pos = hs.nextPos(0)
+    var i = 0
+    while (pos != OpenHashSet.INVALID_POS) {
+      array(i) = hs.getValue(pos)
+      pos = hs.nextPos(pos + 1)
+      i += 1
+    }
+
+    val numBytes = 4L * array.length
+    val unsafeArraySizeInBytes = UnsafeArrayData.calculateHeaderPortionInBytes(array.length) +
+      org.apache.spark.unsafe.array.ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes)
+    // Since UnsafeArrayData.fromPrimitiveArray() uses long[], max elements * 8 bytes can be used
+    if (unsafeArraySizeInBytes <= Integer.MAX_VALUE * 8) {
+      UnsafeArrayData.fromPrimitiveArray(array)
+    } else {
+      new GenericArrayData(array)
+    }
+  }
+
+  def toArrayDataLong(hs: OpenHashSet[Long]): ArrayData = {
+    val array = new Array[Long](hs.size)
+    var pos = hs.nextPos(0)
+    var i = 0
+    while (pos != OpenHashSet.INVALID_POS) {
+      array(i) = hs.getValue(pos)
+      pos = hs.nextPos(pos + 1)
+      i += 1
+    }
+
+    val numBytes = 8L * array.length
+    val unsafeArraySizeInBytes = UnsafeArrayData.calculateHeaderPortionInBytes(array.length) +
+      org.apache.spark.unsafe.array.ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes)
+    // Since UnsafeArrayData.fromPrimitiveArray() uses long[], max elements * 8 bytes can be used
+    if (unsafeArraySizeInBytes <= Integer.MAX_VALUE * 8) {
--- End diff --

Ah, I misunderstood. Accepting `Integer.MAX_VALUE * 8` looks like a future plan. Anyway, I will use the same calculation as in `UnsafeArrayData.fromPrimitiveArray()`.
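
(A quick aside on why the bound itself deserves scrutiny — this is plain JVM int semantics, not Spark-specific, shown here as a sketch:)

```scala
// `Integer.MAX_VALUE * 8` is computed in Int arithmetic and wraps around:
val wrapped: Long = Integer.MAX_VALUE * 8          // -8
// Widening one operand first gives the intended bound:
val intended: Long = Integer.MAX_VALUE.toLong * 8  // 17179869176
// So a check like `unsafeArraySizeInBytes <= Integer.MAX_VALUE * 8` compares against -8.
```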


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91400/testReport)** for PR 20697 at commit [`b936953`](https://github.com/apache/spark/commit/b936953c871226ae8a2ccc7caa6096e9fc38c317).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91400/
Test FAILed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20697
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20697
  
**[Test build #91400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91400/testReport)** for PR 20697 at commit [`b936953`](https://github.com/apache/spark/commit/b936953c871226ae8a2ccc7caa6096e9fc38c317).


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread ssuchter
Github user ssuchter commented on the issue:

https://github.com/apache/spark/pull/20697
  
@mccheah If you want to merge now, I can fix the commented-out test in another PR; that's OK too.


---




[GitHub] spark issue #14940: [SPARK-17383][GRAPHX] Improvement LabelPropagaton, and r...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14940
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...

2018-06-01 Thread ssuchter
Github user ssuchter commented on a diff in the pull request:

https://github.com/apache/spark/pull/20697#discussion_r192488928
  
--- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala ---
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.integrationtest
+
+import java.io.File
+import java.nio.file.{Path, Paths}
+import java.util.UUID
+import java.util.regex.Pattern
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.PatternFilenameFilter
+import io.fabric8.kubernetes.api.model.{Container, Pod}
+import org.scalatest.{BeforeAndAfter, BeforeAndAfterAll}
+import org.scalatest.concurrent.{Eventually, PatienceConfiguration}
+import org.scalatest.time.{Minutes, Seconds, Span}
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.deploy.k8s.integrationtest.backend.{IntegrationTestBackend, IntegrationTestBackendFactory}
+import org.apache.spark.deploy.k8s.integrationtest.config._
+
+private[spark] class KubernetesSuite extends SparkFunSuite
+  with BeforeAndAfterAll with BeforeAndAfter {
+
+  import KubernetesSuite._
+
+  private var testBackend: IntegrationTestBackend = _
+  private var sparkHomeDir: Path = _
+  private var kubernetesTestComponents: KubernetesTestComponents = _
+  private var sparkAppConf: SparkAppConf = _
+  private var image: String = _
+  private var containerLocalSparkDistroExamplesJar: String = _
+  private var appLocator: String = _
+  private var driverPodName: String = _
+
+  override def beforeAll(): Unit = {
+    // The scalatest-maven-plugin gives system properties that are referenced but not set null
+    // values. We need to remove the null-value properties before initializing the test backend.
+    val nullValueProperties = System.getProperties.asScala
+      .filter(entry => entry._2.equals("null"))
+      .map(entry => entry._1.toString)
+    nullValueProperties.foreach { key =>
+      System.clearProperty(key)
+    }
+
+    val sparkDirProp = System.getProperty("spark.kubernetes.test.unpackSparkDir")
+    require(sparkDirProp != null, "Spark home directory must be provided in system properties.")
+    sparkHomeDir = Paths.get(sparkDirProp)
+    require(sparkHomeDir.toFile.isDirectory,
+      s"No directory found for spark home specified at $sparkHomeDir.")
+    val imageTag = getTestImageTag
+    val imageRepo = getTestImageRepo
+    image = s"$imageRepo/spark:$imageTag"
+
+    val sparkDistroExamplesJarFile: File = sparkHomeDir.resolve(Paths.get("examples", "jars"))
+      .toFile
+      .listFiles(new PatternFilenameFilter(Pattern.compile("^spark-examples_.*\\.jar$")))(0)
+    containerLocalSparkDistroExamplesJar = s"local:///opt/spark/examples/jars/" +
+      s"${sparkDistroExamplesJarFile.getName}"
+    testBackend = IntegrationTestBackendFactory.getTestBackend
+    testBackend.initialize()
+    kubernetesTestComponents = new KubernetesTestComponents(testBackend.getKubernetesClient)
+  }
+
+  override def afterAll(): Unit = {
+    testBackend.cleanUp()
+  }
+
+  before {
+    appLocator = UUID.randomUUID().toString.replaceAll("-", "")
+    driverPodName = "spark-test-app-" + UUID.randomUUID().toString.replaceAll("-", "")
+    sparkAppConf = kubernetesTestComponents.newSparkAppConf()
+      .set("spark.kubernetes.container.image", image)
+      .set("spark.kubernetes.driver.pod.name", driverPodName)
+      .set("spark.kubernetes.driver.label.spark-app-locator", appLocator)
+      .set("spark.kubernetes.executor.label.spark-app-locator", appLocator)
+    if (!kubernetesTestComponents.hasUserSpecifiedNamespace) {
+      kubernetesTestComponents.createNamespace()
+    }
+  }
+
+  after {
+    if (!kubernetesTestComponents.hasUserSpecifiedNamespace) {

[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-06-01 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r192488365
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -101,6 +102,13 @@ class JacksonParser(
     }
   }
 
+  private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = {
+    val elemConverter = makeConverter(at.elementType)
+    (parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) {
+      case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter)))
--- End diff --

In line 87:
```
val array = convertArray(parser, elementConverter)
// Here, as we support reading top level JSON arrays and take every element
// in such an array as a row, this case is possible.
if (array.numElements() == 0) {
  Nil
} else {
  array.toArray[InternalRow](schema).toSeq
}
```
Should we also follow this?


---




[GitHub] spark pull request #21444: Branch 2.3

2018-06-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21444


---




[GitHub] spark pull request #21474: [SPARK-24297][CORE] Fetch-to-disk by default for ...

2018-06-01 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21474#discussion_r192487033
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -429,7 +429,11 @@ package object config {
       "external shuffle service, this feature can only be worked when external shuffle" +
       "service is newer than Spark 2.2.")
     .bytesConf(ByteUnit.BYTE)
-    .createWithDefault(Long.MaxValue)
+    // fetch-to-mem is guaranteed to fail if the message is bigger than 2 GB, so we might
+    // as well use fetch-to-disk in that case.  The message includes some metadata in addition
+    // to the block data itself (in particular UploadBlock has a lot of metadata), so we leave
+    // extra room.
+    .createWithDefault(Int.MaxValue - 500)
--- End diff --

No guarantee it's big enough. It seemed OK in the test I tried, but UploadBlock has some variable-length strings, so I can't say for sure.

I'm fine making this much bigger, e.g. 1 MB -- you'd only be bigger than that in a pathological case. Then there would be *some* cases where we'd take an old message that was fine with fetch-to-mem and switch it to fetch-to-disk. But that is such a tiny case, and not an unreasonable change even then ... so it should be OK.


---




[GitHub] spark pull request #19680: [SPARK-22461][ML] Refactor Spark ML model summari...

2018-06-01 Thread sethah
Github user sethah closed the pull request at:

https://github.com/apache/spark/pull/19680


---




[GitHub] spark issue #21443: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21443
  
I will revert this PR now. @maropu Could you submit a new fix to resolve 
the above issue?


---




[GitHub] spark issue #21443: [SPARK-24369][SQL] Correct handling for multiple distinc...

2018-06-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21443
  
For the following query, we can see the performance regression:
```SQL
SELECT sum(DISTINCT x), avg(DISTINCT x)
FROM (VALUES (1, 1), (2, 2), (2, 2)) t(x, y)
```

Before this PR:
```
== Optimized Logical Plan ==
Aggregate [sum(distinct cast(x#189 as bigint)) AS sum(DISTINCT x)#193L, avg(distinct cast(x#189 as bigint)) AS avg(DISTINCT x)#194]
+- LocalRelation [x#189]
```

After this PR:
```
== Optimized Logical Plan ==
Aggregate [sum(if ((gid#195 = 1)) CAST(`x` AS BIGINT)#196L else null) AS sum(DISTINCT x)#193L, avg(if ((gid#195 = 1)) CAST(`x` AS BIGINT)#196L else null) AS avg(DISTINCT x)#194]
+- Aggregate [CAST(`x` AS BIGINT)#196L, gid#195], [CAST(`x` AS BIGINT)#196L, gid#195]
   +- Expand [List(cast(x#189 as bigint), 1)], [CAST(`x` AS BIGINT)#196L, gid#195]
      +- LocalRelation [x#189]
```


---




[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19691
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91393/
Test PASSed.


---




[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19691
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP PARTITIO...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19691
  
**[Test build #91393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91393/testReport)** for PR 19691 at commit [`defc9f1`](https://github.com/apache/spark/commit/defc9f11831c053727970e8d9c1f784ec5223644).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20894
  
**[Test build #91399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91399/testReport)** for PR 20894 at commit [`3b37712`](https://github.com/apache/spark/commit/3b37712ded664aaf716306574f50072e58b9bbd1).


---




[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function

2018-06-01 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21061#discussion_r192479966
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -1882,3 +1882,311 @@ case class ArrayRepeat(left: Expression, right: Expression)
   }
 
 }
+
+object ArraySetLike {
+  val kindUnion = 1
+
+  private val MAX_ARRAY_LENGTH: Int = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH
+
+  def toArrayDataInt(hs: OpenHashSet[Int]): ArrayData = {
+    val array = new Array[Int](hs.size)
+    var pos = hs.nextPos(0)
+    var i = 0
+    while (pos != OpenHashSet.INVALID_POS) {
+      array(i) = hs.getValue(pos)
+      pos = hs.nextPos(pos + 1)
+      i += 1
+    }
+
+    val numBytes = 4L * array.length
+    val unsafeArraySizeInBytes = UnsafeArrayData.calculateHeaderPortionInBytes(array.length) +
+      org.apache.spark.unsafe.array.ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes)
+    // Since UnsafeArrayData.fromPrimitiveArray() uses long[], max elements * 8 bytes can be used
+    if (unsafeArraySizeInBytes <= Integer.MAX_VALUE * 8) {
+      UnsafeArrayData.fromPrimitiveArray(array)
+    } else {
+      new GenericArrayData(array)
+    }
+  }
+
+  def toArrayDataLong(hs: OpenHashSet[Long]): ArrayData = {
+    val array = new Array[Long](hs.size)
+    var pos = hs.nextPos(0)
+    var i = 0
+    while (pos != OpenHashSet.INVALID_POS) {
+      array(i) = hs.getValue(pos)
+      pos = hs.nextPos(pos + 1)
+      i += 1
+    }
+
+    val numBytes = 8L * array.length
+    val unsafeArraySizeInBytes = UnsafeArrayData.calculateHeaderPortionInBytes(array.length) +
+      org.apache.spark.unsafe.array.ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes)
+    // Since UnsafeArrayData.fromPrimitiveArray() uses long[], max elements * 8 bytes can be used
+    if (unsafeArraySizeInBytes <= Integer.MAX_VALUE * 8) {
--- End diff --

I'm just not sure the calculation satisfies the limit in `UnsafeArrayData.fromPrimitiveArray()`. I'd prefer to do the same calculation here as is done there.


---




[GitHub] spark issue #21480: [SPARK-23668][K8S] Added missing config property in runn...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21480
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3625/



---




[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21092
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21092
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91394/
Test FAILed.


---




[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21092
  
**[Test build #91394 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91394/testReport)** for PR 21092 at commit [`24a704e`](https://github.com/apache/spark/commit/24a704e74f2c5816e5ea60dbf607be00c090ae8b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21480: [SPARK-23668][K8S] Added missing config property in runn...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21480
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3625/



---




[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-06-01 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20894#discussion_r192477066
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala ---
@@ -110,14 +114,81 @@ abstract class CSVDataSource extends Serializable {
   }
 }
 
-object CSVDataSource {
+object CSVDataSource extends Logging {
   def apply(options: CSVOptions): CSVDataSource = {
     if (options.multiLine) {
       MultiLineCSVDataSource
     } else {
       TextInputCSVDataSource
     }
   }
+
+  /**
+   * Checks that column names in a CSV header and field names in the schema are the same
+   * by taking into account case sensitivity.
+   */
+  def checkHeaderColumnNames(
+      schema: StructType,
+      columnNames: Array[String],
+      fileName: String,
+      enforceSchema: Boolean,
+      caseSensitive: Boolean): Unit = {
+    if (columnNames != null) {
+      val fieldNames = schema.map(_.name).toIndexedSeq
+      val (headerLen, schemaSize) = (columnNames.size, fieldNames.length)
+      var errorMessage: Option[String] = None
+
+      if (headerLen == schemaSize) {
+        var i = 0
+        while (errorMessage.isEmpty && i < headerLen) {
+          var (nameInSchema, nameInHeader) = (fieldNames(i), columnNames(i))
+          if (!caseSensitive) {
+            nameInSchema = nameInSchema.toLowerCase
+            nameInHeader = nameInHeader.toLowerCase
+          }
+          if (nameInHeader != nameInSchema) {
+            errorMessage = Some(
+              s"""|CSV header does not conform to the schema.
+                  | Header: ${columnNames.mkString(", ")}
+                  | Schema: ${fieldNames.mkString(", ")}
+                  |Expected: ${fieldNames(i)} but found: ${columnNames(i)}
+                  |CSV file: $fileName""".stripMargin)
+          }
+          i += 1
+        }
+      } else {
+        errorMessage = Some(
+          s"""|Number of columns in the CSV header is not equal to the number of fields in the schema:
+              | Header length: $headerLen, schema size: $schemaSize
+              |CSV file: $fileName""".stripMargin)
+      }
+
+      errorMessage.foreach { msg =>
+        if (enforceSchema) {
+          logWarning(msg)
+        } else {
+          throw new IllegalArgumentException(msg)
+        }
+      }
+    }
+  }
+
+  /**
+   * Checks that the CSV header contains the same column names as the field names in the given
+   * schema, taking case sensitivity into account.
+   */
+  def checkHeader(
+      header: String,
+      parser: CsvParser,
+      schema: StructType,
+      fileName: String,
+      enforceSchema: Boolean,
+      caseSensitive: Boolean): Unit = {
+    if (!enforceSchema) {
--- End diff --

I have to add a separate log appender to catch log output. 
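
For context, a hypothetical sketch of how the option under review would be used from the reader API (the option name comes from this PR; the schema and path are made up, and a `spark` session is assumed in scope):

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType().add("id", IntegerType).add("name", StringType)
val df = spark.read
  .option("header", "true")
  .option("enforceSchema", "false") // fail on a header/schema mismatch instead of only warning
  .schema(schema)
  .csv("/tmp/example.csv")          // hypothetical path
```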


---




[GitHub] spark issue #21480: [SPARK-23668][K8S] Added missing config property in runn...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21480
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3763/
Test PASSed.


---




[GitHub] spark issue #21480: [SPARK-23668][K8S] Added missing config property in runn...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21480
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-06-01 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20894#discussion_r192476861
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -499,6 +503,11 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
       StructType(schema.filterNot(_.name == parsedOptions.columnNameOfCorruptRecord))
 
     val linesWithoutHeader: RDD[String] = maybeFirstLine.map { firstLine =>
+      if (!parsedOptions.enforceSchema) {
--- End diff --

fixed


---




[GitHub] spark pull request #21476: [SPARK-24446][yarn] Properly quote library path f...

2018-06-01 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21476#discussion_r192476723
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -1485,6 +1486,22 @@ private object Client extends Logging {
     YarnAppReport(report.getYarnApplicationState(), report.getFinalApplicationStatus(), diagsOpt)
   }
 
+  /**
+   * Create a properly quoted library path string to be added as a prefix to the command
+   * executed by YARN. This is different from plain quoting due to YARN executing the command
+   * through "bash -c".
+   */
+  def createLibraryPathPrefix(libpath: String, conf: SparkConf): String = {
--- End diff --

This is so specific to the way YARN runs things that I don't think it would 
be useful anywhere else. If at some point it becomes useful, the code can be 
moved.

I think the tests I added are better than just unit testing this function, 
since that way the code is actually being run through YARN and bash.
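
For readers following along, a minimal sketch of the kind of quoting at issue (a hypothetical helper, not Spark's actual implementation): a value embedded in a command that YARN runs through `bash -c` has to be single-quoted so bash does not word-split or expand it.

```scala
// Hypothetical helper: single-quote a string for bash, escaping embedded single quotes,
// so it survives the extra round of interpretation by `bash -c`.
def quoteForBash(arg: String): String =
  "'" + arg.replace("'", "'\\''") + "'"

val libpath = "/opt/native libs"  // the space would otherwise split the command into two words
// Yields: LD_LIBRARY_PATH='/opt/native libs':$LD_LIBRARY_PATH
val prefix = s"LD_LIBRARY_PATH=${quoteForBash(libpath)}:$$LD_LIBRARY_PATH"
```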


---




[GitHub] spark issue #21480: [SPARK-23668][K8S] Added missing config property in runn...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21480
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21480: [SPARK-23668][K8S] Added missing config property in runn...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21480
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91398/
Test PASSed.


---




[GitHub] spark issue #21480: [SPARK-23668][K8S] Added missing config property in runn...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21480
  
**[Test build #91398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91398/testReport)** for PR 21480 at commit [`a381353`](https://github.com/apache/spark/commit/a381353568c44fa30a50d1eb266e30f88c3116bd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/20697
  
we tried the --vm-driver=none and it wasn't working for us... this was ~8 months ago, however, and I can't recall exactly what went wrong.




---




[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract

2018-06-01 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21479
  
What about day_of_week?
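
For reference, a sketch of the syntax this PR targets (which fields end up supported — including a possible `DAYOFWEEK` — is exactly the open question here, so treat the field names as assumptions; a `spark` session is assumed in scope):

```scala
spark.sql(
  "SELECT EXTRACT(YEAR FROM DATE '2018-06-01') AS y, " +
  "EXTRACT(DAY FROM DATE '2018-06-01') AS d").show()
```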


---




[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21467
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91397/
Test FAILed.


---




[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21467
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21467
  
**[Test build #91397 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91397/testReport)** for PR 21467 at commit [`8505de2`](https://github.com/apache/spark/commit/8505de28231e99b63371d0798545b693692cbce4).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21480: [SPARK-23668][K8S] Added missing config property in runn...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21480
  
**[Test build #91398 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91398/testReport)**
 for PR 21480 at commit 
[`a381353`](https://github.com/apache/spark/commit/a381353568c44fa30a50d1eb266e30f88c3116bd).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21480: [SPARK-23668][K8S] Added missing config property ...

2018-06-01 Thread liyinan926
GitHub user liyinan926 opened a pull request:

https://github.com/apache/spark/pull/21480

[SPARK-23668][K8S] Added missing config property in running-on-kubernetes.md

## What changes were proposed in this pull request?
PR https://github.com/apache/spark/pull/20811 introduced a new Spark 
configuration property `spark.kubernetes.container.image.pullSecrets` for 
specifying image pull secrets. However, the documentation wasn't updated 
accordingly. This PR adds the newly introduced property to running-on-kubernetes.md.
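
A minimal sketch of the property being documented, set from PySpark; the secret and image names here are hypothetical:

```python
from pyspark import SparkConf

conf = (
    SparkConf()
    # Hypothetical image; any private registry image works the same way.
    .set("spark.kubernetes.container.image", "myrepo/spark:latest")
    # Comma-separated list of Kubernetes secrets used to pull that image.
    .set("spark.kubernetes.container.image.pullSecrets", "my-registry-key")
)
```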

## How was this patch tested?
N/A.

@foxish @mccheah please help merge this. Thanks!

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liyinan926/spark-k8s master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21480.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21480


commit a381353568c44fa30a50d1eb266e30f88c3116bd
Author: Yinan Li 
Date:   2018-06-01T17:52:40Z

[SPARK-23668][K8S] Added missing config property in running-on-kubernetes.md




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21400: [SPARK-24351][SS]offsetLog/commitLog purge thresh...

2018-06-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21400


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-01 Thread ssuchter
Github user ssuchter commented on the issue:

https://github.com/apache/spark/pull/20697
  
@skonto I'll test that and discuss with @shaneknapp. It wouldn't involve 
directly changing code in this PR, since the minikube creation/destruction is 
done by Jenkins job config, but it is relevant to how we set up systems. We 
could potentially accelerate the adoption of all nodes. I'm not sure about 
whether the docker system is ready on all the nodes or not, but that's a good 
discussion.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21400: [SPARK-24351][SS]offsetLog/commitLog purge thresholdBatc...

2018-06-01 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/21400
  
Thanks! Merging to master


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...

2018-06-01 Thread ssuchter
Github user ssuchter commented on a diff in the pull request:

https://github.com/apache/spark/pull/20697#discussion_r192466840
  
--- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala ---
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.integrationtest
+
+import java.io.File
+import java.nio.file.{Path, Paths}
+import java.util.UUID
+import java.util.regex.Pattern
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.PatternFilenameFilter
+import io.fabric8.kubernetes.api.model.{Container, Pod}
+import org.scalatest.{BeforeAndAfter, BeforeAndAfterAll}
+import org.scalatest.concurrent.{Eventually, PatienceConfiguration}
+import org.scalatest.time.{Minutes, Seconds, Span}
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.deploy.k8s.integrationtest.backend.{IntegrationTestBackend, IntegrationTestBackendFactory}
+import org.apache.spark.deploy.k8s.integrationtest.config._
+
+private[spark] class KubernetesSuite extends SparkFunSuite
+  with BeforeAndAfterAll with BeforeAndAfter {
+
+  import KubernetesSuite._
+
+  private var testBackend: IntegrationTestBackend = _
+  private var sparkHomeDir: Path = _
+  private var kubernetesTestComponents: KubernetesTestComponents = _
+  private var sparkAppConf: SparkAppConf = _
+  private var image: String = _
+  private var containerLocalSparkDistroExamplesJar: String = _
+  private var appLocator: String = _
+  private var driverPodName: String = _
+
+  override def beforeAll(): Unit = {
+// The scalatest-maven-plugin gives system properties that are referenced but not set null
+// values. We need to remove the null-value properties before initializing the test backend.
+val nullValueProperties = System.getProperties.asScala
+  .filter(entry => entry._2.equals("null"))
+  .map(entry => entry._1.toString)
+nullValueProperties.foreach { key =>
+  System.clearProperty(key)
+}
+
+val sparkDirProp = System.getProperty("spark.kubernetes.test.unpackSparkDir")
+require(sparkDirProp != null, "Spark home directory must be provided in system properties.")
+sparkHomeDir = Paths.get(sparkDirProp)
+require(sparkHomeDir.toFile.isDirectory,
+  s"No directory found for spark home specified at $sparkHomeDir.")
+val imageTag = getTestImageTag
+val imageRepo = getTestImageRepo
+image = s"$imageRepo/spark:$imageTag"
+
+val sparkDistroExamplesJarFile: File = sparkHomeDir.resolve(Paths.get("examples", "jars"))
+  .toFile
+  .listFiles(new PatternFilenameFilter(Pattern.compile("^spark-examples_.*\\.jar$")))(0)
+containerLocalSparkDistroExamplesJar = s"local:///opt/spark/examples/jars/" +
+  s"${sparkDistroExamplesJarFile.getName}"
+testBackend = IntegrationTestBackendFactory.getTestBackend
+testBackend.initialize()
+kubernetesTestComponents = new KubernetesTestComponents(testBackend.getKubernetesClient)
+  }
+
+  override def afterAll(): Unit = {
+testBackend.cleanUp()
+  }
+
+  before {
+appLocator = UUID.randomUUID().toString.replaceAll("-", "")
+driverPodName = "spark-test-app-" + UUID.randomUUID().toString.replaceAll("-", "")
+sparkAppConf = kubernetesTestComponents.newSparkAppConf()
+  .set("spark.kubernetes.container.image", image)
+  .set("spark.kubernetes.driver.pod.name", driverPodName)
+  .set("spark.kubernetes.driver.label.spark-app-locator", appLocator)
+  .set("spark.kubernetes.executor.label.spark-app-locator", appLocator)
+if (!kubernetesTestComponents.hasUserSpecifiedNamespace) {
+  kubernetesTestComponents.createNamespace()
+}
+  }
+
+  after {
+if (!kubernetesTestComponents.hasUserSpecifiedNamespace) {

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21467
  
**[Test build #91397 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91397/testReport)**
 for PR 21467 at commit 
[`8505de2`](https://github.com/apache/spark/commit/8505de28231e99b63371d0798545b693692cbce4).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-06-01 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21439#discussion_r192465130
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -548,7 +553,9 @@ case class JsonToStructs(
   forceNullableSchema = SQLConf.get.getConf(SQLConf.FROM_JSON_FORCE_NULLABLE_SCHEMA))
 
   override def checkInputDataTypes(): TypeCheckResult = nullableSchema match {
-case _: StructType | ArrayType(_: StructType, _) | _: MapType =>
+case ArrayType(_: StructType, _) if unpackArray =>
--- End diff --

Even if `unpackArray` is `false`, the next branch in line 558 still does 
`super.checkInputDataTypes()` for any `ArrayType`.
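
For context, a minimal PySpark sketch of what SPARK-24391 is meant to enable, assuming the PR lands as proposed (previously only struct, array-of-struct, and map schemas passed this check):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json
from pyspark.sql.types import ArrayType, IntegerType

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("[1, 2, 3]",)], ["json"])
# An array of a non-struct element type as the target schema.
df.select(from_json("json", ArrayType(IntegerType()))).show()
```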


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21478: [SPARK-24444][DOCS][PYTHON][BRANCH-2.3] Improve Pandas U...

2018-06-01 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21478
  
Looks like it is in there now, thanks @HyukjinKwon and @holdenk !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21478: [SPARK-24444][DOCS][PYTHON][BRANCH-2.3] Improve P...

2018-06-01 Thread BryanCutler
Github user BryanCutler closed the pull request at:

https://github.com/apache/spark/pull/21478


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution w...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21450
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution w...

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21450
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91390/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution w...

2018-06-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21450
  
**[Test build #91390 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91390/testReport)**
 for PR 21450 at commit 
[`12a5145`](https://github.com/apache/spark/commit/12a5145464dc8429ec9b120afb86a0b984deed93).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

2018-06-01 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21092#discussion_r192457908
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
@@ -102,17 +110,30 @@ private[spark] object KubernetesConf {
   appId: String,
   mainAppResource: Option[MainAppResource],
   mainClass: String,
-  appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
+  appArgs: Array[String],
+  maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
 val sparkConfWithMainAppJar = sparkConf.clone()
+val additionalFiles = mutable.ArrayBuffer.empty[String]
 mainAppResource.foreach {
-  case JavaMainAppResource(res) =>
-val previousJars = sparkConf
-  .getOption("spark.jars")
-  .map(_.split(","))
-  .getOrElse(Array.empty)
-if (!previousJars.contains(res)) {
-  sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
-}
+case JavaMainAppResource(res) =>
+  val previousJars = sparkConf
+.getOption("spark.jars")
+.map(_.split(","))
+.getOrElse(Array.empty)
+  if (!previousJars.contains(res)) {
+sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
+  }
+// The function of this outer match is to account for multiple nonJVM
+// bindings that will all have increased MEMORY_OVERHEAD_FACTOR to 0.4
+case nonJVM: NonJVMResource =>
+  nonJVM match {
+case PythonMainAppResource(res) =>
+  additionalFiles += res
+  maybePyFiles.foreach{maybePyFiles =>
+additionalFiles.appendAll(maybePyFiles.split(","))}
+  sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_MAIN_APP_RESOURCE, res)
+  }
+  sparkConfWithMainAppJar.set(MEMORY_OVERHEAD_FACTOR, 0.4)
--- End diff --

I think power users would want the ability to override this if they have a 
specific amount of memory overhead that they know they need. Configurations 
should always be configurable, with sane defaults. I agree that we can afford 
to set a better default for Kubernetes, but there should always be a way to 
override default settings if the user knows the characteristics of their job. 
For example, if the user profiles the memory of their container and sees that 
it's not using the full amount, they can afford to drop this value and leave 
more resources for other applications.
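
A sketch of the override being argued for, assuming the factor ends up exposed as an ordinary conf key; the key name `spark.kubernetes.memoryOverheadFactor` is an assumption here, not something settled in this thread:

```python
from pyspark import SparkConf

conf = (
    SparkConf()
    .set("spark.executor.memory", "4g")
    # Power-user override: after profiling shows the container never uses
    # the default headroom, drop the factor below the proposed 0.4.
    .set("spark.kubernetes.memoryOverheadFactor", "0.2")  # assumed key name
)
```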


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...

2018-06-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21467#discussion_r192456717
  
--- Diff: python/pyspark/worker.py ---
@@ -140,15 +139,20 @@ def read_single_udf(pickleSer, infile, eval_type):
         else:
             row_func = chain(row_func, f)
 
+    # make sure StopIteration's raised in the user code are not
+    # ignored, but re-raised as RuntimeError's
+    func = fail_on_stopiteration(row_func)
--- End diff --

Ah, sure.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...

2018-06-01 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request:

https://github.com/apache/spark/pull/21467#discussion_r192455582
  
--- Diff: python/pyspark/worker.py ---
@@ -140,15 +139,20 @@ def read_single_udf(pickleSer, infile, eval_type):
         else:
             row_func = chain(row_func, f)
 
+    # make sure StopIteration's raised in the user code are not
+    # ignored, but re-raised as RuntimeError's
+    func = fail_on_stopiteration(row_func)
--- End diff --

I wanted to avoid the overhead of calling `get_argspec` even when it's not needed.
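
For context, the wrapper being discussed boils down to the following; a simplified sketch of `fail_on_stopiteration` from `pyspark/util.py`:

```python
def fail_on_stopiteration(f):
    """Wrap f so a StopIteration escaping user code surfaces as a RuntimeError
    instead of silently terminating the iterator machinery in the worker."""
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as e:
            raise RuntimeError(
                "Caught StopIteration thrown from user's code; failing the task", e)
    return wrapper

def bad_udf(x):
    raise StopIteration  # would otherwise be swallowed by a surrounding loop

safe_udf = fail_on_stopiteration(bad_udf)
# safe_udf(1) now raises RuntimeError instead of truncating results.
```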


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...

2018-06-01 Thread ssuchter
Github user ssuchter commented on a diff in the pull request:

https://github.com/apache/spark/pull/20697#discussion_r192452101
  
--- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala ---
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.integrationtest
+
+import java.io.File
+import java.nio.file.{Path, Paths}
+import java.util.UUID
+import java.util.regex.Pattern
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.PatternFilenameFilter
+import io.fabric8.kubernetes.api.model.{Container, Pod}
+import org.scalatest.{BeforeAndAfter, BeforeAndAfterAll}
+import org.scalatest.concurrent.{Eventually, PatienceConfiguration}
+import org.scalatest.time.{Minutes, Seconds, Span}
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.deploy.k8s.integrationtest.backend.{IntegrationTestBackend, IntegrationTestBackendFactory}
+import org.apache.spark.deploy.k8s.integrationtest.config._
+
+private[spark] class KubernetesSuite extends SparkFunSuite
+  with BeforeAndAfterAll with BeforeAndAfter {
+
+  import KubernetesSuite._
+
+  private var testBackend: IntegrationTestBackend = _
+  private var sparkHomeDir: Path = _
+  private var kubernetesTestComponents: KubernetesTestComponents = _
+  private var sparkAppConf: SparkAppConf = _
+  private var image: String = _
+  private var containerLocalSparkDistroExamplesJar: String = _
+  private var appLocator: String = _
+  private var driverPodName: String = _
+
+  override def beforeAll(): Unit = {
+// The scalatest-maven-plugin gives system properties that are referenced but not set null
+// values. We need to remove the null-value properties before initializing the test backend.
+val nullValueProperties = System.getProperties.asScala
+  .filter(entry => entry._2.equals("null"))
+  .map(entry => entry._1.toString)
+nullValueProperties.foreach { key =>
+  System.clearProperty(key)
+}
+
+val sparkDirProp = System.getProperty("spark.kubernetes.test.unpackSparkDir")
+require(sparkDirProp != null, "Spark home directory must be provided in system properties.")
+sparkHomeDir = Paths.get(sparkDirProp)
+require(sparkHomeDir.toFile.isDirectory,
+  s"No directory found for spark home specified at $sparkHomeDir.")
+val imageTag = getTestImageTag
+val imageRepo = getTestImageRepo
+image = s"$imageRepo/spark:$imageTag"
+
+val sparkDistroExamplesJarFile: File = sparkHomeDir.resolve(Paths.get("examples", "jars"))
+  .toFile
+  .listFiles(new PatternFilenameFilter(Pattern.compile("^spark-examples_.*\\.jar$")))(0)
+containerLocalSparkDistroExamplesJar = s"local:///opt/spark/examples/jars/" +
+  s"${sparkDistroExamplesJarFile.getName}"
+testBackend = IntegrationTestBackendFactory.getTestBackend
+testBackend.initialize()
+kubernetesTestComponents = new KubernetesTestComponents(testBackend.getKubernetesClient)
+  }
+
+  override def afterAll(): Unit = {
+testBackend.cleanUp()
+  }
+
+  before {
+appLocator = UUID.randomUUID().toString.replaceAll("-", "")
+driverPodName = "spark-test-app-" + UUID.randomUUID().toString.replaceAll("-", "")
+sparkAppConf = kubernetesTestComponents.newSparkAppConf()
+  .set("spark.kubernetes.container.image", image)
+  .set("spark.kubernetes.driver.pod.name", driverPodName)
+  .set("spark.kubernetes.driver.label.spark-app-locator", appLocator)
+  .set("spark.kubernetes.executor.label.spark-app-locator", appLocator)
+if (!kubernetesTestComponents.hasUserSpecifiedNamespace) {
+  kubernetesTestComponents.createNamespace()
+}
+  }
+
+  after {
+if (!kubernetesTestComponents.hasUserSpecifiedNamespace) {

[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

2018-06-01 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21092#discussion_r192448946
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
@@ -102,17 +110,30 @@ private[spark] object KubernetesConf {
   appId: String,
   mainAppResource: Option[MainAppResource],
   mainClass: String,
-  appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
+  appArgs: Array[String],
+  maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
 val sparkConfWithMainAppJar = sparkConf.clone()
+val additionalFiles = mutable.ArrayBuffer.empty[String]
 mainAppResource.foreach {
-  case JavaMainAppResource(res) =>
-val previousJars = sparkConf
-  .getOption("spark.jars")
-  .map(_.split(","))
-  .getOrElse(Array.empty)
-if (!previousJars.contains(res)) {
-  sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
-}
+case JavaMainAppResource(res) =>
+  val previousJars = sparkConf
+.getOption("spark.jars")
+.map(_.split(","))
+.getOrElse(Array.empty)
+  if (!previousJars.contains(res)) {
+sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
+  }
+// The function of this outer match is to account for multiple nonJVM
+// bindings that will all have increased MEMORY_OVERHEAD_FACTOR to 0.4
+case nonJVM: NonJVMResource =>
+  nonJVM match {
+case PythonMainAppResource(res) =>
+  additionalFiles += res
+  maybePyFiles.foreach{maybePyFiles =>
+additionalFiles.appendAll(maybePyFiles.split(","))}
+  sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_MAIN_APP_RESOURCE, res)
+  }
+  sparkConfWithMainAppJar.set(MEMORY_OVERHEAD_FACTOR, 0.4)
--- End diff --

Yup, you can see my statement about not overriding an explicitly user-provided 
value in my comment on the 20th ("if the user has specified a different value 
don't think we should override it").

So this logic, as it stands, is K8s specific, and I don't think we can change 
how YARN chooses its memory overhead in a minor release, so I'd expect this to 
remain K8s specific until at least 3.0, when we can evaluate whether we want to 
change this in YARN as well.

The memory overhead configuration is documented on the YARN page right now 
(see `spark.yarn.am.memoryOverhead` on 
http://spark.apache.org/docs/latest/running-on-yarn.html), so I would document 
this in 
http://spark.apache.org/docs/latest/running-on-kubernetes.html#spark-properties 
(i.e. `./docs/running-on-kubernetes.md`).

As for intuitiveness, I'd argue this is actually more intuitive than what we do 
in YARN: we know that users who run R & Python need more non-JVM memory, and 
many users don't know to think about this until their job fails. We can take 
advantage of our knowledge to handle this setting for the user more often. You 
can see how often this confuses folks on the list, in the docs, and on Stack 
Overflow by searching for "memory overhead exceeded", "Container killed by YARN 
for exceeding memory limits", and similar.
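
To make the trade-off concrete, a quick sketch of what the factor means for the container memory request, assuming the existing JVM-job default of 0.1:

```python
executor_memory_mb = 4096
jvm_factor = 0.1      # existing default for JVM jobs
non_jvm_factor = 0.4  # default proposed here for Python/R jobs

print(executor_memory_mb * (1 + jvm_factor))      # 4505.6 -> ~4506 MiB requested
print(executor_memory_mb * (1 + non_jvm_factor))  # 5734.4 -> ~5734 MiB requested
```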


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192446664
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful
     from JVM to Python worker for every task.
   </td>
 </tr>
 
+<tr>
+  <td><code>spark.sql.repl.eagerEval.enabled</code></td>
+  <td>false</td>
+  <td>
+    Enable eager evaluation or not. If true and REPL you are using supports eager evaluation,
+    dataframe will be ran automatically. HTML table will feedback the queries user have defined if
+    <code>_repl_html_</code> called by notebooks like Jupyter, otherwise for plain Python REPL, output
--- End diff --

`output ` -> `the output`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192446542
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful
     from JVM to Python worker for every task.
   </td>
 </tr>
 
+<tr>
+  <td><code>spark.sql.repl.eagerEval.enabled</code></td>
+  <td>false</td>
+  <td>
+    Enable eager evaluation or not. If true and REPL you are using supports eager evaluation,
+    dataframe will be ran automatically. HTML table will feedback the queries user have defined if
+    <code>_repl_html_</code> called by notebooks like Jupyter, otherwise for plain Python REPL, output
+    will be shown like <code>dataframe.show()</code>
+    (see <a href="https://issues.apache.org/jira/browse/SPARK-24215">SPARK-24215</a> for more details).
+  </td>
+</tr>
+
+<tr>
+  <td><code>spark.sql.repl.eagerEval.maxNumRows</code></td>
+  <td>20</td>
+  <td>
+    Default number of rows in eager evaluation output HTML table generated by <code>_repr_html_</code> or plain text,
+    this only take effect when <code>spark.sql.repl.eagerEval.enabled</code> set to true.
--- End diff --

`set to` -> `is set to`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192446886
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful
     from JVM to Python worker for every task.
   </td>
 </tr>
 
+<tr>
+  <td><code>spark.sql.repl.eagerEval.enabled</code></td>
+  <td>false</td>
+  <td>
+    Enable eager evaluation or not. If true and REPL you are using supports eager evaluation,
--- End diff --

`REPL` -> `the REPL`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192447943
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also available, and may be useful
     from JVM to Python worker for every task.
   </td>
 </tr>
 
+<tr>
+  <td><code>spark.sql.repl.eagerEval.enabled</code></td>
+  <td>false</td>
+  <td>
+    Enable eager evaluation or not. If true and REPL you are using supports eager evaluation,
+    dataframe will be ran automatically. HTML table will feedback the queries user have defined if
--- End diff --

`dataframe` -> `DataFrame/Dataset`

What is `HTML table`? Is the term used in Jupyter only?
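
For reference, a sketch of the behavior these docs describe, assuming the configs land under the proposed names; in Jupyter the DataFrame renders through `_repr_html_` as an HTML table, while a plain REPL falls back to `show()`-style text:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.repl.eagerEval.enabled", "true")
    .config("spark.sql.repl.eagerEval.maxNumRows", 20)
    .getOrCreate()
)

df = spark.range(3)
df  # evaluated eagerly: an HTML table in Jupyter, plain text otherwise
```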


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21479
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3762/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21479: [SPARK-23903][SQL] Add support for date extract

2018-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21479
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


