[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21427
  
cc @rxin 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91134/
Test PASSed.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91134 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91134/testReport)** for PR 21366 at commit [`45a02de`](https://github.com/apache/spark/commit/45a02de19a07217084caaa0a5d87b424e1b79d2e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF shou...

2018-05-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21427#discussion_r190793613
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -4931,6 +4931,33 @@ def foo3(key, pdf):
 expected4 = udf3.func((), pdf)
 self.assertPandasEqual(expected4, result4)
 
+def test_column_order(self):
+import pandas as pd
+from pyspark.sql.functions import pandas_udf, col, PandasUDFType
--- End diff --

seems `col` is not used btw.


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21426
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3574/
Test PASSed.


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21426
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3573/
Test PASSed.


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21426
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21426
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21427
  
Also, I really think we should mark this feature as experimental.


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21426
  
**[Test build #91140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91140/testReport)** for PR 21426 at commit [`39b10c5`](https://github.com/apache/spark/commit/39b10c5656a48f813a95d48d752e2d44ccb2c0d9).


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21427
  
Yea, agreed on not backporting, and agreed on making it configurable. The
thing is, the configuration is inaccessible on the worker.py side; that's why
I was hesitant. The safest way is just to target 3.0.0, but on the other hand
there are currently many complaints too.


---




[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-05-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21389#discussion_r190791115
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource.
+   */
+  def verifySchema(format: String, schema: StructType): Unit = {
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
+
+      // For backward-compatibility
--- End diff --

Yes, as long as it does not break anything. 


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21426
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21426
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3572/
Test FAILed.


---




[GitHub] spark pull request #21331: [SPARK-24276][SQL] Order of literals in IN should...

2018-05-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21331#discussion_r190790891
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---
@@ -85,6 +86,9 @@ object Canonicalize {
 case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
 case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
 
+// order the list in the In operator
+case In(value, list) => In(value, list.sortBy(_.hashCode()))
--- End diff --

Let us exclude IN subqueries from this case?
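
A rough Python sketch of the idea under review, assuming a toy expression model: `InList` and `InSubquery` are hypothetical stand-ins, not Spark classes. The literal list is ordered by hash so that two predicates differing only in literal order canonicalize to the same form, while the subquery form is left untouched, as the comment suggests.

```python
from collections import namedtuple

# Hypothetical expression nodes, for illustration only.
InList = namedtuple("InList", ["value", "items"])          # value IN (lit, lit, ...)
InSubquery = namedtuple("InSubquery", ["value", "query"])  # value IN (SELECT ...)


def canonicalize(expr):
    """Order the literals of an IN list; leave every other node as it is."""
    if isinstance(expr, InList):
        # Mirrors sortBy(_.hashCode()) in the diff: items whose hashes collide
        # keep their input order, so the result is only canonical up to collisions.
        return InList(expr.value, tuple(sorted(expr.items, key=hash)))
    return expr
```

With this, `canonicalize(InList("a", (3, 1, 2)))` and `canonicalize(InList("a", (2, 3, 1)))` compare equal, and an `InSubquery` node is returned unchanged.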


---




[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21428
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91135/
Test PASSed.


---




[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21428
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...

2018-05-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21410
  
LGTM except one minor comment.


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21426
  
**[Test build #91139 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91139/testReport)** for PR 21426 at commit [`15d6ae2`](https://github.com/apache/spark/commit/15d6ae219ac134a277a74f5e4884e4ebc6cfcf34).


---




[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21428
  
**[Test build #91135 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91135/testReport)** for PR 21428 at commit [`e0108d7`](https://github.com/apache/spark/commit/e0108d7bc164b9e5eeb757c13c80bc1d11671188).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21410: [SPARK-24366][SQL] Improving of error messages fo...

2018-05-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21410#discussion_r190790254
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala ---
@@ -309,6 +322,9 @@ object CatalystTypeConverters {
 case d: JavaBigDecimal => Decimal(d)
 case d: JavaBigInteger => Decimal(d)
 case d: Decimal => d
+case other => throw new IllegalArgumentException(
+  s"The value (${other.toString}) of the type (${other.getClass.getCanonicalName}) "
++ s"cannot be converted to ${dataType.simpleString}")
--- End diff --

Let us use `catalogString` here?


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21426
  
I tested:

submit with yarn client: .py local
submit with yarn client: .py remote
submit with standalone client: .py local
submit with standalone client: .py remote

they all work fine.


---




[GitHub] spark pull request #21415: [SPARK-24244][SPARK-24368][SQL] Passing only requ...

2018-05-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21415


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91133/
Test FAILed.


---




[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21415
  
LGTM

Thanks! Merged to master.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91133 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91133/testReport)** for PR 21366 at commit [`c398ebb`](https://github.com/apache/spark/commit/c398ebbe71e3ca586961df8fa2033b15235b27c2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21351: [SPARK-24002][SQL][BACKPORT-2.3] Task not serializable c...

2018-05-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21351
  
@imarios Please check the dev mailing list. It is being voted.


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21427
  
Do not backport this to 2.3. This is a behavior change. 


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21427
  
How about making it configurable? Users can choose to resolve either by name
or by position. It is hard to say which one is right. If the names do not
match when users want to resolve by name, we should issue an error.
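
A minimal sketch of what such a switch could look like, written in plain Python over lists rather than pandas. `assign_columns` and its `by_name` flag are hypothetical names for illustration, not an existing PySpark API or configuration key, and column names are assumed to be unique.

```python
def assign_columns(returned_names, returned_rows, schema_names, by_name=True):
    """Arrange returned rows so their columns follow schema_names.

    by_name=True  -> match columns by label and raise if the labels differ.
    by_name=False -> match columns by position; labels are ignored.
    Column names are assumed to be unique.
    """
    if len(returned_names) != len(schema_names):
        raise ValueError(
            "expected %d columns, got %d" % (len(schema_names), len(returned_names))
        )
    if not by_name:
        return [list(row) for row in returned_rows]
    if sorted(returned_names) != sorted(schema_names):
        raise ValueError(
            "column names %r do not match schema %r" % (returned_names, schema_names)
        )
    # Position in the returned data of each column, in schema order.
    order = [returned_names.index(name) for name in schema_names]
    return [[row[i] for i in order] for row in returned_rows]
```

For a UDF result with columns `["v", "id"]` and a declared schema of `["id", "v"]`, the by-name mode swaps the two columns, the by-position mode leaves the rows as returned, and a label that is missing from the schema raises an error only in by-name mode.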


---




[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21346
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3571/
Test PASSed.


---




[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21346
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21346
  
**[Test build #91138 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91138/testReport)** for PR 21346 at commit [`331124b`](https://github.com/apache/spark/commit/331124b125db6b59009e12249542f667a227226e).


---




[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21428
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91132/
Test FAILed.


---




[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21428
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21428
  
**[Test build #91132 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91132/testReport)** for PR 21428 at commit [`f3ce675`](https://github.com/apache/spark/commit/f3ce67529372f72370a1e6028dc71a751acf26f2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...

2018-05-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21420#discussion_r190783462
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -430,18 +430,15 @@ private[spark] class SparkSubmit extends Logging {
 // Usage: PythonAppRunner   [app arguments]
 args.mainClass = "org.apache.spark.deploy.PythonRunner"
 args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs
-if (clusterManager != YARN) {
-  // The YARN backend distributes the primary file differently, so don't merge it.
-  args.files = mergeFileLists(args.files, args.primaryResource)
--- End diff --

It duplicates the code below; you can check the original code.


---




[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...

2018-05-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21420#discussion_r190783213
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -430,18 +430,15 @@ private[spark] class SparkSubmit extends Logging {
 // Usage: PythonAppRunner   [app arguments]
 args.mainClass = "org.apache.spark.deploy.PythonRunner"
 args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs
-if (clusterManager != YARN) {
-  // The YARN backend distributes the primary file differently, so don't merge it.
-  args.files = mergeFileLists(args.files, args.primaryResource)
--- End diff --

Eh @jerryshao why did we remove this?


---




[GitHub] spark issue #21400: [SPARK-24351][SS]offsetLog/commitLog purge thresholdBatc...

2018-05-24 Thread ivoson
Github user ivoson commented on the issue:

https://github.com/apache/spark/pull/21400
  
@jose-torres thanks for the reply. I will try to add a unit test for this.


---




[GitHub] spark pull request #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL ...

2018-05-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21411


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21426
  
I haven't tried yet, but I believe it has the same issue, since it downloads
the files to local. The deploy.PythonRunner side also assumes that the file is
local. Will check to be doubly sure.


---




[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21428
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91131/
Test FAILed.


---




[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21428
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21428
  
**[Test build #91131 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91131/testReport)** for PR 21428 at commit [`63d38d8`](https://github.com/apache/spark/commit/63d38d849107eed226449cec8d24c2241cd583c9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...

2018-05-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21411
  
Merged to master.


---




[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...

2018-05-24 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21426
  
Did you try remote py files? Do they have a similar issue?


---




[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...

2018-05-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21411
  
I remember this summary file is disabled by default anyway. I think it's 
fine to just get rid of warnings.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91130/
Test PASSed.


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21427
  
Just for clarification, I am okay with this, @BryanCutler, if you feel this way too.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-24 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/21427
  
I do think the current default behavior might be confusing to users and
hard to debug. I have also received similar complaints.

I think at the very least, when the column names of the schema and the
return value match but their order differs, we should match by column name, as
it is extremely unlikely that users want any other behavior in this case. This
will mostly keep the current behavior unchanged, with the exception of the
"same column names, different order" case, in which the new behavior is
strictly better.
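
The rule proposed here can be sketched as follows, in plain Python over a single row. `align_to_schema` is a hypothetical helper for illustration, not code from the PR, and column names are assumed to be unique.

```python
def align_to_schema(returned_names, row, schema_names):
    """Reorder one returned row to follow schema_names.

    If the returned labels are exactly the schema names in some other order,
    match by name; otherwise keep the positional behavior and return the row
    unchanged. Column names are assumed to be unique.
    """
    if sorted(map(str, returned_names)) == sorted(schema_names):
        lookup = dict(zip(map(str, returned_names), row))
        return [lookup[name] for name in schema_names]
    return list(row)
```

A row labeled `["v", "id"]` is rearranged to fit a schema of `["id", "v"]`, while a row whose labels are not the schema names (for example default integer labels `[0, 1]`) passes through by position, which keeps the existing behavior for that case.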


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91130/testReport)** for PR 21366 at commit [`d4cf40f`](https://github.com/apache/spark/commit/d4cf40f715b7d6ad8b9d9e3cf9757b2d439f25ea).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21422: [Spark-24376][doc]Summary:compiling spark with scala-2.1...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21422
  
**[Test build #91137 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91137/testReport)** for PR 21422 at commit [`bf6b801`](https://github.com/apache/spark/commit/bf6b8011abcc9c82e941d7aeceb127f128aecbb0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21422: [Spark-24376][doc]Summary:compiling spark with scala-2.1...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21422
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91137/
Test PASSed.


---




[GitHub] spark issue #21422: [Spark-24376][doc]Summary:compiling spark with scala-2.1...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21422
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21346
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3570/
Test PASSed.


---




[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21346
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21346
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91136/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21346
  
**[Test build #91136 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91136/testReport)**
 for PR 21346 at commit 
[`32f4f94`](https://github.com/apache/spark/commit/32f4f94e3cde50015a8ea478969636fca708cf82).
 * This patch **fails Java style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21346
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...

2018-05-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21426#discussion_r190778192
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -372,8 +376,27 @@ private[spark] class SparkSubmit extends Logging {
       localJars = Option(args.jars).map {
         downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
       }.orNull
-      localPyFiles = Option(args.pyFiles).map {
-        downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
+      localPyFiles = Option(args.pyFiles).map { pyFiles =>
+        if (isClientPythonSubmit) {
--- End diff --

Agreed with @vanzin, we can move this logic to the Python-related code.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21422: [Spark-24376][doc]Summary:compiling spark with scala-2.1...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21422
  
**[Test build #91137 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91137/testReport)**
 for PR 21422 at commit 
[`bf6b801`](https://github.com/apache/spark/commit/bf6b8011abcc9c82e941d7aeceb127f128aecbb0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...

2018-05-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21426#discussion_r190778033
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -372,8 +376,27 @@ private[spark] class SparkSubmit extends Logging {
       localJars = Option(args.jars).map {
         downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
       }.orNull
-      localPyFiles = Option(args.pyFiles).map {
-        downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
+      localPyFiles = Option(args.pyFiles).map { pyFiles =>
+        if (isClientPythonSubmit) {
--- End diff --

Yup, it can be. Will try.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21422: [Spark-24376][doc]Summary:compiling spark with scala-2.1...

2018-05-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21422
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21427
  
Hmm .. I get that this one is preferable, and I don't think we have had a 
discussion about this so far, if I remember correctly.

Do you feel strongly about this @icexelloss and @BryanCutler? If so, let's 
update the migration guide for 2.4.0 ... and I hope we can document this 
feature as experimental. I think I could be okay with that.

Otherwise, I prefer to target this for 3.0.0 and document this for now .. 
Another option is to add a configuration to control this behaviour, but I 
remember it's tricky to inject the configuration there.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21346
  
**[Test build #91136 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91136/testReport)**
 for PR 21346 at commit 
[`32f4f94`](https://github.com/apache/spark/commit/32f4f94e3cde50015a8ea478969636fca708cf82).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91129/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21390
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21390
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91128/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91129 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91129/testReport)**
 for PR 21366 at commit 
[`d4cf40f`](https://github.com/apache/spark/commit/d4cf40f715b7d6ad8b9d9e3cf9757b2d439f25ea).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21390
  
**[Test build #91128 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91128/testReport)**
 for PR 21390 at commit 
[`2011eed`](https://github.com/apache/spark/commit/2011eede002664ef75e00f1f0228c5d765753f4c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3446/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3446/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3569/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21428
  
**[Test build #91135 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91135/testReport)**
 for PR 21428 at commit 
[`e0108d7`](https://github.com/apache/spark/commit/e0108d7bc164b9e5eeb757c13c80bc1d11671188).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3568/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91134 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91134/testReport)**
 for PR 21366 at commit 
[`45a02de`](https://github.com/apache/spark/commit/45a02de19a07217084caaa0a5d87b424e1b79d2e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-24 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/21366
  
Alright, this should be good for review now, with all cleanups and 
appropriate test coverage in place. Please take a look. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to...

2018-05-24 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21366#discussion_r190769478
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsPollingEventSource.scala
 ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.util.concurrent.{Future, ScheduledExecutorService, TimeUnit}
+
+import io.fabric8.kubernetes.client.KubernetesClient
+import scala.collection.JavaConverters._
+
+import org.apache.spark.deploy.k8s.Constants._
+
+private[spark] class ExecutorPodsPollingEventSource(
+    kubernetesClient: KubernetesClient,
+    eventHandler: ExecutorPodsEventHandler,
+    pollingExecutor: ScheduledExecutorService) {
+
+  private var pollingFuture: Future[_] = null
+
+  def start(applicationId: String): Unit = {
+    require(pollingFuture == null, "Cannot start polling more than once.")
+    pollingFuture = pollingExecutor.scheduleWithFixedDelay(
+      new PollRunnable(applicationId), 0L, 30L, TimeUnit.SECONDS)
+  }
+
+  def stop(): Unit = {
+    if (pollingFuture != null) {
+      pollingFuture.cancel(true)
+      pollingFuture = null
--- End diff --

Done, see below.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91133 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91133/testReport)**
 for PR 21366 at commit 
[`c398ebb`](https://github.com/apache/spark/commit/c398ebbe71e3ca586961df8fa2033b15235b27c2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21428
  
**[Test build #91132 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91132/testReport)**
 for PR 21428 at commit 
[`f3ce675`](https://github.com/apache/spark/commit/f3ce67529372f72370a1e6028dc71a751acf26f2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21415
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21415
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91126/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21415
  
**[Test build #91126 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91126/testReport)**
 for PR 21415 at commit 
[`4115058`](https://github.com/apache/spark/commit/41150585c8a104804cbc59e3e95d2175ea3bc617).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21428
  
**[Test build #91131 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91131/testReport)**
 for PR 21428 at commit 
[`63d38d8`](https://github.com/apache/spark/commit/63d38d849107eed226449cec8d24c2241cd583c9).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21428
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...

2018-05-24 Thread jose-torres
GitHub user jose-torres opened a pull request:

https://github.com/apache/spark/pull/21428

[SPARK-24235][SS] Implement continuous shuffle write RDD for single reader 
partition.

## What changes were proposed in this pull request?

Implement continuous shuffle write RDD for a single reader partition. (I 
don't believe any implementation changes are actually required for multiple 
reader partitions, but this PR is already very large, so I want to exclude 
those for now to keep the size down.)

## How was this patch tested?

new unit tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jose-torres/spark writerTask

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21428.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21428


commit 1d6b71898e2a640e3c0809695d2b83f3f84eaa38
Author: Jose Torres 
Date:   2018-05-15T18:07:54Z

continuous shuffle read RDD

commit b5d100875932bdfcb645c8f6b2cdb7b815d84c80
Author: Jose Torres 
Date:   2018-05-17T03:11:11Z

docs

commit af407694a5f13c18568da4a63848f82374a44377
Author: Jose Torres 
Date:   2018-05-17T03:19:37Z

Merge remote-tracking branch 'apache/master' into readerRddMaster

commit 46456dc75a6aec9659b18523c421999debd060eb
Author: Jose Torres 
Date:   2018-05-17T03:22:49Z

fix ctor

commit 2ea8a6f94216e8b184e5780ec3e6ffb2838de382
Author: Jose Torres 
Date:   2018-05-17T03:43:10Z

multiple partition test

commit 955ac79eb05dc389e632d1aaa6c59396835c6ed5
Author: Jose Torres 
Date:   2018-05-17T13:33:51Z

unset task context after test

commit 8cefb724512b51f2aa1fdd81fa8a2d4560e60ce3
Author: Jose Torres 
Date:   2018-05-18T00:00:05Z

conf from RDD

commit f91bfe7e3fc174202d7d5c7cde5a8fb7ce86bfd3
Author: Jose Torres 
Date:   2018-05-18T00:00:44Z

endpoint name

commit 259029298fc42a65e8ebb4d2effe49b7fafa96f1
Author: Jose Torres 
Date:   2018-05-18T00:02:08Z

testing bool

commit 859e6e4dd4dd90ffd70fc9cbd243c94090d72506
Author: Jose Torres 
Date:   2018-05-18T00:22:10Z

tests

commit b23b7bb17abe3cbc873a3144c56d08c88bc0c963
Author: Jose Torres 
Date:   2018-05-18T00:40:55Z

take instead of poll

commit 97f7e8ff865e6054d0d70914ce9bb51880b161f6
Author: Jose Torres 
Date:   2018-05-18T00:58:44Z

add interface

commit de21b1c25a333d44c0521fe151b468e51f0bdc47
Author: Jose Torres 
Date:   2018-05-18T01:02:37Z

clarify comment

commit 7dcf51a13e92a0bb2998e2a12e67d351e1c1a4fc
Author: Jose Torres 
Date:   2018-05-18T22:39:28Z

multiple

commit ad0b5aab320413891f7c21ea6115b6da8d49ccf9
Author: Jose Torres 
Date:   2018-05-25T00:06:15Z

writer with 1 reader partition

commit c9adee5423c2e8a030911008d2e6942045d484bb
Author: Jose Torres 
Date:   2018-05-25T00:15:39Z

docs and iface

commit 63d38d849107eed226449cec8d24c2241cd583c9
Author: Jose Torres 
Date:   2018-05-25T00:27:26Z

Merge remote-tracking branch 'apache/master' into writerTask




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...

2018-05-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21385


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-24 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21427
  
At first glance, I thought this issue was slightly different from 
https://issues.apache.org/jira/browse/SPARK-23929, but yeah it seems to be the 
same.  After reading through that discussion, I guess we need to be careful 
about any changes.  I'm not used to creating DataFrames by position, but it is 
possible to do so with a list of tuples like the example from the doctest:

```
   >>> @pandas_udf("id long, v double", PandasUDFType.GROUPED_MAP)  # doctest: +SKIP
   ... def mean_udf(key, pdf):
   ...     # key is a tuple of one numpy.int64, which is the value
   ...     # of 'id' for the current group
   ...     return pd.DataFrame([key + (pdf.v.mean(),)])
```
Then this would be a breaking change... so maybe it would be best to add 
better documentation for now like @HyukjinKwon mentioned in SPARK-23929, and 
target a change for Spark 3.0?
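
To make the trade-off concrete, here is a minimal pure-Python sketch of the two assignment strategies under discussion. It is not Spark's implementation: `assign_columns` is a hypothetical helper, and the rule it uses (match by name when every column label is a string, otherwise fall back to position) is only an assumption about one way both styles could coexist.

```python
def assign_columns(labels, row, schema_names):
    """Map one result row onto schema_names.

    Hypothetical helper, not a Spark API: match by name when every
    label is a string, otherwise fall back to position.
    """
    if len(labels) != len(schema_names):
        raise ValueError("result has %d columns, schema has %d"
                         % (len(labels), len(schema_names)))
    if all(isinstance(label, str) for label in labels):
        by_label = dict(zip(labels, row))
        missing = [n for n in schema_names if n not in by_label]
        if missing:
            raise KeyError("columns missing from result: %s" % missing)
        return {name: by_label[name] for name in schema_names}
    # Integer labels (e.g. a frame built from a list of tuples): by position.
    return dict(zip(schema_names, row))


schema = ["id", "v"]
# Named columns in a different order than the schema: name matching fixes it.
named = assign_columns(["v", "id"], (1.5, 1), schema)
# Default integer labels, as in the doctest above: only position can work.
positional = assign_columns([0, 1], (1, 1.5), schema)
```

A frame built from a list of tuples carries integer labels, so a pure name-based rule would reject the doctest case; that is the breaking change in question.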


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to...

2018-05-24 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21366#discussion_r190762965
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/DeterministicExecutorPodsEventQueue.scala
 ---
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import io.fabric8.kubernetes.api.model.Pod
+import scala.collection.mutable
+
+class DeterministicExecutorPodsEventQueue extends ExecutorPodsEventQueue {
+
+  private val eventBuffer = mutable.Buffer.empty[Pod]
+  private val subscribers = mutable.Buffer.empty[(Seq[Pod]) => Unit]
+
+  override def addSubscriber
+  (processBatchIntervalMillis: Long)
+  (onNextBatch: (Seq[Pod]) => Unit): Unit = {
+    subscribers += onNextBatch
+  }
+
+  override def stopProcessingEvents(): Unit = {}
+
+  override def pushPodUpdate(updatedPod: Pod): Unit = eventBuffer += updatedPod
--- End diff --

Yup, basically just a live stream of the pod statuses as reported by the 
API.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...

2018-05-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21426#discussion_r190761869
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -372,8 +376,27 @@ private[spark] class SparkSubmit extends Logging {
       localJars = Option(args.jars).map {
         downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
       }.orNull
-      localPyFiles = Option(args.pyFiles).map {
-        downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
+      localPyFiles = Option(args.pyFiles).map { pyFiles =>
+        if (isClientPythonSubmit) {
--- End diff --

Couldn't this logic be in `PythonRunner`? That's basically what SparkSubmit 
runs when the conditions you use to create `isClientPythonSubmit` are met.

This class is already pretty hard to navigate; it'd be better to avoid 
adding more special cases to it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91123/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91123 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91123/testReport)**
 for PR 21366 at commit 
[`5850439`](https://github.com/apache/spark/commit/5850439652fad6bb2b03daf4e35497304c8defdd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-24 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190761062
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -900,6 +900,17 @@ def __call__(self, x):
         self.assertEqual(f, f_.func)
         self.assertEqual(return_type, f_.returnType)
 
+    def test_stopiteration_in_udf(self):
+        # test for SPARK-23754
+        from pyspark.sql.functions import udf
+        from py4j.protocol import Py4JJavaError
+
+        def foo(x):
+            raise StopIteration()
+
+        with self.assertRaises(Py4JJavaError):
--- End diff --

Can we check for error message here?
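
As a rough illustration of what checking the message could look like, the sketch below uses only the standard library and no Spark. The wrapper `fail_on_stop_iteration` and its message text are assumptions made up for this example, not the names or wording used in the PR. It also shows why the bare `StopIteration` is a problem: `map()` lets it escape, and the consumer reads it as normal exhaustion.

```python
import unittest


def fail_on_stop_iteration(f):
    """Wrap f so a StopIteration it raises surfaces as a RuntimeError.

    Hypothetical helper for illustration only.
    """
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as exc:
            raise RuntimeError("StopIteration raised in user code") from exc
    return wrapper


def foo(x):
    raise StopIteration()


class StopIterationTest(unittest.TestCase):
    def test_unwrapped_is_silent(self):
        # map() lets the StopIteration escape, and list() reads it as
        # "iterator exhausted", so the failure is swallowed.
        self.assertEqual(list(map(foo, [1, 2, 3])), [])

    def test_wrapped_reports_message(self):
        # Assert on the message, not only on the exception type.
        with self.assertRaisesRegex(RuntimeError,
                                    "StopIteration raised in user code"):
            list(map(fail_on_stop_iteration(foo), [1, 2, 3]))
```

Matching on the message guards against the test passing because of some unrelated error of the same exception type.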


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isinSet in DataFrame AP...

2018-05-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/21416#discussion_r190759741
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -220,6 +219,7 @@ object OptimizeIn extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsDown {
       case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+      case In(v, list) if list.length == 1 => EqualTo(v, list.head)
--- End diff --

Yep. This is that one.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


