[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21427 cc @rxin --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91134/ Test PASSed.
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test PASSed.
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 **[Test build #91134 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91134/testReport)** for PR 21366 at commit [`45a02de`](https://github.com/apache/spark/commit/45a02de19a07217084caaa0a5d87b424e1b79d2e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF shou...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21427#discussion_r190793613
--- Diff: python/pyspark/sql/tests.py ---
@@ -4931,6 +4931,33 @@ def foo3(key, pdf):
         expected4 = udf3.func((), pdf)
         self.assertPandasEqual(expected4, result4)

+    def test_column_order(self):
+        import pandas as pd
+        from pyspark.sql.functions import pandas_udf, col, PandasUDFType
--- End diff --
seems `col` is not used btw.
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21426 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3574/ Test PASSed.
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21426 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3573/ Test PASSed.
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21426 Merged build finished. Test PASSed.
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21426 Merged build finished. Test PASSed.
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 Also, I really think we should mark this feature as experimental.
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21426 **[Test build #91140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91140/testReport)** for PR 21426 at commit [`39b10c5`](https://github.com/apache/spark/commit/39b10c5656a48f813a95d48d752e2d44ccb2c0d9).
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 Yeah, I agree with not backporting, and I agree with making it configurable. The thing is, the configuration is inaccessible on the worker.py side; that's why I was hesitant. The safest way is just to target 3.0.0, but on the other hand there are currently many complaints too.
[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21389#discussion_r190791115
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource.
+   */
+  def verifySchema(format: String, schema: StructType): Unit = {
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
+
+      // For backward-compatibility
--- End diff --
Yes, as long as it does not break anything.
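The recursive check in the quoted `verifySchema` diff can be modeled outside Spark. The following is a minimal plain-Python sketch, not Spark code: the tuple-based type model, the `PRIMITIVES` set, the function names, and the error wording are all assumptions made for illustration. The quoted diff is cut off at the backward-compatibility case, so this sketch simply rejects anything it does not recognize.

```python
# Hypothetical, simplified type model: a primitive is a string name; a nested
# type is a tuple such as ("array", elem), ("map", key, value) or
# ("struct", [(field_name, field_type), ...]).
PRIMITIVES = {
    "boolean", "byte", "short", "int", "long", "float", "double",
    "string", "binary", "date", "timestamp", "decimal",
}


def verify_type(data_type, fmt):
    """Raise ValueError if data_type contains a type outside the allowed set."""
    if isinstance(data_type, str):
        if data_type not in PRIMITIVES:
            # The message wording is an assumption, not Spark's actual text.
            raise ValueError(f"{fmt} data source does not support {data_type}")
        return
    kind = data_type[0]
    if kind == "struct":
        for _field_name, field_type in data_type[1]:
            verify_type(field_type, fmt)
    elif kind == "array":
        verify_type(data_type[1], fmt)
    elif kind == "map":
        # Both the key type and the value type must be supported.
        verify_type(data_type[1], fmt)
        verify_type(data_type[2], fmt)
    else:
        raise ValueError(f"{fmt} data source does not support {kind}")


def verify_schema(fmt, schema):
    """Check every field of a struct schema, recursing into nested types."""
    verify_type(schema, fmt)
```

The point of the recursion is that an unsupported type nested inside an array, map, or struct is rejected just like a top-level one.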
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21426 Merged build finished. Test FAILed.
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21426 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3572/ Test FAILed.
[GitHub] spark pull request #21331: [SPARK-24276][SQL] Order of literals in IN should...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21331#discussion_r190790891
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---
@@ -85,6 +86,9 @@ object Canonicalize {
     case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
     case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)

+    // order the list in the In operator
+    case In(value, list) => In(value, list.sortBy(_.hashCode()))
--- End diff --
Let us exclude IN subqueries from this case?
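The idea in the quoted diff, together with the requested exclusion, can be illustrated with a plain-Python model. This is a sketch under stated assumptions rather than Catalyst code: `Literal`, `Subquery`, `In`, and `canonicalize` are hypothetical stand-ins, and an IN subquery is modeled as an `In` whose list contains a `Subquery` marker.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Subquery:
    # Hypothetical marker for an IN (SELECT ...) list entry.
    sql: str


@dataclass(frozen=True)
class In:
    value: str
    items: Tuple[object, ...]


def canonicalize(expr):
    """Order the IN list so that semantically equal lists compare equal.

    Lists that contain a subquery are returned unchanged.
    """
    if isinstance(expr, In) and not any(isinstance(i, Subquery) for i in expr.items):
        # Sorting by hash mirrors sortBy(_.hashCode()); two distinct items
        # with colliding hashes would keep their input order.
        return In(expr.value, tuple(sorted(expr.items, key=hash)))
    return expr
```

With this model, `a IN (1, 2)` and `a IN (2, 1)` canonicalize to the same expression, while a subquery-backed IN passes through untouched.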
[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21428 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91135/ Test PASSed.
[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21428 Merged build finished. Test PASSed.
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21410 LGTM except one minor comment.
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21426 **[Test build #91139 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91139/testReport)** for PR 21426 at commit [`15d6ae2`](https://github.com/apache/spark/commit/15d6ae219ac134a277a74f5e4884e4ebc6cfcf34).
[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21428 **[Test build #91135 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91135/testReport)** for PR 21428 at commit [`e0108d7`](https://github.com/apache/spark/commit/e0108d7bc164b9e5eeb757c13c80bc1d11671188). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21410: [SPARK-24366][SQL] Improving of error messages fo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21410#discussion_r190790254
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala ---
@@ -309,6 +322,9 @@ object CatalystTypeConverters {
       case d: JavaBigDecimal => Decimal(d)
       case d: JavaBigInteger => Decimal(d)
       case d: Decimal => d
+      case other => throw new IllegalArgumentException(
+        s"The value (${other.toString}) of the type (${other.getClass.getCanonicalName}) "
+          + s"cannot be converted to ${dataType.simpleString}")
--- End diff --
Let us use `catalogString` here?
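The pattern in the quoted diff is a catch-all branch that names the offending value, its runtime type, and the target type instead of failing with an opaque match error. A minimal plain-Python analogue follows; `to_decimal` and its `target_type` parameter are hypothetical names, and `ValueError` stands in for the `IllegalArgumentException` used in the diff.

```python
from decimal import Decimal


def to_decimal(value, target_type):
    """Convert supported inputs to Decimal; otherwise fail with a clear message."""
    if isinstance(value, Decimal):
        return value
    if isinstance(value, int):
        return Decimal(value)
    # Catch-all branch: report the value, its runtime type and the target type.
    raise ValueError(
        f"The value ({value}) of the type ({type(value).__qualname__}) "
        f"cannot be converted to {target_type}"
    )
```

For example, `to_decimal("abc", "decimal(38,18)")` raises an error whose message mentions both `str` and `decimal(38,18)`, which is the information a user needs to locate the bad input.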
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21426 I tested:
* submit with yarn client: .py local
* submit with yarn client: .py remote
* submit with standalone client: .py local
* submit with standalone client: .py remote
they all work fine.
[GitHub] spark pull request #21415: [SPARK-24244][SPARK-24368][SQL] Passing only requ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21415
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test FAILed.
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91133/ Test FAILed.
[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21415 LGTM. Thanks! Merged to master.
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 **[Test build #91133 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91133/testReport)** for PR 21366 at commit [`c398ebb`](https://github.com/apache/spark/commit/c398ebbe71e3ca586961df8fa2033b15235b27c2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21351: [SPARK-24002][SQL][BACKPORT-2.3] Task not serializable c...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21351 @imarios Please check the dev mailing list. It is being voted on.
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21427 Do not backport this to 2.3. This is a behavior change.
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21427 How about making it configurable? Users can choose to resolve either by name or by position. It is hard to say which one is right. If the names do not match when users want to resolve by name, we should issue an error.
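The configurable behavior proposed here can be sketched in plain Python without Spark or pandas. Everything in the sketch is an assumption for illustration: the returned frame is modeled as a list of `(name, values)` pairs, and `resolve_columns` with its `by_name` flag is a hypothetical helper, not an existing PySpark API or configuration key.

```python
def resolve_columns(columns, schema_names, by_name):
    """Align UDF output columns with the declared schema.

    columns: list of (name, values) pairs in the order the UDF returned them.
    Returns the list of values, one entry per schema field, in schema order.
    """
    if len(columns) != len(schema_names):
        raise ValueError(
            f"expected {len(schema_names)} columns, got {len(columns)}"
        )
    if not by_name:
        # Positional mode: the i-th returned column fills the i-th field.
        return [values for _name, values in columns]
    lookup = dict(columns)
    missing = [name for name in schema_names if name not in lookup]
    if missing:
        # By-name mode fails loudly instead of guessing.
        raise ValueError("columns missing from UDF result: " + ", ".join(missing))
    return [lookup[name] for name in schema_names]
```

With a result ordered `v, id` and a schema ordered `id, v`, by-name mode reorders the columns, positional mode keeps them as returned, and by-name mode against a schema containing an unknown name raises an error.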
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21346 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3571/ Test PASSed.
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21346 Merged build finished. Test PASSed.
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21346 **[Test build #91138 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91138/testReport)** for PR 21346 at commit [`331124b`](https://github.com/apache/spark/commit/331124b125db6b59009e12249542f667a227226e).
[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21428 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91132/ Test FAILed.
[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21428 Merged build finished. Test FAILed.
[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21428 **[Test build #91132 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91132/testReport)** for PR 21428 at commit [`f3ce675`](https://github.com/apache/spark/commit/f3ce67529372f72370a1e6028dc71a751acf26f2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21420#discussion_r190783462
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -430,18 +430,15 @@ private[spark] class SparkSubmit extends Logging {
       // Usage: PythonAppRunner [app arguments]
       args.mainClass = "org.apache.spark.deploy.PythonRunner"
       args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs
-      if (clusterManager != YARN) {
-        // The YARN backend distributes the primary file differently, so don't merge it.
-        args.files = mergeFileLists(args.files, args.primaryResource)
--- End diff --
It duplicates the code below; you can check the original code.
[GitHub] spark pull request #21420: [SPARK-24377][Spark Submit] make --py-files work ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21420#discussion_r190783213
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -430,18 +430,15 @@ private[spark] class SparkSubmit extends Logging {
       // Usage: PythonAppRunner [app arguments]
       args.mainClass = "org.apache.spark.deploy.PythonRunner"
       args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs
-      if (clusterManager != YARN) {
-        // The YARN backend distributes the primary file differently, so don't merge it.
-        args.files = mergeFileLists(args.files, args.primaryResource)
--- End diff --
Eh @jerryshao why did we remove this?
[GitHub] spark issue #21400: [SPARK-24351][SS]offsetLog/commitLog purge thresholdBatc...
Github user ivoson commented on the issue: https://github.com/apache/spark/pull/21400 @jose-torres thanks for the reply. I will try to add a unit test for this.
[GitHub] spark pull request #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21411
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21426 I haven't tried yet, but I believe it has, since it downloads the files locally. The deploy.PythonRunner side also assumes that the file is local. Will check to be doubly sure.
[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21428 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91131/ Test FAILed.
[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21428 Merged build finished. Test FAILed.
[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21428 **[Test build #91131 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91131/testReport)** for PR 21428 at commit [`63d38d8`](https://github.com/apache/spark/commit/63d38d849107eed226449cec8d24c2241cd583c9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21411 Merged to master.
[GitHub] spark issue #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correc...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21426 Did you try remote .py files? Do they have a similar issue?
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21411 I remember this summary file is disabled by default anyway. I think it's fine to just get rid of the warnings.
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91130/ Test PASSed.
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 Just for clarification, I am okay with it, @BryanCutler, if you feel this way too.
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test PASSed.
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21427 I do think the current default behavior might be confusing to users and hard to debug. I have also received similar complaints. I think that, at the very least, when the column names of the schema and of the return value match but their order differs, we should match by column name, as it is extremely unlikely that users want any other behavior in this case. This will mostly keep the current behavior unchanged, with the exception of the "same column names, different order" case, for which the new behavior is strictly better.
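The fallback rule described in this comment can be written down as a small plain-Python sketch. It is illustrative only: the frame is modeled as a list of `(name, values)` pairs and `assign_columns` is a hypothetical helper, not the code in the pull request.

```python
def assign_columns(columns, schema_names):
    """Match by name only when the names agree and just the order differs.

    columns: list of (name, values) pairs in the order the UDF returned them.
    In every other case the existing positional behavior is kept.
    """
    returned_names = [name for name, _values in columns]
    same_names = sorted(returned_names) == sorted(schema_names)
    if same_names and returned_names != list(schema_names):
        # Same column names, different order: reorder to the schema order.
        lookup = dict(columns)
        return [lookup[name] for name in schema_names]
    # Names differ (or already line up): assign by position as before.
    return [values for _name, values in columns]
```

Only results whose names are a permutation of the schema names are reordered; a result with unrelated column names is still assigned positionally, which is why the change leaves most existing behavior untouched.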
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 **[Test build #91130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91130/testReport)** for PR 21366 at commit [`d4cf40f`](https://github.com/apache/spark/commit/d4cf40f715b7d6ad8b9d9e3cf9757b2d439f25ea). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21422: [Spark-24376][doc]Summary:compiling spark with scala-2.1...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21422 **[Test build #91137 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91137/testReport)** for PR 21422 at commit [`bf6b801`](https://github.com/apache/spark/commit/bf6b8011abcc9c82e941d7aeceb127f128aecbb0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21422: [Spark-24376][doc]Summary:compiling spark with scala-2.1...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21422 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91137/ Test PASSed.
[GitHub] spark issue #21422: [Spark-24376][doc]Summary:compiling spark with scala-2.1...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21422 Merged build finished. Test PASSed.
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21346 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3570/ Test PASSed.
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21346 Merged build finished. Test PASSed.
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21346 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91136/ Test FAILed.
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21346 **[Test build #91136 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91136/testReport)** for PR 21346 at commit [`32f4f94`](https://github.com/apache/spark/commit/32f4f94e3cde50015a8ea478969636fca708cf82). * This patch **fails Java style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21346 Merged build finished. Test FAILed.
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21426#discussion_r190778192
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -372,8 +376,27 @@ private[spark] class SparkSubmit extends Logging {
       localJars = Option(args.jars).map {
         downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
       }.orNull
-      localPyFiles = Option(args.pyFiles).map {
-        downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
+      localPyFiles = Option(args.pyFiles).map { pyFiles =>
+        if (isClientPythonSubmit) {
--- End diff --
Agreed with @vanzin; we can move this logic to the Python-related code.
[GitHub] spark issue #21422: [Spark-24376][doc]Summary:compiling spark with scala-2.1...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21422 **[Test build #91137 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91137/testReport)** for PR 21422 at commit [`bf6b801`](https://github.com/apache/spark/commit/bf6b8011abcc9c82e941d7aeceb127f128aecbb0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21426#discussion_r190778033
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -372,8 +376,27 @@ private[spark] class SparkSubmit extends Logging {
       localJars = Option(args.jars).map {
         downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
       }.orNull
-      localPyFiles = Option(args.pyFiles).map {
-        downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
+      localPyFiles = Option(args.pyFiles).map { pyFiles =>
+        if (isClientPythonSubmit) {
--- End diff --
Yup, it can be. Will try.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21422: [Spark-24376][doc]Summary:compiling spark with scala-2.1...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21422 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21427 Hmm .. I get that this one is preferable, and I don't think we have discussed this so far, if I remember correctly. Do you feel strongly about this, @icexelloss and @BryanCutler? If so, let's update the migration guide for 2.4.0 ... and I hope we can document this feature as experimental. I think I could be okay with that. Otherwise, I prefer to target this for 3.0.0 and document it for now .. Another option is to add a configuration to control this behaviour, but I remember it's tricky to inject the configuration there. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21346 **[Test build #91136 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91136/testReport)** for PR 21346 at commit [`32f4f94`](https://github.com/apache/spark/commit/32f4f94e3cde50015a8ea478969636fca708cf82). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91129/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21390 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21390 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91128/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 **[Test build #91129 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91129/testReport)** for PR 21366 at commit [`d4cf40f`](https://github.com/apache/spark/commit/d4cf40f715b7d6ad8b9d9e3cf9757b2d439f25ea). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21390 **[Test build #91128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91128/testReport)** for PR 21390 at commit [`2011eed`](https://github.com/apache/spark/commit/2011eede002664ef75e00f1f0228c5d765753f4c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3446/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3446/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3569/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21428 **[Test build #91135 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91135/testReport)** for PR 21428 at commit [`e0108d7`](https://github.com/apache/spark/commit/e0108d7bc164b9e5eeb757c13c80bc1d11671188). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3568/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 **[Test build #91134 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91134/testReport)** for PR 21366 at commit [`45a02de`](https://github.com/apache/spark/commit/45a02de19a07217084caaa0a5d87b424e1b79d2e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21366 Alright, this should be good for review now, with all cleanups and appropriate test coverage in place. Please take a look. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r190769478 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsPollingEventSource.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.scheduler.cluster.k8s + +import java.util.concurrent.{Future, ScheduledExecutorService, TimeUnit} + +import io.fabric8.kubernetes.client.KubernetesClient +import scala.collection.JavaConverters._ + +import org.apache.spark.deploy.k8s.Constants._ + +private[spark] class ExecutorPodsPollingEventSource( +kubernetesClient: KubernetesClient, +eventHandler: ExecutorPodsEventHandler, +pollingExecutor: ScheduledExecutorService) { + + private var pollingFuture: Future[_] = null + + def start(applicationId: String): Unit = { +require(pollingFuture == null, "Cannot start polling more than once.") +pollingFuture = pollingExecutor.scheduleWithFixedDelay( + new PollRunnable(applicationId), 0L, 30L, TimeUnit.SECONDS) + } + + def stop(): Unit = { +if (pollingFuture != null) { + pollingFuture.cancel(true) + pollingFuture = null --- End diff -- Done, see below. 
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
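The start/stop guard in the quoted `ExecutorPodsPollingEventSource` diff can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `PollingEventSource`, the `poll` callback, and the use of a background thread in place of a `ScheduledExecutorService` are all hypothetical and not part of the PR.

```python
import threading


class PollingEventSource:
    """Hypothetical sketch: poll at a fixed delay, allow only one active start."""

    def __init__(self, poll, interval_seconds=30.0):
        self._poll = poll                  # callback invoked on every poll
        self._interval = interval_seconds  # delay between the end of one poll and the next
        self._stop_event = None
        self._thread = None

    def start(self):
        # Mirrors the require(...) check in the diff: refuse a second start.
        if self._thread is not None:
            raise RuntimeError("Cannot start polling more than once.")
        self._stop_event = threading.Event()
        self._thread = threading.Thread(
            target=self._run, args=(self._stop_event,), daemon=True)
        self._thread.start()

    def _run(self, stop_event):
        # First poll happens immediately (initial delay 0), then after each interval.
        while not stop_event.is_set():
            self._poll()
            stop_event.wait(self._interval)

    def stop(self):
        # Mirrors the diff: cancel only if running, then reset so start() works again.
        if self._thread is not None:
            self._stop_event.set()
            self._thread.join()
            self._thread = None
            self._stop_event = None
```

Resetting the handle in `stop()` is what lets a later `start()` succeed, which appears to be the point of the `pollingFuture = null` line under review.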
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 **[Test build #91133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91133/testReport)** for PR 21366 at commit [`c398ebb`](https://github.com/apache/spark/commit/c398ebbe71e3ca586961df8fa2033b15235b27c2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21428 **[Test build #91132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91132/testReport)** for PR 21428 at commit [`f3ce675`](https://github.com/apache/spark/commit/f3ce67529372f72370a1e6028dc71a751acf26f2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21415 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21415 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91126/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21415 **[Test build #91126 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91126/testReport)** for PR 21415 at commit [`4115058`](https://github.com/apache/spark/commit/41150585c8a104804cbc59e3e95d2175ea3bc617). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21428 **[Test build #91131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91131/testReport)** for PR 21428 at commit [`63d38d8`](https://github.com/apache/spark/commit/63d38d849107eed226449cec8d24c2241cd583c9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle write RDD...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21428 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...
GitHub user jose-torres opened a pull request: https://github.com/apache/spark/pull/21428
[SPARK-24235][SS] Implement continuous shuffle write RDD for single reader partition.
## What changes were proposed in this pull request?
Implement continuous shuffle write RDD for a single reader partition. (I don't believe any implementation changes are actually required for multiple reader partitions, but this PR is already very large, so I want to exclude those for now to keep the size down.)
## How was this patch tested?
new unit tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jose-torres/spark writerTask
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21428.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21428
commit 1d6b71898e2a640e3c0809695d2b83f3f84eaa38 Author: Jose Torres Date: 2018-05-15T18:07:54Z continuous shuffle read RDD
commit b5d100875932bdfcb645c8f6b2cdb7b815d84c80 Author: Jose Torres Date: 2018-05-17T03:11:11Z docs
commit af407694a5f13c18568da4a63848f82374a44377 Author: Jose Torres Date: 2018-05-17T03:19:37Z Merge remote-tracking branch 'apache/master' into readerRddMaster
commit 46456dc75a6aec9659b18523c421999debd060eb Author: Jose Torres Date: 2018-05-17T03:22:49Z fix ctor
commit 2ea8a6f94216e8b184e5780ec3e6ffb2838de382 Author: Jose Torres Date: 2018-05-17T03:43:10Z multiple partition test
commit 955ac79eb05dc389e632d1aaa6c59396835c6ed5 Author: Jose Torres Date: 2018-05-17T13:33:51Z unset task context after test
commit 8cefb724512b51f2aa1fdd81fa8a2d4560e60ce3 Author: Jose Torres Date: 2018-05-18T00:00:05Z conf from RDD
commit f91bfe7e3fc174202d7d5c7cde5a8fb7ce86bfd3 Author: Jose Torres Date: 2018-05-18T00:00:44Z endpoint name
commit 259029298fc42a65e8ebb4d2effe49b7fafa96f1 Author: Jose Torres Date: 2018-05-18T00:02:08Z testing bool
commit 859e6e4dd4dd90ffd70fc9cbd243c94090d72506 Author: Jose Torres Date: 2018-05-18T00:22:10Z tests
commit b23b7bb17abe3cbc873a3144c56d08c88bc0c963 Author: Jose Torres Date: 2018-05-18T00:40:55Z take instead of poll
commit 97f7e8ff865e6054d0d70914ce9bb51880b161f6 Author: Jose Torres Date: 2018-05-18T00:58:44Z add interface
commit de21b1c25a333d44c0521fe151b468e51f0bdc47 Author: Jose Torres Date: 2018-05-18T01:02:37Z clarify comment
commit 7dcf51a13e92a0bb2998e2a12e67d351e1c1a4fc Author: Jose Torres Date: 2018-05-18T22:39:28Z multiple
commit ad0b5aab320413891f7c21ea6115b6da8d49ccf9 Author: Jose Torres Date: 2018-05-25T00:06:15Z writer with 1 reader partition
commit c9adee5423c2e8a030911008d2e6942045d484bb Author: Jose Torres Date: 2018-05-25T00:15:39Z docs and iface
commit 63d38d849107eed226449cec8d24c2241cd583c9 Author: Jose Torres Date: 2018-05-25T00:27:26Z Merge remote-tracking branch 'apache/master' into writerTask
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21385 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21427 At first glance, I thought this issue was slightly different from https://issues.apache.org/jira/browse/SPARK-23929, but yeah, it seems to be the same. After reading through that discussion, I guess we need to be careful about any changes. I'm not used to creating DataFrames by position, but it is possible to do so with a list of tuples like the example from the doctest:
```
>>> @pandas_udf("id long, v double", PandasUDFType.GROUPED_MAP)  # doctest: +SKIP
... def mean_udf(key, pdf):
...     # key is a tuple of one numpy.int64, which is the value
...     # of 'id' for the current group
...     return pd.DataFrame([key + (pdf.v.mean(),)])
```
Then this would be a breaking change... so maybe it would be best to add better documentation for now, like @HyukjinKwon mentioned in SPARK-23929, and target a change for Spark 3.0?
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
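The compatibility concern above can be illustrated with a small pure-Python sketch. It does not use Spark or pandas; `assign_by_position` and `assign_by_name` are hypothetical helpers that stand in for the two ways a returned frame's columns could be matched to the declared schema `id long, v double`.

```python
def assign_by_position(schema, labels, row):
    """Pair the i-th returned value with the i-th schema field, ignoring labels."""
    return dict(zip(schema, row))


def assign_by_name(schema, labels, row):
    """Look up each schema field by the returned column label."""
    by_label = dict(zip(labels, row))
    return {name: by_label[name] for name in schema}


schema = ["id", "v"]

# A frame built from a dict carries column names, possibly in a different order.
labels, row = ["v", "id"], (2.5, 1)
print(assign_by_position(schema, labels, row))  # {'id': 2.5, 'v': 1}, values swapped
print(assign_by_name(schema, labels, row))      # {'id': 1, 'v': 2.5}

# A frame built from a list of tuples only has positional labels 0, 1.
labels, row = [0, 1], (1, 2.5)
print(assign_by_position(schema, labels, row))  # {'id': 1, 'v': 2.5}
try:
    assign_by_name(schema, labels, row)
except KeyError as missing:
    print("name lookup fails for", missing)
```

Under these assumptions, matching by name fixes the reordered case but fails for frames built from tuples, which is why switching the behaviour outright would break existing UDFs like the doctest one.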
[GitHub] spark pull request #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r190762965 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/DeterministicExecutorPodsEventQueue.scala --- @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.scheduler.cluster.k8s + +import io.fabric8.kubernetes.api.model.Pod +import scala.collection.mutable + +class DeterministicExecutorPodsEventQueue extends ExecutorPodsEventQueue { + + private val eventBuffer = mutable.Buffer.empty[Pod] + private val subscribers = mutable.Buffer.empty[(Seq[Pod]) => Unit] + + override def addSubscriber + (processBatchIntervalMillis: Long) + (onNextBatch: (Seq[Pod]) => Unit): Unit = { +subscribers += onNextBatch + } + + override def stopProcessingEvents(): Unit = {} + + override def pushPodUpdate(updatedPod: Pod): Unit = eventBuffer += updatedPod --- End diff -- Yup, basically just a live stream of the pod statuses as reported by the API. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
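The idea behind the deterministic test queue in this diff can be sketched in Python. The class and method names are hypothetical translations of the Scala ones, and `notify_subscribers` is an assumption: the quoted diff is cut off before any flush method is shown.

```python
class DeterministicEventQueue:
    """Test double: buffers pod updates until the test explicitly flushes them."""

    def __init__(self):
        self._event_buffer = []
        self._subscribers = []

    def add_subscriber(self, on_next_batch):
        # No timer is involved; the callback only runs on an explicit flush.
        self._subscribers.append(on_next_batch)

    def push_update(self, updated_pod):
        self._event_buffer.append(updated_pod)

    def notify_subscribers(self):
        # Hypothetical flush method (not visible in the quoted diff):
        # hand the buffered updates to every subscriber as one batch.
        batch = list(self._event_buffer)
        self._event_buffer.clear()
        for on_next_batch in self._subscribers:
            on_next_batch(batch)


received = []
queue = DeterministicEventQueue()
queue.add_subscriber(received.append)
queue.push_update("pod-1: Running")
queue.push_update("pod-1: Succeeded")
queue.notify_subscribers()
print(received)  # [['pod-1: Running', 'pod-1: Succeeded']]
```

Because nothing is delivered until the test calls the flush method, assertions do not depend on thread scheduling or wall-clock time.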
[GitHub] spark pull request #21426: [SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/21426#discussion_r190761869
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -372,8 +376,27 @@ private[spark] class SparkSubmit extends Logging {
       localJars = Option(args.jars).map {
         downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
       }.orNull
-      localPyFiles = Option(args.pyFiles).map {
-        downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
+      localPyFiles = Option(args.pyFiles).map { pyFiles =>
+        if (isClientPythonSubmit) {
--- End diff --
Couldn't this logic be in `PythonRunner`? That's basically what SparkSubmit runs when the conditions you use to create `isClientPythonSubmit` are met. This class is already pretty hard to navigate; it'd be better to avoid adding more special cases to it.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91123/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 **[Test build #91123 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91123/testReport)** for PR 21366 at commit [`5850439`](https://github.com/apache/spark/commit/5850439652fad6bb2b03daf4e35497304c8defdd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190761062
--- Diff: python/pyspark/sql/tests.py ---
@@ -900,6 +900,17 @@ def __call__(self, x):
         self.assertEqual(f, f_.func)
         self.assertEqual(return_type, f_.returnType)
 
+    def test_stopiteration_in_udf(self):
+        # test for SPARK-23754
+        from pyspark.sql.functions import udf
+        from py4j.protocol import Py4JJavaError
+
+        def foo(x):
+            raise StopIteration()
+
+        with self.assertRaises(Py4JJavaError):
--- End diff --
Can we check for the error message here?
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
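One way to check the message as well as the exception type is `assertRaisesRegex` from the standard `unittest` module. The sketch below is self-contained and does not touch Spark: `run_udf` is a hypothetical stand-in for the worker loop, and both the `RuntimeError` type and the message text are assumptions chosen for illustration.

```python
import unittest


def run_udf(func, rows):
    """Hypothetical worker loop that re-raises StopIteration from user code."""
    results = []
    for row in rows:
        try:
            results.append(func(row))
        except StopIteration as exc:
            # Illustrative message; the real wording is not shown in the diff.
            raise RuntimeError(
                "Caught StopIteration thrown from user's code") from exc
    return results


def foo(x):
    raise StopIteration()


class StopIterationTest(unittest.TestCase):
    def test_stopiteration_in_udf(self):
        # Asserts on both the exception type and a fragment of its message.
        with self.assertRaisesRegex(RuntimeError, "StopIteration"):
            run_udf(foo, [1, 2, 3])
```

Matching on the message guards against the test passing because of some unrelated error of the same type.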
[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isinSet in DataFrame AP...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/21416#discussion_r190759741
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -220,6 +219,7 @@ object OptimizeIn extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsDown {
       case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+      case In(v, list) if list.length == 1 => EqualTo(v, list.head)
--- End diff --
Yep. This is that one.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
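The two `OptimizeIn` cases quoted in this diff can be mimicked on a toy expression tree in Python. This is a sketch only: `Column`, `In`, `EqualTo`, and `optimize_in` are hypothetical stand-ins, not Catalyst classes, and only the two rewrite cases visible in the diff are modelled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Column:
    name: str
    nullable: bool = True


@dataclass(frozen=True)
class In:
    value: Column
    items: Tuple[object, ...]


@dataclass(frozen=True)
class EqualTo:
    left: Column
    right: object


FALSE_LITERAL = False  # stand-in for Catalyst's FalseLiteral


def optimize_in(expr):
    """Apply the two rewrites from the diff; leave everything else unchanged."""
    if isinstance(expr, In):
        # IN over an empty list on a non-nullable value can never be true.
        if not expr.items and not expr.value.nullable:
            return FALSE_LITERAL
        # IN over a single element is just an equality check.
        if len(expr.items) == 1:
            return EqualTo(expr.value, expr.items[0])
    return expr


print(optimize_in(In(Column("a"), (1,))))  # EqualTo(left=Column(name='a', nullable=True), right=1)
```

The empty-list case is restricted to non-nullable values because, for a nullable value, the result of `IN` over an empty list may need to stay null rather than false.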