[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18687 Ok. Sounds reasonable. I'm preparing a new fix for the case. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18645: [SPARK-14280][BUILD][WIP] Update change-version.s...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/18645#discussion_r128891202

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala ---
@@ -54,7 +54,10 @@ class TaskContextSuite extends SparkFunSuite with BeforeAndAfter with LocalSpark
     val rdd = new RDD[String](sc, List()) {
       override def getPartitions = Array[Partition](StubPartition(0))
       override def compute(split: Partition, context: TaskContext) = {
-        context.addTaskCompletionListener(context => TaskContextSuite.completed = true)
+        context.addTaskCompletionListener(new TaskCompletionListener {
--- End diff --

I think that you can avoid source incompatibilities for Scala users by removing the overloads which accept Scala functions and then adding a package-level implicit conversion to convert from Scala functions back into our own custom trait / interface. The trickiness here is that you need to preserve binary compatibility on Scala 2.10/2.11, so the removal of the overload needs to be done conditionally so that it only occurs when building with Scala 2.12. Rather than having a separate source tree for 2.12, I'd propose defining the removed overload in a mixin trait which comes from a separate source file, and then configuring the build to use different versions of that file for 2.10/2.11 and for 2.12. For details on this proposal, see https://docs.google.com/document/d/1P_wmH3U356f079AYgSsN53HKixuNdxSEvo8nw_tgLgM/edit, a document I wrote in March 2016 which explores these source-incompatibility difficulties. Applying that idea here, the idea would be to remove the method

```
def addTaskCompletionListener(f: (TaskContext) => Unit)
```

and add a package-level implicit conversion from `TaskContext => Unit` to `TaskCompletionListener`, but to do this only in the 2.12 source tree / shim.

This approach has some caveats and could potentially impact Java users who are doing weird things (violating the goal that Java Spark code is source- and binary-compatible with all Scala versions). See the linked doc for a full discussion of this problem.
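The shim idea described above can be sketched in a few lines. This is a hypothetical, self-contained illustration, not the real Spark API: `TaskContext`, `TaskCompletionListener`, and the `shim` object here are simplified stand-ins, and in the actual proposal the implicit conversion would live in a source file compiled only for Scala 2.12.

```scala
import scala.language.implicitConversions

// Simplified stand-ins for the real Spark types (illustrative only).
class TaskContext
trait TaskCompletionListener {
  def onTaskCompletion(context: TaskContext): Unit
}

object shim {
  // In the proposal, this implicit replaces the removed
  // `addTaskCompletionListener(f: (TaskContext) => Unit)` overload,
  // and is only compiled into the Scala 2.12 build.
  implicit def functionToTaskCompletionListener(
      f: TaskContext => Unit): TaskCompletionListener =
    new TaskCompletionListener {
      override def onTaskCompletion(context: TaskContext): Unit = f(context)
    }
}

class MockContext extends TaskContext {
  private var listeners = List.empty[TaskCompletionListener]
  // Only the trait-accepting overload survives on the 2.12 build.
  def addTaskCompletionListener(listener: TaskCompletionListener): Unit =
    listeners ::= listener
  def markTaskCompleted(): Unit = listeners.foreach(_.onTaskCompletion(this))
}

object Demo extends App {
  import shim._
  val ctx = new MockContext
  var completed = false
  // Caller code still passes a plain Scala function; the in-scope
  // implicit converts it to a TaskCompletionListener.
  ctx.addTaskCompletionListener((_: TaskContext) => completed = true)
  ctx.markTaskCompleted()
  println(completed) // prints "true"
}
```

Existing caller source like `context.addTaskCompletionListener(ctx => ...)` keeps compiling as long as the implicit is in scope, which is why the conversion is proposed at the package level.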
[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18687 My above example is pretty common in many Spark SQL use cases. Many users rely on it. As long as one table is cached in one session, the other sessions can use the cached table without reading the table again.
[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18687 @gatorsmile Thanks for reporting that. It is hard to argue that the reported case is semantically valid. Actually, `ds1` and `ds2` are two different Datasets. Semantically, if you cache one Dataset, why does another Dataset need to be cached too?
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/18388 @jinxing64 Sorry, I forgot to mention one request. Could you add a unit test? Right now it's disabled, so the new code is not tested. It will help avoid some obvious mistakes, such as the missing `return` issue :)
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18388 **[Test build #79858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79858/testReport)** for PR 18388 at commit [`4de417f`](https://github.com/apache/spark/commit/4de417f946430dd6d963768583d5fa1f22fe4622).
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18388 retest this please
[GitHub] spark issue #18645: [SPARK-14280][BUILD][WIP] Update change-version.sh and p...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18645 @srowen You just showed that the Scala 2.12 changes are source-breaking, didn't you?
[GitHub] spark pull request #18645: [SPARK-14280][BUILD][WIP] Update change-version.s...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18645#discussion_r128890891

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -353,7 +353,7 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
   test("foreachPartition") {
     val ds = Seq(("a", 1), ("b", 2), ("c", 3)).toDS()
     val acc = sparkContext.longAccumulator
-    ds.foreachPartition(_.foreach(v => acc.add(v._2)))
+    ds.foreachPartition((it: Iterator[(String, Int)]) => it.foreach(v => acc.add(v._2)))
--- End diff --

isn't this a source-breaking change?
[GitHub] spark pull request #18645: [SPARK-14280][BUILD][WIP] Update change-version.s...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18645#discussion_r128890868

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala ---
@@ -54,7 +54,10 @@ class TaskContextSuite extends SparkFunSuite with BeforeAndAfter with LocalSpark
     val rdd = new RDD[String](sc, List()) {
       override def getPartitions = Array[Partition](StubPartition(0))
       override def compute(split: Partition, context: TaskContext) = {
-        context.addTaskCompletionListener(context => TaskContextSuite.completed = true)
+        context.addTaskCompletionListener(new TaskCompletionListener {
--- End diff --

isn't this a source-breaking change?
[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18687 Thanks for fixing this, but this PR does not fix all the cases caused by the materialized plans in the QueryExecution. For example:

```Scala
Seq("1", "2").toDF().write.saveAsTable("t")
val ds1 = spark.table("t")
val ds2 = spark.table("t")
ds1.collect()
ds2.persist()
ds1.collect() // --> this still uses the uncached plan.
```
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18388 Merged build finished. Test FAILed.
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18388 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79857/ Test FAILed.
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18388 **[Test build #79857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79857/testReport)** for PR 18388 at commit [`4de417f`](https://github.com/apache/spark/commit/4de417f946430dd6d963768583d5fa1f22fe4622).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18388 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79856/ Test FAILed.
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18388 Merged build finished. Test FAILed.
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18388 **[Test build #79856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79856/testReport)** for PR 18388 at commit [`5f622c3`](https://github.com/apache/spark/commit/5f622c3da3b65b8d183e329ac641caa1c9aed9bb).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18388 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79855/ Test FAILed.
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18388 **[Test build #79855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79855/testReport)** for PR 18388 at commit [`4bfeabb`](https://github.com/apache/spark/commit/4bfeabb8755b71f161f086ef68f95f522b848f23).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18388 Merged build finished. Test FAILed.
[GitHub] spark pull request #18709: [SPARK-21504] [SQL] Add spark version info into t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18709#discussion_r128890400

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -345,6 +347,7 @@ object CatalogTable {
   val VIEW_QUERY_OUTPUT_PREFIX = "view.query.out."
   val VIEW_QUERY_OUTPUT_NUM_COLUMNS = VIEW_QUERY_OUTPUT_PREFIX + "numCols"
   val VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX = VIEW_QUERY_OUTPUT_PREFIX + "col."
+  val SCHEMA_SPARK_VERSION = "spark.sql.create.version"
--- End diff --

`CREATED_SPARK_VERSION`?
[GitHub] spark pull request #18709: [SPARK-21504] [SQL] Add spark version info into t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18709#discussion_r128890390

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -304,6 +305,7 @@ case class CatalogTable(
   if (owner.nonEmpty) map.put("Owner", owner)
   map.put("Created", new Date(createTime).toString)
--- End diff --

not related, but it seems `Created Time` is better?
[GitHub] spark pull request #18709: [SPARK-21504] [SQL] Add spark version info into t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18709#discussion_r128890385

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -304,6 +305,7 @@ case class CatalogTable(
   if (owner.nonEmpty) map.put("Owner", owner)
   map.put("Created", new Date(createTime).toString)
   map.put("Last Access", new Date(lastAccessTime).toString)
+  map.put("Create Version", createVersion)
--- End diff --

Created Version?
[GitHub] spark pull request #18709: [SPARK-21504] [SQL] Add spark version info into t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18709#discussion_r128890382

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -217,6 +217,7 @@ case class CatalogTable(
   owner: String = "",
   createTime: Long = System.currentTimeMillis,
   lastAccessTime: Long = -1,
+  createVersion: String = org.apache.spark.SPARK_VERSION,
--- End diff --

add parameter doc?
[GitHub] spark issue #17006: [SPARK-17636] Parquet filter push down doesn't handle st...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17006 Does that one deal with nested filter access as well as nested column pruning?
[GitHub] spark issue #17006: [SPARK-17636] Parquet filter push down doesn't handle st...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17006 No, this is filter push down, whereas that one is column pruning.
[GitHub] spark pull request #18313: [SPARK-21087] [ML] CrossValidator, TrainValidatio...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18313#discussion_r12283

--- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -113,15 +122,28 @@ class CrossValidator @Since("1.2.0") (@Since("1.4.0") override val uid: String)
       // multi-model training
       logDebug(s"Train split $splitIndex with multiple sets of parameters.")
       val models = est.fit(trainingDataset, epm).asInstanceOf[Seq[Model[_]]]
-      trainingDataset.unpersist()
+
       var i = 0
       while (i < numModels) {
         // TODO: duplicate evaluator to take extra params from input
         val metric = eval.evaluate(models(i).transform(validationDataset, epm(i)))
         logDebug(s"Got metric $metric for model trained with ${epm(i)}.")
+        if (isDefined(modelPreservePath)) {
+          models(i) match {
+            case w: MLWritable =>
+              // e.g. maxIter-5-regParam-0.001-split0-0.859
+              val fileName = epm(i).toSeq.map(p => p.param.name + "-" + p.value).sorted
+                .mkString("-") + s"-split$splitIndex-${math.rint(metric * 1000) / 1000}"
+              w.save(new Path($(modelPreservePath), fileName).toString)
+            case _ =>
+              // for third-party algorithms
+              logWarning(models(i).uid + " did not implement MLWritable. Serialization omitted.")
+          }
+        }
         metrics(i) += metric
--- End diff --

Yes, I think so. In order to save time, I would like to take over this feature, if you don't mind. ping @jkbradley
[GitHub] spark issue #17006: [SPARK-17636] Parquet filter push down doesn't handle st...
Github user Gauravshah commented on the issue: https://github.com/apache/spark/pull/17006 PR https://github.com/apache/spark/pull/16578 should solve this
[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16992 Merged build finished. Test PASSed.
[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16992 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79854/ Test PASSed.
[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16992 **[Test build #79854 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79854/testReport)** for PR 16992 at commit [`2e5afe4`](https://github.com/apache/spark/commit/2e5afe4852699aea7e33b0c889b78202b5fe184c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18388 **[Test build #79857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79857/testReport)** for PR 18388 at commit [`4de417f`](https://github.com/apache/spark/commit/4de417f946430dd6d963768583d5fa1f22fe4622).
[GitHub] spark pull request #18313: [SPARK-21087] [ML] CrossValidator, TrainValidatio...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18313#discussion_r128886371

--- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -113,15 +122,28 @@ class CrossValidator @Since("1.2.0") (@Since("1.4.0") override val uid: String)
       // multi-model training
       logDebug(s"Train split $splitIndex with multiple sets of parameters.")
       val models = est.fit(trainingDataset, epm).asInstanceOf[Seq[Model[_]]]
-      trainingDataset.unpersist()
+
      var i = 0
       while (i < numModels) {
         // TODO: duplicate evaluator to take extra params from input
         val metric = eval.evaluate(models(i).transform(validationDataset, epm(i)))
         logDebug(s"Got metric $metric for model trained with ${epm(i)}.")
+        if (isDefined(modelPreservePath)) {
+          models(i) match {
+            case w: MLWritable =>
+              // e.g. maxIter-5-regParam-0.001-split0-0.859
+              val fileName = epm(i).toSeq.map(p => p.param.name + "-" + p.value).sorted
+                .mkString("-") + s"-split$splitIndex-${math.rint(metric * 1000) / 1000}"
+              w.save(new Path($(modelPreservePath), fileName).toString)
+            case _ =>
+              // for third-party algorithms
+              logWarning(models(i).uid + " did not implement MLWritable. Serialization omitted.")
+          }
+        }
         metrics(i) += metric
--- End diff --

so you want to keep all the trained models in CrossValidatorModel?
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18388 **[Test build #79856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79856/testReport)** for PR 18388 at commit [`5f622c3`](https://github.com/apache/spark/commit/5f622c3da3b65b8d183e329ac641caa1c9aed9bb).
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18388 **[Test build #79855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79855/testReport)** for PR 18388 at commit [`4bfeabb`](https://github.com/apache/spark/commit/4bfeabb8755b71f161f086ef68f95f522b848f23).
[GitHub] spark issue #18594: [SPARK-20904][core] Don't report task failures to driver...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18594 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79852/ Test PASSed.
[GitHub] spark issue #18594: [SPARK-20904][core] Don't report task failures to driver...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18594 Merged build finished. Test PASSed.
[GitHub] spark issue #18594: [SPARK-20904][core] Don't report task failures to driver...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18594 **[Test build #79852 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79852/testReport)** for PR 18594 at commit [`a68c2f2`](https://github.com/apache/spark/commit/a68c2f2478f190ac56a491801c98ebda862605a6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18707: [SPARK-21503][UI]: Spark UI shows incorrect task status ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18707 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79850/ Test PASSed.
[GitHub] spark issue #18707: [SPARK-21503][UI]: Spark UI shows incorrect task status ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18707 Merged build finished. Test PASSed.
[GitHub] spark issue #18707: [SPARK-21503][UI]: Spark UI shows incorrect task status ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18707 **[Test build #79850 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79850/testReport)** for PR 18707 at commit [`172fc20`](https://github.com/apache/spark/commit/172fc20898896058b7288360eb5292ed9df9d79c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18709: [SPARK-21504] [SQL] Add spark version info into table me...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18709 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79853/ Test FAILed.
[GitHub] spark issue #18709: [SPARK-21504] [SQL] Add spark version info into table me...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18709 Merged build finished. Test FAILed.
[GitHub] spark issue #18709: [SPARK-21504] [SQL] Add spark version info into table me...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18709 **[Test build #79853 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79853/testReport)** for PR 18709 at commit [`ccbd3a9`](https://github.com/apache/spark/commit/ccbd3a96e5d9fe154f8adec172179fd0021eada2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18645: [SPARK-14280][BUILD][WIP] Update change-version.sh and p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18645 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79851/ Test FAILed.
[GitHub] spark issue #18645: [SPARK-14280][BUILD][WIP] Update change-version.sh and p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18645 Merged build finished. Test FAILed.
[GitHub] spark issue #18645: [SPARK-14280][BUILD][WIP] Update change-version.sh and p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18645 **[Test build #79851 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79851/testReport)** for PR 18645 at commit [`fec76be`](https://github.com/apache/spark/commit/fec76beb6a3b63e698b57d93e61f8254b56d4b0d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18652 @viirya I have not started reading the comments and the code carefully yet. Just want to confirm: do the code changes in this PR follow what Hive does when we turn on the flag? If not, what is the behavior difference? Thanks!
[GitHub] spark issue #18710: [SPARK][Docs] Added note on meaning of position to subst...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18710 Can one of the admins verify this patch?
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128882147

--- Diff: common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java ---
@@ -130,11 +143,25 @@ private void processFetchRequest(final ChunkFetchRequest req) {
       return;
     }
-    respond(new ChunkFetchSuccess(req.streamChunkId, buf));
+    respond(new ChunkFetchSuccess(req.streamChunkId, buf)).addListener(future -> {
+      streamManager.chunkSent(req.streamChunkId.streamId);
+    });
   }

   private void processStreamRequest(final StreamRequest req) {
+    if (logger.isTraceEnabled()) {
+      logger.trace("Received req from {} to fetch stream {}", getRemoteAddress(channel),
+        req.streamId);
+    }
+
+    long chunksBeingTransferred = streamManager.chunksBeingTransferred();
+    if (chunksBeingTransferred > maxChunksBeingTransferred) {
+      logger.warn("The number of chunks being transferred {} is above {}, close the connection.",
+        chunksBeingTransferred, maxChunksBeingTransferred);
+      channel.close();
+    }
--- End diff --

To make the error handling simple, you can increase chunksBeingTransferred just before writing the chunk to the channel, and decrease it in the future returned by write.
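The accounting the reviewer suggests — bump an in-flight counter just before writing a chunk and drop it when the write future completes, success or failure — can be sketched in isolation. A hedged illustration in plain Java using `CompletableFuture` as a stand-in for Netty's channel write future; all names here are illustrative, not Spark's actual fields:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

class ChunkAccounting {
    private final AtomicLong chunksBeingTransferred = new AtomicLong(0);
    private final long maxChunksBeingTransferred;

    ChunkAccounting(long maxChunksBeingTransferred) {
        this.maxChunksBeingTransferred = maxChunksBeingTransferred;
    }

    long inFlight() { return chunksBeingTransferred.get(); }

    /** Returns false when over the limit (the caller would close the channel). */
    boolean sendChunk(CompletableFuture<Void> writeFuture) {
        if (chunksBeingTransferred.get() >= maxChunksBeingTransferred) {
            return false; // reject: too many chunks already in flight
        }
        // increment just before the write, not when the request arrives
        chunksBeingTransferred.incrementAndGet();
        // decrement regardless of success or failure, which also covers
        // the failure-response case raised later in the review
        writeFuture.whenComplete((v, err) -> chunksBeingTransferred.decrementAndGet());
        return true;
    }
}
```

Tying the decrement to the completion callback keeps the counter correct even when a transfer fails midway.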
[GitHub] spark issue #18705: [SPARK-21502][Mesos] fix --supervise for mesos in cluste...
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/18705 @skonto LGTM
[GitHub] spark pull request #18710: [SPARK][Docs] Added note on meaning of position t...
GitHub user maclockard opened a pull request: https://github.com/apache/spark/pull/18710

[SPARK][Docs] Added note on meaning of position to substring function

## What changes were proposed in this pull request?

Enhanced some existing documentation.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maclockard/spark maclockard-patch-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18710.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18710

commit d503359235a06f92605755fb994272a03b4d2743
Author: Mac
Date: 2017-07-22T00:01:24Z

    Added note on meaning of position to substring function
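The note this PR adds concerns the `position` argument of SQL's `substring`, which is 1-based: position 1 is the first character, unlike Java's 0-based indexing. A hedged sketch of those semantics in plain Java, ignoring negative positions for brevity:

```java
class SqlSubstring {
    // SQL-style substring: `pos` starts at 1, `len` is the maximum length.
    // Out-of-range requests return the empty string instead of throwing.
    static String substring(String s, int pos, int len) {
        int start = Math.max(pos - 1, 0); // convert 1-based position to 0-based index
        if (start >= s.length() || len <= 0) {
            return "";
        }
        int end = Math.min(start + len, s.length());
        return s.substring(start, end);
    }
}
```

The off-by-one conversion in the first line is exactly the pitfall the documentation note is meant to prevent.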
[GitHub] spark issue #18709: [SPARK-21504] [SQL] Add spark version info into table me...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18709 cc @cloud-fan
[GitHub] spark pull request #18709: [SPARK-21504] [SQL] Add spark version info into t...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18709#discussion_r128881870 --- Diff: sql/core/src/test/resources/sql-tests/results/describe-table-after-alter-table.sql.out --- @@ -25,6 +25,7 @@ Database default Table table_with_comment Created [not included in comparison] Last Access [not included in comparison] +Create Version [not included in comparison] --- End diff -- I manually verified all these version values are right. To avoid unnecessary test result updates, the values are hidden.
[GitHub] spark pull request #18698: [SPARK-21434][Python][DOCS] Add pyspark pip docum...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18698
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128881079 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java --- @@ -130,11 +143,25 @@ private void processFetchRequest(final ChunkFetchRequest req) { return; } -respond(new ChunkFetchSuccess(req.streamChunkId, buf)); +respond(new ChunkFetchSuccess(req.streamChunkId, buf)).addListener(future -> { + streamManager.chunkSent(req.streamChunkId.streamId); +}); } private void processStreamRequest(final StreamRequest req) { +if (logger.isTraceEnabled()) { + logger.trace("Received req from {} to fetch stream {}", getRemoteAddress(channel), +req.streamId); +} + +long chunksBeingTransferred = streamManager.chunksBeingTransferred(); +if (chunksBeingTransferred > maxChunksBeingTransferred) { + logger.warn("The number of chunks being transferred {} is above {}, close the connection.", +chunksBeingTransferred, maxChunksBeingTransferred); + channel.close(); +} --- End diff -- Also please decrease `chunksBeingTransferred` when sending a ChunkFetchFailure.
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128879751 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java --- @@ -130,11 +143,25 @@ private void processFetchRequest(final ChunkFetchRequest req) { return; } -respond(new ChunkFetchSuccess(req.streamChunkId, buf)); +respond(new ChunkFetchSuccess(req.streamChunkId, buf)).addListener(future -> { + streamManager.chunkSent(req.streamChunkId.streamId); +}); } private void processStreamRequest(final StreamRequest req) { +if (logger.isTraceEnabled()) { + logger.trace("Received req from {} to fetch stream {}", getRemoteAddress(channel), +req.streamId); +} + +long chunksBeingTransferred = streamManager.chunksBeingTransferred(); +if (chunksBeingTransferred > maxChunksBeingTransferred) { + logger.warn("The number of chunks being transferred {} is above {}, close the connection.", +chunksBeingTransferred, maxChunksBeingTransferred); + channel.close(); +} ManagedBuffer buf; + --- End diff -- nit: extra empty line
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128879607 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/OneForOneStreamManager.java --- @@ -96,18 +103,23 @@ public ManagedBuffer getChunk(long streamId, int chunkIndex) { @Override public ManagedBuffer openStream(String streamChunkId) { -String[] array = streamChunkId.split("_"); -assert array.length == 2: - "Stream id and chunk index should be specified when open stream for fetching block."; -long streamId = Long.valueOf(array[0]); -int chunkIndex = Integer.valueOf(array[1]); -return getChunk(streamId, chunkIndex); +Pair streamChunkIdPair = parseStreamChunkId(streamChunkId); +return getChunk(streamChunkIdPair.getLeft(), streamChunkIdPair.getRight()); } public static String genStreamChunkId(long streamId, int chunkId) { return String.format("%d_%d", streamId, chunkId); } + public static Pair parseStreamChunkId(String streamChunkId) { --- End diff -- nit: please document the meaning of the return value for this public method.
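The `genStreamChunkId`/`parseStreamChunkId` pair under review, with the return value documented as the reviewer requests, can be sketched as follows. This is a hedged, self-contained illustration (a `long[]` stands in for the commons-lang3 `Pair` used in the actual diff):

```java
class StreamChunkIds {
    // Encodes a stream id and chunk index as "streamId_chunkIndex".
    static String genStreamChunkId(long streamId, int chunkId) {
        return String.format("%d_%d", streamId, chunkId);
    }

    /**
     * Parses an id produced by genStreamChunkId.
     *
     * @return a two-element array: element 0 is the stream id,
     *         element 1 is the chunk index.
     */
    static long[] parseStreamChunkId(String streamChunkId) {
        String[] array = streamChunkId.split("_");
        assert array.length == 2 :
            "Stream id and chunk index should be specified when open stream for fetching block.";
        return new long[] { Long.parseLong(array[0]), Integer.parseInt(array[1]) };
    }
}
```

Documenting which element carries the stream id and which the chunk index is exactly the clarification being asked for, since the two halves have the same textual shape.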
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128879502 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/OneForOneStreamManager.java --- @@ -122,6 +134,7 @@ public void connectionTerminated(Channel channel) { } } } + --- End diff -- nit: extra empty line
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128879315 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/OneForOneStreamManager.java --- @@ -53,9 +56,13 @@ // that the caller only requests each chunk one at a time, in order. int curChunk = 0; +// Used to keep track of the number of chunks being transferred and not finished yet. +AtomicLong chunksBeingTransferred; --- End diff -- @jinxing64 `chunksBeingTransferred` is modified in the same thread. Not a big deal though.
[GitHub] spark pull request #18705: [SPARK-21502][Mesos] fix --supervise for mesos in...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18705#discussion_r128878694 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -369,7 +369,8 @@ private[spark] class MesosClusterScheduler( } private def getDriverFrameworkID(desc: MesosDriverDescription): String = { -s"${frameworkId}-${desc.submissionId}" +val retries = desc.retryState.map{d => s"-retry-${d.retries.toString}"} --- End diff -- nit: add spaces around the braces
[GitHub] spark pull request #18705: [SPARK-21502][Mesos] fix --supervise for mesos in...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18705#discussion_r128878929 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -369,7 +369,8 @@ private[spark] class MesosClusterScheduler( } private def getDriverFrameworkID(desc: MesosDriverDescription): String = { -s"${frameworkId}-${desc.submissionId}" +val retries = desc.retryState.map{d => s"-retry-${d.retries.toString}"} +s"${frameworkId}-${desc.submissionId}${retries.getOrElse("")}" --- End diff -- nit: move the `getOrElse()` call out of the string for clarity? val suffix = desc.retryState.map { }.getOrElse("")
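The reviewer's suggestion — compute the optional `-retry-N` suffix first, then build the id — has the same shape as `Optional.map(...).orElse("")`. A hedged illustration in plain Java (the method and parameter names are assumptions for the example, not the Mesos scheduler's actual signatures):

```java
import java.util.Optional;

class DriverFrameworkId {
    // Build "frameworkId-submissionId" plus an optional "-retry-N" suffix.
    // Extracting the suffix into its own variable keeps the final
    // interpolation easy to read, which is the point of the review comment.
    static String driverFrameworkId(String frameworkId, String submissionId,
                                    Optional<Integer> retries) {
        String suffix = retries.map(r -> "-retry-" + r).orElse("");
        return frameworkId + "-" + submissionId + suffix;
    }
}
```

When there is no retry state the suffix collapses to the empty string, so first submissions keep the original `frameworkId-submissionId` form.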
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128879045 --- Diff: docs/configuration.md --- @@ -1809,6 +1809,14 @@ Apart from these, the following properties are also available, and may be useful + spark.shuffle.maxChunksBeingTransferred + Long.MAX_VALUE + +The max number of chunks being transferred at the same time. This config helps avoid OOM on --- End diff -- Please also move this to `Shuffle Behavior` section.
[GitHub] spark pull request #18705: [SPARK-21502][Mesos] fix --supervise for mesos in...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18705#discussion_r128878713 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -672,3 +682,9 @@ private class Slave(val hostname: String) { var taskFailures = 0 var shuffleRegistered = false } + +object IdHelper { + // Use atomic values since Spark contexts can initialized in parallel --- End diff -- "can be initialized"
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128878875 --- Diff: docs/configuration.md --- @@ -1809,6 +1809,14 @@ Apart from these, the following properties are also available, and may be useful + spark.shuffle.maxChunksBeingTransferred + Long.MAX_VALUE + +The max number of chunks being transferred at the same time. This config helps avoid OOM on --- End diff -- nit: `The max number of chunks allowed to be transferred at the same time on shuffle service.`
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128878596 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java --- @@ -130,11 +143,25 @@ private void processFetchRequest(final ChunkFetchRequest req) { return; } -respond(new ChunkFetchSuccess(req.streamChunkId, buf)); +respond(new ChunkFetchSuccess(req.streamChunkId, buf)).addListener(future -> { + streamManager.chunkSent(req.streamChunkId.streamId); +}); } private void processStreamRequest(final StreamRequest req) { +if (logger.isTraceEnabled()) { + logger.trace("Received req from {} to fetch stream {}", getRemoteAddress(channel), +req.streamId); +} + +long chunksBeingTransferred = streamManager.chunksBeingTransferred(); +if (chunksBeingTransferred > maxChunksBeingTransferred) { + logger.warn("The number of chunks being transferred {} is above {}, close the connection.", +chunksBeingTransferred, maxChunksBeingTransferred); + channel.close(); +} --- End diff -- missing `return`
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128878556

--- Diff: common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java ---
@@ -118,6 +124,13 @@ private void processFetchRequest(final ChunkFetchRequest req) {
         req.streamChunkId);
     }

+    long chunksBeingTransferred = streamManager.chunksBeingTransferred();
+    if (chunksBeingTransferred > maxChunksBeingTransferred) {
+      logger.warn("The number of chunks being transferred {} is above {}, close the connection.",
+        chunksBeingTransferred, maxChunksBeingTransferred);
+      channel.close();
+    }
--- End diff --

missing `return`.
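The point of both review comments is the same: without an early `return`, the handler falls through and keeps serving the request on a channel it has just closed. A minimal self-contained sketch of that control flow, in Scala for brevity (`StubChannel` and `StubStreamManager` are hypothetical stand-ins for Spark's Netty channel and `StreamManager`; this is not the PR's actual code):

```scala
// Sketch of the early-return throttling guard zsxwing asks for.
class StubChannel { var closed = false; def close(): Unit = closed = true }
class StubStreamManager(var chunksBeingTransferred: Long)

class Handler(streamManager: StubStreamManager,
              channel: StubChannel,
              maxChunksBeingTransferred: Long) {
  var served = 0

  def processStreamRequest(): Unit = {
    val inFlight = streamManager.chunksBeingTransferred
    if (inFlight > maxChunksBeingTransferred) {
      println(s"chunks in flight $inFlight above $maxChunksBeingTransferred, closing connection")
      channel.close()
      return // without this, we would fall through and serve on a closed channel
    }
    served += 1 // normal stream handling
  }
}

object Demo extends App {
  val h = new Handler(new StubStreamManager(10), new StubChannel, 5)
  h.processStreamRequest()
  assert(h.served == 0) // request was rejected, not served
}
```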
[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16992 **[Test build #79854 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79854/testReport)** for PR 16992 at commit [`2e5afe4`](https://github.com/apache/spark/commit/2e5afe4852699aea7e33b0c889b78202b5fe184c).
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128878335

--- Diff: common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java ---
@@ -257,4 +257,7 @@ public Properties cryptoConf() {
     return CryptoUtils.toCryptoConf("spark.network.crypto.config.", conf.getAll());
   }

+  public long maxChunksBeingTransferred() {
+    return conf.getLong("spark.network.shuffle.maxChunksBeingTransferred", Long.MAX_VALUE);
--- End diff --

This default value totally depends on the JVM heap size, so it seems hard to pick a reasonable value. If it's too small, a user running with a large heap may find their shuffle service starting to fail when they upgrade. If it's too large, it's effectively the same as MAX_VALUE.
[GitHub] spark pull request #18707: [SPARK-21503][UI]: Spark UI shows incorrect task ...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18707#discussion_r128876674

--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsTab.scala ---
@@ -140,6 +140,8 @@ class ExecutorsListener(storageStatusListener: StorageStatusListener, conf: Spar
         return
       case _: ExceptionFailure =>
         taskSummary.tasksFailed += 1
+      case _: ExecutorLostFailure =>
--- End diff --

Looks like we can use `info.successful`?
[GitHub] spark issue #18709: [SPARK-21504] [SQL] Add spark version info into table me...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18709 **[Test build #79853 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79853/testReport)** for PR 18709 at commit [`ccbd3a9`](https://github.com/apache/spark/commit/ccbd3a96e5d9fe154f8adec172179fd0021eada2).
[GitHub] spark pull request #18709: [SPARK-21504] [SQL] Add spark version info into t...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/18709 [SPARK-21504] [SQL] Add spark version info into table metadata

## What changes were proposed in this pull request?

This PR is to add the spark version info in the table metadata. When creating the table, this value is assigned. It can help users find which version of Spark was used to create the table.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark addVersion

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18709.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18709

commit ccbd3a96e5d9fe154f8adec172179fd0021eada2
Author: gatorsmile
Date: 2017-07-21T22:22:02Z

    fix
[GitHub] spark issue #18594: [SPARK-20904][core] Don't report task failures to driver...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18594 **[Test build #79852 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79852/testReport)** for PR 18594 at commit [`a68c2f2`](https://github.com/apache/spark/commit/a68c2f2478f190ac56a491801c98ebda862605a6).
[GitHub] spark pull request #18594: [SPARK-20904][core] Don't report task failures to...
Github user jsoltren commented on a diff in the pull request: https://github.com/apache/spark/pull/18594#discussion_r128873660

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -473,29 +473,36 @@ private[spark] class Executor(
       // the default uncaught exception handler, which will terminate the Executor.
       logError(s"Exception in $taskName (TID $taskId)", t)

-      // Collect latest accumulator values to report back to the driver
-      val accums: Seq[AccumulatorV2[_, _]] =
-        if (task != null) {
-          task.metrics.setExecutorRunTime(System.currentTimeMillis() - taskStart)
-          task.metrics.setJvmGCTime(computeTotalGcTime() - startGCTime)
-          task.collectAccumulatorUpdates(taskFailed = true)
-        } else {
-          Seq.empty
-        }
+      // SPARK-20904: Do not report failure to driver if it happened during shut down. Because
+      // libraries may set up shutdown hooks that race with running tasks during shutdown,
+      // spurious failures may occur and can result in improper accounting in the driver (e.g.
+      // the task failure would not be ignored if the shutdown happened because of preemption,
+      // instead of an app issue).
+      if (!ShutdownHookManager.inShutdown()) {
--- End diff --

Yeah, it isn't guaranteed. I'm thinking that if this happens often enough, maybe one executor will print the message, giving a clue to the user. Also it's a de-facto code comment. Yes, any daemon thread can terminate at any time during shutdown; even finishing this block isn't guaranteed. Thanks!
[GitHub] spark pull request #18645: [SPARK-14280][BUILD][WIP] Update change-version.s...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/18645#discussion_r128872194

--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -89,6 +89,14 @@ trait FutureAction[T] extends Future[T] {
    */
   override def value: Option[Try[T]]

+  // These two methods must be implemented in Scala 2.12, but won't be used by Spark
+
+  def transform[S](f: (Try[T]) => Try[S])(implicit executor: ExecutionContext): Future[S] =
--- End diff --

I tried compiling a small app that calls `RDD.countAsync` (which returns a `FutureAction`) and even implements a custom `FutureAction`; I compiled it against 2.2.0, then ran it against this build, and it worked. I believe this may be legitimately excluded from MiMa.
[GitHub] spark issue #18645: [SPARK-14280][BUILD][WIP] Update change-version.sh and p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18645 **[Test build #79851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79851/testReport)** for PR 18645 at commit [`fec76be`](https://github.com/apache/spark/commit/fec76beb6a3b63e698b57d93e61f8254b56d4b0d).
[GitHub] spark pull request #18594: [SPARK-20904][core] Don't report task failures to...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18594#discussion_r128872073

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -473,29 +473,36 @@ private[spark] class Executor(
       // the default uncaught exception handler, which will terminate the Executor.
       logError(s"Exception in $taskName (TID $taskId)", t)

-      // Collect latest accumulator values to report back to the driver
-      val accums: Seq[AccumulatorV2[_, _]] =
-        if (task != null) {
-          task.metrics.setExecutorRunTime(System.currentTimeMillis() - taskStart)
-          task.metrics.setJvmGCTime(computeTotalGcTime() - startGCTime)
-          task.collectAccumulatorUpdates(taskFailed = true)
-        } else {
-          Seq.empty
-        }
+      // SPARK-20904: Do not report failure to driver if it happened during shut down. Because
+      // libraries may set up shutdown hooks that race with running tasks during shutdown,
+      // spurious failures may occur and can result in improper accounting in the driver (e.g.
+      // the task failure would not be ignored if the shutdown happened because of preemption,
+      // instead of an app issue).
+      if (!ShutdownHookManager.inShutdown()) {
--- End diff --

Sure, I can add a log, but it's not guaranteed to be printed. During shutdown the JVM can die at any moment (only shutdown hooks run to completion, and this is not one of them)...
[GitHub] spark pull request #18594: [SPARK-20904][core] Don't report task failures to...
Github user jsoltren commented on a diff in the pull request: https://github.com/apache/spark/pull/18594#discussion_r128871196

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -473,29 +473,36 @@ private[spark] class Executor(
       // the default uncaught exception handler, which will terminate the Executor.
       logError(s"Exception in $taskName (TID $taskId)", t)

-      // Collect latest accumulator values to report back to the driver
-      val accums: Seq[AccumulatorV2[_, _]] =
-        if (task != null) {
-          task.metrics.setExecutorRunTime(System.currentTimeMillis() - taskStart)
-          task.metrics.setJvmGCTime(computeTotalGcTime() - startGCTime)
-          task.collectAccumulatorUpdates(taskFailed = true)
-        } else {
-          Seq.empty
-        }
+      // SPARK-20904: Do not report failure to driver if it happened during shut down. Because
+      // libraries may set up shutdown hooks that race with running tasks during shutdown,
+      // spurious failures may occur and can result in improper accounting in the driver (e.g.
+      // the task failure would not be ignored if the shutdown happened because of preemption,
+      // instead of an app issue).
+      if (!ShutdownHookManager.inShutdown()) {
--- End diff --

At this point I don't think we have any information on why we're in shutdown: whether it is an app issue, the Spark executor process being killed from the command line, etc. Yes, a log message would be nice. Maybe, in the else clause to this if, something like `logInfo(s"Not reporting failure as we are in the middle of a shutdown")`.
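Putting the two suggestions together, the guard with the logging jsoltren proposes could look roughly like this. This is a self-contained sketch, not the PR's actual code: `inShutdown` and `reportFailureToDriver` are hypothetical stand-ins for `ShutdownHookManager.inShutdown()` and the real status update to the driver.

```scala
// Sketch of the SPARK-20904 guard with a logInfo in the else branch.
def handleUncaughtTaskFailure(taskName: String,
                              inShutdown: () => Boolean,
                              reportFailureToDriver: () => Unit,
                              logInfo: String => Unit): Unit = {
  if (!inShutdown()) {
    // Normal path: collect accumulators and report the failure to the driver.
    reportFailureToDriver()
  } else {
    // As vanzin notes, this message is not guaranteed to be printed: during
    // shutdown the JVM can die at any moment. When it does appear, though,
    // it gives the user a clue about why the failure was not reported.
    logInfo(s"Not reporting failure of $taskName as we are in the middle of a shutdown")
  }
}
```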
[GitHub] spark issue #18708: [SPARK-21339] [CORE] spark-shell --packages option does ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18708 Can one of the admins verify this patch?
[GitHub] spark pull request #18708: [SPARK-21339] [CORE] spark-shell --packages optio...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/18708 [SPARK-21339] [CORE] spark-shell --packages option does not add jars to classpath on windows

## What changes were proposed in this pull request?

The --packages option jars are getting added to the classpath with the scheme "file:///". On Unix this is not a problem, since the scheme contains the Unix path separator, which separates the jar name from its location in the classpath. On Windows, the jar file is not getting resolved from the classpath because of the scheme.

Windows: file:///C:/Users//.ivy2/jars/.jar
Unix: file:///home//.ivy2/jars/.jar

With this PR, we avoid adding the 'file://' scheme to the packages jar files.

## How was this patch tested?

I have verified manually in Windows and Unix environments; with the change it adds the jar to the classpath like below:

Windows: C:\Users\\.ivy2\jars\.jar
Unix: /home//.ivy2/jars/.jar

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/devaraj-kavali/spark SPARK-21339

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18708.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18708

commit 3242532d4815fa9595dd9ca2d4d0b86c6d206ddb
Author: Devaraj K
Date: 2017-07-21T21:51:46Z

    [SPARK-21339] [CORE] spark-shell --packages option does not add jars to classpath on windows
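One standard way to get the effect this PR describes (a plain local path instead of a `file:///` URI on the classpath) is `java.nio.file.Paths.get(URI)`, which resolves `file:///C:/...` to `C:\...` on Windows and `file:///home/...` to `/home/...` on Unix. This is a sketch of the idea, not necessarily how the PR itself implements it; `toClasspathEntry` is a hypothetical helper name.

```scala
import java.net.URI
import java.nio.file.Paths

// Convert an ivy jar URI into a plain classpath entry by dropping the
// file:// scheme; non-file entries are passed through unchanged.
def toClasspathEntry(jarUri: String): String =
  if (jarUri.startsWith("file:")) Paths.get(new URI(jarUri)).toString
  else jarUri
```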
[GitHub] spark issue #18707: [SPARK-21503][UI]: Spark UI shows incorrect task status ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18707 **[Test build #79850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79850/testReport)** for PR 18707 at commit [`172fc20`](https://github.com/apache/spark/commit/172fc20898896058b7288360eb5292ed9df9d79c).
[GitHub] spark issue #18707: [SPARK-21503][UI]: Fixed the issue
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18707 ok to test
[GitHub] spark issue #18707: [SPARK-21503][UI]: Fixed the issue
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18707 Can one of the admins verify this patch?
[GitHub] spark pull request #18707: [SPARK-21503][UI]: Fixed the issue
GitHub user pgandhi999 opened a pull request: https://github.com/apache/spark/pull/18707 [SPARK-21503][UI]: Fixed the issue

Added the case ExecutorLostFailure, which was previously not there; thus, the default case would be executed, in which case the task would be marked as completed.

## What changes were proposed in this pull request?

Added the case ExecutorLostFailure in the ExecutorsTab.scala class, which covers all those cases where the executor's connection to the Spark driver was lost due to killing the executor process, a network connection failure, etc.

## How was this patch tested?

Manually tested the fix by observing the UI change before and after.

Before: https://user-images.githubusercontent.com/8190/28482929-571c9cea-6e30-11e7-93dd-728de5cdea95.png

After: https://user-images.githubusercontent.com/8190/28482964-8649f5ee-6e30-11e7-91bd-2eb2089c61cc.png

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pgandhi999/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18707.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18707

commit 172fc20898896058b7288360eb5292ed9df9d79c
Author: pgandhi
Date: 2017-07-21T21:00:22Z

    [SPARK-21503]: Fixed the issue

    Added the case ExecutorLostFailure which was previously not there, thus, the default case would be executed in which case, task would be marked as completed.
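The bug the PR describes is a fall-through in a pattern match: before the fix, `ExecutorLostFailure` hit the default case and the task was counted as completed. A self-contained sketch of that shape (the reason types and `TaskSummary` here are simplified stand-ins for Spark's `TaskEndReason` hierarchy and the listener's per-executor counters, not the actual classes):

```scala
// Sketch of the ExecutorsTab fix: route ExecutorLostFailure to the failed
// counter instead of letting it fall through to the "completed" default.
sealed trait TaskEndReason
case object Success extends TaskEndReason
case object ExceptionFailure extends TaskEndReason
case object ExecutorLostFailure extends TaskEndReason

class TaskSummary { var tasksComplete = 0; var tasksFailed = 0 }

def onTaskEnd(reason: TaskEndReason, summary: TaskSummary): Unit = reason match {
  case ExceptionFailure    => summary.tasksFailed += 1
  case ExecutorLostFailure => summary.tasksFailed += 1 // the added case
  case _                   => summary.tasksComplete += 1
}
```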
[GitHub] spark issue #18706: [SPARK-21494][network] Use correct app id when authentic...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18706 Merged build finished. Test PASSed.
[GitHub] spark issue #18706: [SPARK-21494][network] Use correct app id when authentic...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18706 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79846/ Test PASSed.
[GitHub] spark issue #18706: [SPARK-21494][network] Use correct app id when authentic...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18706 **[Test build #79846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79846/testReport)** for PR 18706 at commit [`4e6cc53`](https://github.com/apache/spark/commit/4e6cc532009efd2325be97c262f37e154ac17370).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18519: [SPARK-16742] Mesos Kerberos Support
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18519 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79849/ Test FAILed.
[GitHub] spark issue #18519: [SPARK-16742] Mesos Kerberos Support
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18519 Merged build finished. Test FAILed.
[GitHub] spark issue #18519: [SPARK-16742] Mesos Kerberos Support
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18519 **[Test build #79849 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79849/testReport)** for PR 18519 at commit [`e6a7357`](https://github.com/apache/spark/commit/e6a73573c9276d9bd68ae23f38eef15f9897ffef).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18705: [SPARK-21502][Mesos] fix --supervise for mesos in cluste...
Github user ArtRand commented on the issue: https://github.com/apache/spark/pull/18705 LGTM
[GitHub] spark issue #18705: [SPARK-21502][Mesos] fix --supervise for mesos in cluste...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18705 Build finished. Test PASSed.
[GitHub] spark issue #18705: [SPARK-21502][Mesos] fix --supervise for mesos in cluste...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79844/ Test PASSed.
[GitHub] spark issue #18705: [SPARK-21502][Mesos] fix --supervise for mesos in cluste...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18705 **[Test build #79844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79844/testReport)** for PR 18705 at commit [`b987c4b`](https://github.com/apache/spark/commit/b987c4b28c3aa96f39e78dcc74da570226c6bdba).

* This patch passes all tests.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #18696: [SPARK-21490][core] Make sure SparkLauncher redirects ne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18696 Merged build finished. Test FAILed.
[GitHub] spark issue #18696: [SPARK-21490][core] Make sure SparkLauncher redirects ne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18696 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79848/ Test FAILed.
[GitHub] spark issue #18696: [SPARK-21490][core] Make sure SparkLauncher redirects ne...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18696

**[Test build #79848 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79848/testReport)** for PR 18696 at commit [`7d3db5f`](https://github.com/apache/spark/commit/7d3db5fa1d7b0b6d1a9e247d5d6a223e4ef774df).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18691: [SPARK-21243][Core] Limit no. of map outputs in a shuffl...
Github user dhruve commented on the issue: https://github.com/apache/spark/pull/18691

Thanks @tgravescs. Closing the PR.
[GitHub] spark issue #18691: [SPARK-21243][Core] Limit no. of map outputs in a shuffl...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18691

merged
[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18704

Merged build finished. Test PASSed.