[GitHub] spark pull request: [SPARK-5266][Yarn]AM's numExecutorsFailed shou...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4061#issuecomment-70098530 [Test build #25601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25601/consoleFull) for PR 4061 at commit [`c0a3ec7`](https://github.com/apache/spark/commit/c0a3ec7937074d8a0b35cd3a7621d764b3d67431). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ExperimentalMethods protected[sql](sqlContext: SQLContext) ` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5266][Yarn]AM's numExecutorsFailed shou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4061#issuecomment-70098547 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25601/
[GitHub] spark pull request: [SPARK-5268] don't stop ExecutorBackend for ir...
GitHub user CodingCat opened a pull request: https://github.com/apache/spark/pull/4063 [SPARK-5268] don't stop ExecutorBackend for irrelevant DisassociatedEvent In CoarseGrainedExecutorBackend, we subscribe to DisassociatedEvent in the executor backend actor and exit the program upon receiving such an event. Consider the following case: the user may develop an Akka-based program which starts an actor with Spark's actor system and communicates with an external actor system (e.g. an Akka-based receiver in Spark Streaming which communicates with an external system). If the external actor system fails or deliberately disassociates from the actor within Spark's system, we may receive a DisassociatedEvent and the executor is restarted. This is not the expected behavior. This is a simple fix to check the event before making the quit decision. You can merge this pull request into a Git repository by running: $ git pull https://github.com/CodingCat/spark SPARK-5268 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4063.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4063 commit 4a65793563d14a85b37b5f90fea52b377aec2d5c Author: CodingCat zhunans...@gmail.com Date: 2015-01-15T15:17:33Z check whether DisassociatedEvent is relevant before quit
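The guard this PR describes can be sketched roughly as follows (a minimal sketch, not the actual patch; the actor class name and the `driverAddress` field are illustrative assumptions):

```scala
import akka.actor.{Actor, Address}
import akka.remote.DisassociatedEvent

// Hypothetical sketch: exit only when the disassociated peer is the driver,
// so disassociations from user-created external actor systems are ignored.
class ExecutorBackendActor(driverAddress: Address) extends Actor {
  def receive = {
    case e: DisassociatedEvent =>
      if (e.remoteAddress == driverAddress) {
        // We really lost the driver: shut down this executor backend.
        context.system.shutdown()
      } // otherwise: irrelevant DisassociatedEvent, keep running
  }
}
```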
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3571#issuecomment-70131914 @jacek-lewandowski I can only review the code, you need a committer to be able to move forward. e.g. @andrewor14 @JoshRosen
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3997#discussion_r23027089

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---

```
@@ -449,6 +449,31 @@ class SparseVector(
   override def toString: String =
     "(%s,%s,%s)".format(size, indices.mkString("[", ",", "]"), values.mkString("[", ",", "]"))

+  override def equals(other: Any): Boolean = {
+    other match {
+      case v: SparseVector => {
+        if (this.size != v.size) { return false }
+        var k1 = 0
+        var k2 = 0
+        while (true) {
+          while (k1 < this.values.size && this.values(k1) == 0) k1 += 1
+          while (k2 < v.values.size && v.values(k2) == 0) k2 += 1
+
+          if (k1 == this.values.size || k2 == v.values.size) {
+            return (k1 == this.values.size && k2 == v.values.size) // check end alignment
+          }
+          if (this.indices(k1) != v.indices(k2) || this.values(k1) != v.values(k2)) {
+            return false
+          }
+          k1 += 1
+          k2 += 1
+        }
+        throw new Exception("unreachable")
```

End diff --

I wondered to myself whether this could be simplified to not have `while (true)`, the dummy `Exception`, etc. The best I could do was with a helper function:

```
...
var k1 = nextNonzero(this.values, 0)
var k2 = nextNonzero(v.values, 0)
while (k1 < this.values.size && k2 < v.values.size) {
  if (this.indices(k1) != v.indices(k2) || this.values(k1) != v.values(k2)) {
    return false
  }
  k1 = nextNonzero(this.values, k1 + 1)
  k2 = nextNonzero(v.values, k2 + 1)
}
return (k1 == this.values.size && k2 == v.values.size)
...

def nextNonzero(values: Array[Double], from: Int): Int = {
  var index = from
  while (index < values.size && values(index) == 0.0) index += 1
  index
}
```

I'm not sure it's better, just food for thought. So the idea would be to specialize `hashCode` as well, and also handle `DenseVector`, right? And even remove the implementations in the parent?
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70127522 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25608/
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70127518 [Test build #25608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25608/consoleFull) for PR 3997 at commit [`a6952c3`](https://github.com/apache/spark/commit/a6952c39532594e1f4eb1c2f764d528420320ea8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5012][MLLib][PySpark]Python API for Gau...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4059#issuecomment-70141761 add to whitelist
[GitHub] spark pull request: [SPARK-4286] Integrate external shuffle servic...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3861#issuecomment-70131124 Adding you folks for review: @dragos @deanw @huitseeker @skyluc
[GitHub] spark pull request: [SPARK-5095] Support capping cores and launch ...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/4027#issuecomment-70131146 Adding you folks for review: @dragos @deanw @huitseeker @skyluc
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4021#issuecomment-70117592 [Test build #25606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25606/consoleFull) for PR 4021 at commit [`18d62ec`](https://github.com/apache/spark/commit/18d62ec8906c4ea3fc8d753e889f36f87b539ef5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3233#discussion_r23012898

--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---

```
@@ -762,46 +764,37 @@ object Client extends Logging {
       extraClassPath: Option[String] = None): Unit = {
     extraClassPath.foreach(addClasspathEntry(_, env))
     addClasspathEntry(Environment.PWD.$(), env)
-
-    // Normally the users app.jar is last in case conflicts with spark jars
     if (sparkConf.getBoolean("spark.yarn.user.classpath.first", false)) {
-      addUserClasspath(args, sparkConf, env)
-      addFileToClasspath(sparkJar(sparkConf), SPARK_JAR, env)
-      populateHadoopClasspath(conf, env)
-    } else {
-      addFileToClasspath(sparkJar(sparkConf), SPARK_JAR, env)
-      populateHadoopClasspath(conf, env)
-      addUserClasspath(args, sparkConf, env)
+      getUserClasspath(args, sparkConf).foreach { x =>
+        addFileToClasspath(x, null, env)
+      }
     }
-
-    // Append all jar files under the working directory to the classpath.
-    addClasspathEntry(Environment.PWD.$() + Path.SEPARATOR + "*", env)
```

End diff --

I agree, it would be good to keep consistent. I just wanted to make sure we didn't break anything by the removal. It sounds like you tested all the things I can think of.
[GitHub] spark pull request: [SPARK-4943][SQL] Allow table name having dot ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4062#issuecomment-70102560 [Test build #25602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25602/consoleFull) for PR 4062 at commit [`e0d1960`](https://github.com/apache/spark/commit/e0d19600204c3a54ca9a6a959ccaaa1c0d7bcdca). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/4021#issuecomment-70105302 I've updated the code to throw an exception in the error case you mentioned and I've reverted the file permission change. Thanks!
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70107537 [Test build #25607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25607/consoleFull) for PR 3997 at commit [`50abef3`](https://github.com/apache/spark/commit/50abef35ef4ccb4f4f037bb7d29c5200cc7ab7cb). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user hhbyyh commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70109130 Just sent an update. I didn't use `sqdist` due to a performance concern: the original equals is effectively a fail-fast comparison, whereas `sqdist` inevitably computes through the entire vectors even if the first element differs. That performance would be hard to accept for scenarios like doc2Vec over a large vocabulary. The current implementation is still based on comparing indices and values, just with handling for explicit 0s. I gave the implementation some testing and added a few unit tests. Any comments are welcome!
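The fail-fast property described above can be illustrated with a minimal sketch (illustrative code, not the PR's implementation; the function name is hypothetical):

```scala
// Fail-fast comparison: returns as soon as a mismatch is found, unlike a
// squared-distance (sqdist) check, which must scan both arrays to the end.
def failFastEquals(a: Array[Double], b: Array[Double]): Boolean = {
  if (a.length != b.length) return false
  var i = 0
  while (i < a.length) {
    if (a(i) != b(i)) return false // bail out at the first difference
    i += 1
  }
  true
}
```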
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4021#issuecomment-70117605 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25606/
[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3233#issuecomment-70094284 Nope, your first comment answers it. Sorry, I had read that a while ago but forgot about it. Thanks.
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user ilganeli commented on a diff in the pull request: https://github.com/apache/spark/pull/4021#discussion_r23018036

--- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala ---

```
@@ -280,10 +281,12 @@ object AccumulatorParam {
 // TODO: The multi-thread support in accumulators is kind of lame; check
 // if there's a more intuitive way of doing it right
 private[spark] object Accumulators {
-  // TODO: Use soft references? => need to make readObject work properly then
-  val originals = Map[Long, Accumulable[_, _]]()
-  val localAccums = new ThreadLocal[Map[Long, Accumulable[_, _]]]() {
-    override protected def initialValue() = Map[Long, Accumulable[_, _]]()
+  // Store a WeakReference instead of a StrongReference because this way accumulators can be
+  // appropriately garbage collected during long-running jobs and release memory
+  type WeakAcc = WeakReference[Accumulable[_, _]]
+  val originals = Map[Long, WeakAcc]()
+  val localAccums = new ThreadLocal[Map[Long, WeakAcc]]() {
```

End diff --

Hi Josh - are you suggesting to replace this snippet with a MapMaker just to simplify the initialization code? I believe the usage of either object would be the same - do you see a specific advantage to trying to use the MapMaker?
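For context, the `MapMaker` alternative being asked about might look roughly like this (a hedged sketch assuming Guava is on the classpath; the value type is simplified to `AnyRef` for illustration):

```scala
import java.util.concurrent.ConcurrentMap
import com.google.common.collect.MapMaker

// Guava's MapMaker builds a ConcurrentMap whose values are held by weak
// references, so entries are dropped automatically once the accumulator
// is no longer strongly reachable - no hand-rolled WeakReference wrapper.
val originals: ConcurrentMap[java.lang.Long, AnyRef] =
  new MapMaker().weakValues().makeMap[java.lang.Long, AnyRef]()
```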
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70114222 [Test build #25608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25608/consoleFull) for PR 3997 at commit [`a6952c3`](https://github.com/apache/spark/commit/a6952c39532594e1f4eb1c2f764d528420320ea8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70116742 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25605/
[GitHub] spark pull request: [SPARK-4943][SQL] Allow table name having dot ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4062#issuecomment-70102567 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25602/
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70103904 [Test build #25605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25605/consoleFull) for PR 4063 at commit [`a7654d0`](https://github.com/apache/spark/commit/a7654d08b97fb14a3a75622e179885ae26908ed9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70113747 [Test build #25604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25604/consoleFull) for PR 4063 at commit [`4a65793`](https://github.com/apache/spark/commit/4a65793563d14a85b37b5f90fea52b377aec2d5c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70113758 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25604/
[GitHub] spark pull request: [SPARK-5263][SQL] `create table` DDL need to c...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4058#issuecomment-70115220 The semantics of temporary tables are that they can shadow existing persistent tables. This is by design.
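That shadowing behavior can be illustrated with a small sketch (hypothetical table name; assumes a Spark 1.x `sqlContext` and a DataFrame/SchemaRDD `df`):

```scala
// Registering a temporary table under an existing persistent table's name
// shadows the persistent table for queries in this SQLContext.
df.registerTempTable("users")           // temp "users" now shadows persistent "users"
sqlContext.sql("SELECT * FROM users")   // resolves to the temporary table
```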
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70116730 [Test build #25605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25605/consoleFull) for PR 4063 at commit [`a7654d0`](https://github.com/apache/spark/commit/a7654d08b97fb14a3a75622e179885ae26908ed9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3571#issuecomment-70100458 [Test build #25603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25603/consoleFull) for PR 3571 at commit [`a703c9b`](https://github.com/apache/spark/commit/a703c9b58d23894ad92619c05ac4968445208373). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3997#issuecomment-70107705 [Test build #25607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25607/consoleFull) for PR 3997 at commit [`50abef3`](https://github.com/apache/spark/commit/50abef35ef4ccb4f4f037bb7d29c5200cc7ab7cb). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70101325 [Test build #25604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25604/consoleFull) for PR 4063 at commit [`4a65793`](https://github.com/apache/spark/commit/4a65793563d14a85b37b5f90fea52b377aec2d5c). * This patch merges cleanly.
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3571#issuecomment-70112958 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25603/
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3571#issuecomment-70112945 [Test build #25603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25603/consoleFull) for PR 3571 at commit [`a703c9b`](https://github.com/apache/spark/commit/a703c9b58d23894ad92619c05ac4968445208373). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5264][SQL] support `drop table` DDL com...
Github user OopsOutOfMemory commented on the pull request: https://github.com/apache/spark/pull/4060#issuecomment-70114354 Hi, @scwf @chenghao-intel Could you please review this? I modified it so there is a single entry point for the logical plan. But I have some questions:
1. The dialect cannot be obtained via `sqlContext.getConf("spark.sql.dialect")` in the spark shell or in a test suite.
2. The sql package cannot access the hive package, so I expose the dialect as a function in each Context.
Any suggestions or a better way to implement this?
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user jacek-lewandowski commented on the pull request: https://github.com/apache/spark/pull/3571#issuecomment-70101711 @vanzin can we move forward with this PR?
[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....
GitHub user hhbyyh reopened a pull request: https://github.com/apache/spark/pull/3997 [SPARK-5186] [MLLIB] Vector.equals and Vector.hashCode are very inefficient JIRA Issue: https://issues.apache.org/jira/browse/SPARK-5186
Currently SparseVector uses the equals inherited from Vector, which creates a full-size array even for a sparse vector. This pull request contains a specialized equals optimization that improves on both time and space.
1. The implementation is consistent with the original; in particular, it keeps equality comparison between SparseVector and DenseVector working.
2. For the hash code, overriding it may introduce a breaking change, so we should do that in another PR.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hhbyyh/spark master
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3997.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3997
commit 574114433222f4adb95e1e98f5d96e72e733eb4d Author: Yuhao Yang yu...@yuhaodevbox.sh.intel.com Date: 2015-01-13T04:31:13Z Specialized equals for SparseVector
commit f41b135ab0394e881bd03c87bb02aa77be61fb64 Author: Yuhao Yang hhb...@gmail.com Date: 2015-01-16T15:13:12Z iterative equals for sparse vector
commit 50abef35ef4ccb4f4f037bb7d29c5200cc7ab7cb Author: Yuhao Yang hhb...@gmail.com Date: 2015-01-16T15:47:19Z fix ut for sparse vector with explicit 0
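The specialized comparison described above can be sketched as follows. This is an illustrative reconstruction, not the actual MLlib patch: it walks the sorted (index, value) arrays of two sparse vectors in lockstep, skipping explicit zero entries, so no full-size dense array is ever materialized.

```scala
object SparseVectorEq {
  // Compare two sparse vectors given as sorted index/value arrays plus a
  // logical size. Equality holds iff the nonzero entries match
  // position-for-position; explicitly stored zeros are ignored.
  def sparseEquals(size1: Int, idx1: Array[Int], vals1: Array[Double],
                   size2: Int, idx2: Array[Int], vals2: Array[Double]): Boolean = {
    if (size1 != size2) return false
    var i = 0
    var j = 0
    while (i < idx1.length || j < idx2.length) {
      if (i < idx1.length && vals1(i) == 0.0) {
        i += 1                       // explicit zero on the left: skip
      } else if (j < idx2.length && vals2(j) == 0.0) {
        j += 1                       // explicit zero on the right: skip
      } else if (i < idx1.length && j < idx2.length) {
        if (idx1(i) != idx2(j) || vals1(i) != vals2(j)) return false
        i += 1; j += 1
      } else {
        return false                 // one side has a leftover nonzero entry
      }
    }
    true
  }
}
```

With this, a vector that stores an explicit 0 (the case addressed by the third commit) still compares equal to one that omits it, e.g. indices `[0, 2]` / values `[1.0, 0.0]` versus indices `[0]` / values `[1.0]`.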
[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4021#issuecomment-70104769 [Test build #25606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25606/consoleFull) for PR 4021 at commit [`18d62ec`](https://github.com/apache/spark/commit/18d62ec8906c4ea3fc8d753e889f36f87b539ef5). * This patch merges cleanly.
[GitHub] spark pull request: [Spark-5111][SQL]HiveContext and Thriftserver ...
GitHub user zhzhan opened a pull request: https://github.com/apache/spark/pull/4064 [Spark-5111][SQL]HiveContext and Thriftserver cannot work in secure cluster beyond hadoop2.5 Hive 0.13 cannot work with a secure cluster on hadoop-2.5 and beyond, due to a java.lang.NoSuchFieldError: SASL_PROPS error. We need to backport some Hive 0.14 fixes into Spark, since there is no current effort to upgrade Spark's Hive support to 0.14.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhzhan/spark spark5111
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4064.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4064
commit 3bf966c2f1bb913149a34176598a69041487cb88 Author: Zhan Zhang zhaz...@gmail.com Date: 2014-08-08T17:47:18Z test
commit fc56b25ff62964f59b96d2db13b5c357ae1c2f2b Author: Zhan Zhang zhaz...@gmail.com Date: 2015-01-07T21:01:45Z squash all commits
commit c6b57402d19557105bc2bb95978b5815d7e95907 Author: Zhan Zhang zhaz...@gmail.com Date: 2015-01-09T17:48:45Z hive secure cluster fix
commit 456232c1ce29a7bff7f7d606764d5da00a478695 Author: Zhan Zhang zhaz...@gmail.com Date: 2015-01-09T21:57:54Z hive on secure cluster fix
commit 6532a342ba85be0300c169ce81f671da7ea5dcb1 Author: Zhan Zhang zhaz...@gmail.com Date: 2015-01-15T19:53:36Z rebase
[GitHub] spark pull request: [SPARK-5012][MLLib][PySpark]Python API for Gau...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4059#issuecomment-70142339 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25609/ Test FAILed.
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23035825
--- Diff: core/src/main/scala/org/apache/spark/SSLOptions.scala ---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.io.File
+
+import scala.util.Try
+
+import com.typesafe.config.{Config, ConfigFactory, ConfigValueFactory}
+import org.eclipse.jetty.util.ssl.SslContextFactory
+
+private[spark] case class SSLOptions(
+    enabled: Boolean = false,
+    keyStore: Option[File] = None,
+    keyStorePassword: Option[String] = None,
+    keyPassword: Option[String] = None,
+    trustStore: Option[File] = None,
+    trustStorePassword: Option[String] = None,
+    protocol: Option[String] = None,
+    enabledAlgorithms: Set[String] = Set.empty) {
+
+  /**
+   * Creates a Jetty SSL context factory according to the SSL settings represented by this object.
+   */
+  def createJettySslContextFactory(): Option[SslContextFactory] = {
+    if (enabled) {
+      val sslContextFactory = new SslContextFactory()
+
+      keyStore.foreach(file => sslContextFactory.setKeyStorePath(file.getAbsolutePath))
+      trustStore.foreach(file => sslContextFactory.setTrustStore(file.getAbsolutePath))
+      keyStorePassword.foreach(sslContextFactory.setKeyStorePassword)
+      trustStorePassword.foreach(sslContextFactory.setTrustStorePassword)
+      keyPassword.foreach(sslContextFactory.setKeyManagerPassword)
+      protocol.foreach(sslContextFactory.setProtocol)
+      sslContextFactory.setIncludeCipherSuites(enabledAlgorithms.toSeq: _*)
+
+      Some(sslContextFactory)
+    } else {
+      None
+    }
+  }
+
+  /**
+   * Creates an Akka configuration object which contains all the SSL settings represented by this
+   * object. It can be used then to compose the ultimate Akka configuration.
+   */
+  def createAkkaConfig: Option[Config] = {
+    import scala.collection.JavaConversions._
+    if (enabled) {
+      Some(ConfigFactory.empty()
+        .withValue("akka.remote.netty.tcp.security.key-store",
+          ConfigValueFactory.fromAnyRef(keyStore.map(_.getAbsolutePath).getOrElse("")))
+        .withValue("akka.remote.netty.tcp.security.key-store-password",
+          ConfigValueFactory.fromAnyRef(keyStorePassword.getOrElse("")))
+        .withValue("akka.remote.netty.tcp.security.trust-store",
+          ConfigValueFactory.fromAnyRef(trustStore.map(_.getAbsolutePath).getOrElse("")))
+        .withValue("akka.remote.netty.tcp.security.trust-store-password",
+          ConfigValueFactory.fromAnyRef(trustStorePassword.getOrElse("")))
+        .withValue("akka.remote.netty.tcp.security.key-password",
+          ConfigValueFactory.fromAnyRef(keyPassword.getOrElse("")))
+        .withValue("akka.remote.netty.tcp.security.random-number-generator",
+          ConfigValueFactory.fromAnyRef(""))
+        .withValue("akka.remote.netty.tcp.security.protocol",
+          ConfigValueFactory.fromAnyRef(protocol.getOrElse("")))
+        .withValue("akka.remote.netty.tcp.security.enabled-algorithms",
+          ConfigValueFactory.fromIterable(enabledAlgorithms.toSeq))
+        .withValue("akka.remote.netty.tcp.enable-ssl",
+          ConfigValueFactory.fromAnyRef(true)))
+    } else {
+      None
+    }
+  }
+
+  override def toString: String = s"SSLOptions{enabled=$enabled, " +
+      s"keyStore=$keyStore, keyStorePassword=${keyStorePassword.map(_ => "xxx")}, " +
+      s"trustStore=$trustStore, trustStorePassword=${trustStorePassword.map(_ => "xxx")}, " +
+      s"protocol=$protocol, enabledAlgorithms=$enabledAlgorithms}"
+
+}
+
+object SSLOptions extends Logging {
+
+  /**
+   * Resolves SSLOptions settings from a given Spark configuration object at a given namespace.
+   * The parent directory of that location is used as a base directory to
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23035899
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -18,7 +18,11 @@
 package org.apache.spark

 import java.net.{Authenticator, PasswordAuthentication}
+import java.security.KeyStore
+import java.security.cert.X509Certificate
+import javax.net.ssl._
--- End diff --
nit: `java.net` before `java.security`
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23036580
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
@@ -21,6 +21,8 @@
 import java.io.File
 import java.util.{List => JList}
 import java.util.Collections

+import org.apache.spark.util.AkkaUtils
--- End diff --
nit: spark imports come last
[GitHub] spark pull request: [SPARK-5224] [PySpark] improve performance of ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4024
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23037152
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -523,10 +525,31 @@ private[spark] object Worker extends Logging {
     val securityMgr = new SecurityManager(conf)
     val (actorSystem, boundPort) = AkkaUtils.createActorSystem(systemName, host, port,
       conf = conf, securityManager = securityMgr)
-    val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl)
+    val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl(_, conf))
     actorSystem.actorOf(Props(classOf[Worker], host, boundPort, webUiPort, cores, memory,
       masterAkkaUrls, systemName, actorName, workDir, conf, securityMgr), name = actorName)
     (actorSystem, boundPort)
   }
+
+  private[spark] def isUseLocalNodeSSLConfig(cmd: Command): Boolean = {
+    val pattern = """\-Dspark\.ssl\.useNodeLocalConf\=(.+)""".r
+    val result = cmd.javaOpts.collectFirst {
+      case pattern(_result) => _result.toBoolean
+    }
+    result.getOrElse(false)
+  }
+
+  private[spark] def maybeUpdateSSLSettings(cmd: Command, conf: SparkConf): Command = {
+    val prefix = "spark.ssl."
+    val useLNCPrefix = "spark.ssl.useNodeLocalConf"
+    if (isUseLocalNodeSSLConfig(cmd)) {
+      val newJavaOpts = cmd.javaOpts
+          .filterNot(opt => opt.startsWith(s"-D$prefix") && !opt.startsWith(s"-D$useLNCPrefix=")) ++
--- End diff --
nit: could you use `filter`? My brain gets into a knot trying to negate the condition here.
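The `filterNot`-versus-`filter` readability point can be seen in a small standalone sketch (the option strings here are made up for illustration); by De Morgan's law the two forms keep exactly the same elements:

```scala
object FilterDemo extends App {
  val opts = Seq("-Dspark.ssl.keyStore=/ks", "-Dspark.ssl.useNodeLocalConf=true", "-Xmx1g")

  // filterNot drops elements matching a predicate whose negation the reader
  // has to work out...
  val a = opts.filterNot(o =>
    o.startsWith("-Dspark.ssl.") && !o.startsWith("-Dspark.ssl.useNodeLocalConf="))

  // ...while the equivalent filter form states directly what is kept.
  val b = opts.filter(o =>
    !o.startsWith("-Dspark.ssl.") || o.startsWith("-Dspark.ssl.useNodeLocalConf="))

  assert(a == b)
  assert(a == Seq("-Dspark.ssl.useNodeLocalConf=true", "-Xmx1g"))
}
```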
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23037199
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -523,10 +525,31 @@ private[spark] object Worker extends Logging {
     val securityMgr = new SecurityManager(conf)
     val (actorSystem, boundPort) = AkkaUtils.createActorSystem(systemName, host, port,
       conf = conf, securityManager = securityMgr)
-    val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl)
+    val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl(_, conf))
     actorSystem.actorOf(Props(classOf[Worker], host, boundPort, webUiPort, cores, memory,
       masterAkkaUrls, systemName, actorName, workDir, conf, securityMgr), name = actorName)
     (actorSystem, boundPort)
   }
+
+  private[spark] def isUseLocalNodeSSLConfig(cmd: Command): Boolean = {
+    val pattern = """\-Dspark\.ssl\.useNodeLocalConf\=(.+)""".r
+    val result = cmd.javaOpts.collectFirst {
+      case pattern(_result) => _result.toBoolean
+    }
+    result.getOrElse(false)
+  }
+
+  private[spark] def maybeUpdateSSLSettings(cmd: Command, conf: SparkConf): Command = {
+    val prefix = "spark.ssl."
+    val useLNCPrefix = "spark.ssl.useNodeLocalConf"
--- End diff --
wait: is this a prefix at all? or is it a single config?
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23036539
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SimrSchedulerBackend.scala ---
@@ -18,6 +18,7 @@
 package org.apache.spark.scheduler.cluster

 import org.apache.hadoop.fs.{Path, FileSystem}
+import org.apache.spark.util.AkkaUtils
--- End diff --
nit: group with other spark imports.
[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4047#issuecomment-70146798 @EntilZha Here's a sketch of my plan.

Datasets:
* UCI ML Repository data (also used by Asuncion et al., 2009):
  * KOS
  * NIPS
  * NYTimes
  * PubMed (full)
* Wikipedia?

Data preparation:
* Converting to bags of words:
  * UCI datasets are given as word counts already.
  * Wikipedia dump is text. I use the SimpleTokenizer in the LDAExample, which sets term = word and only accepts alphabetic characters.
* Use stopwords from @dlwh located at [https://github.com/dlwh/spark/feature/lda]
* No stemming
* Choosing vocab: For various vocabSize settings, I took the most common vocabSize terms.

Scaling tests: *(doing these first)*
* corpus size
* vocabSize
* k
* numIterations

Accuracy tests: *(doing these second)*
* train on full datasets
* Tune hyperparameters via grid search, following Asuncion et al. (2009) section 4.1.
* Can hopefully compare with their results in Fig. 5.

These tests will run on a 16-node EC2 cluster of r3.2xlarge instances.
[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4047#issuecomment-70146828 @witgo I agree that there are 2 different use regimes for LDA: interpretable topics and featurization. The current implementation follows pretty much every other graph-based implementation I've seen:
* 1 vertex per document + 1 vertex per term
* Each vertex stores a vector of length # topics.
* On each iteration, each doc vertex must communicate its vector to any connected term vertices (and likewise for term vertices), via map-reduce stages over triplets.

I have not heard of methods which can avoid this amount of communication for LDA. I'm sure the implementation can be optimized, so please make comments here or JIRAs afterwards about that. For modified models, it might be possible to communicate less: sparsity-inducing priors, hierarchical models, etc.
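The communication pattern described above can be illustrated with a toy, non-distributed sketch: a bipartite doc-term graph where every vertex holds a length-k topic vector and one iteration sends that vector across every edge (the map-reduce-over-triplets step). The names and structure here are illustrative, not MLlib's actual internals.

```scala
object LdaMessagePassingSketch extends App {
  val k = 3
  case class Vertex(topicCounts: Array[Double])

  // One doc vertex and two term vertices, each with a length-k vector.
  val docs  = Map(0 -> Vertex(Array.fill(k)(1.0)))
  val edges = Seq((0, 0), (0, 1)) // (docId, termId): doc 0 contains terms 0 and 1

  // One iteration: every term vertex receives the element-wise sum of its
  // neighboring doc vertices' vectors. Note each edge carries a full
  // length-k vector -- this is the communication cost discussed above.
  val termMsgs: Map[Int, Array[Double]] =
    edges.groupBy(_._2).map { case (term, es) =>
      term -> es.map { case (doc, _) => docs(doc).topicCounts }
                .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
    }

  assert(termMsgs(0).sameElements(Array(1.0, 1.0, 1.0)))
}
```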
[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3779#issuecomment-70149578 @kayousterhout Could you take a look at this? This is a priority for 1.3 :)
[GitHub] spark pull request: [SPARK-5012][MLLib][PySpark]Python API for Gau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4059#issuecomment-70142336 [Test build #25609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25609/consoleFull) for PR 4059 at commit [`5c83825`](https://github.com/apache/spark/commit/5c83825c570b4ee1357021ec25a1a35a09a633e7). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class GaussianMixtureModel(object):` * `class GaussianMixtureEM(object):`
[GitHub] spark pull request: [SPARK-5224] [PySpark] improve performance of ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4024#issuecomment-70147787 LGTM, so I'm going to merge this into `master` (1.3.0) and `branch-1.2` (1.2.1). Thanks!
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user jacek-lewandowski commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23038107
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -523,10 +525,31 @@ private[spark] object Worker extends Logging {
     val securityMgr = new SecurityManager(conf)
     val (actorSystem, boundPort) = AkkaUtils.createActorSystem(systemName, host, port,
       conf = conf, securityManager = securityMgr)
-    val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl)
+    val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl(_, conf))
     actorSystem.actorOf(Props(classOf[Worker], host, boundPort, webUiPort, cores, memory,
       masterAkkaUrls, systemName, actorName, workDir, conf, securityMgr), name = actorName)
     (actorSystem, boundPort)
   }
+
+  private[spark] def isUseLocalNodeSSLConfig(cmd: Command): Boolean = {
+    val pattern = """\-Dspark\.ssl\.useNodeLocalConf\=(.+)""".r
+    val result = cmd.javaOpts.collectFirst {
+      case pattern(_result) => _result.toBoolean
+    }
+    result.getOrElse(false)
+  }
+
+  private[spark] def maybeUpdateSSLSettings(cmd: Command, conf: SparkConf): Command = {
+    val prefix = "spark.ssl."
+    val useLNCPrefix = "spark.ssl.useNodeLocalConf"
--- End diff --
Actually it is not a prefix - renamed the constant
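For reference, the flag-detection logic discussed in this hunk boils down to scanning `javaOpts` with a regex. A minimal standalone sketch, reconstructed from the diff above rather than taken from the merged code:

```scala
object SslFlagDemo extends App {
  // Matches -Dspark.ssl.useNodeLocalConf=<value> among a process's JVM options.
  val pattern = """-Dspark\.ssl\.useNodeLocalConf=(.+)""".r

  // collectFirst stops at the first matching option; absence defaults to false.
  def useNodeLocalConf(javaOpts: Seq[String]): Boolean =
    javaOpts.collectFirst { case pattern(v) => v.toBoolean }.getOrElse(false)

  assert(useNodeLocalConf(Seq("-Xmx1g", "-Dspark.ssl.useNodeLocalConf=true")))
  assert(!useNodeLocalConf(Seq("-Xmx1g"))) // flag absent: defaults to false
}
```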
[GitHub] spark pull request: [SPARK-5012][MLLib][PySpark]Python API for Gau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4059#issuecomment-70142185 [Test build #25609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25609/consoleFull) for PR 4059 at commit [`5c83825`](https://github.com/apache/spark/commit/5c83825c570b4ee1357021ec25a1a35a09a633e7). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4504][Examples] fix run-example failure...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3377#issuecomment-70147153 Jenkins, this is ok to test.
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3571#discussion_r23036715
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -347,10 +347,10 @@ private[spark] class Worker(
       }.toSeq
     }
     appDirectories(appId) = appLocalDirs
-
-    val manager = new ExecutorRunner(appId, execId, appDesc, cores_, memory_,
-      self, workerId, host, sparkHome, executorDir, akkaUrl, conf, appLocalDirs,
-      ExecutorState.LOADING)
+    val manager = new ExecutorRunner(appId, execId,
+      appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
--- End diff --
Hmmm... why do you need the copy? A quick overlook of `ExecutorRunner` doesn't seem to indicate it modifies this object in any way...
[GitHub] spark pull request: [SPARK-4920][UI]: back port the PR-3763 to bra...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3768#issuecomment-70147186 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23050901
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -827,9 +868,21 @@ class DAGScheduler(
     // might modify state of objects referenced in their closures. This is necessary in Hadoop
     // where the JobConf/Configuration object is not thread-safe.
     var taskBinary: Broadcast[Array[Byte]] = null
+
+    // Check if RDD serialization debugging is enabled
+    val debugSerialization: Boolean = sc.getConf.getBoolean("spark.serializer.debug", false)
--- End diff --
Ah I see - this does that already. Yeah so I'd just remove the config option and just always print debugging output if there is a failure. We usually try not to add config options unless there is a really compelling reason to not have the feature enabled.
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70181033 Rather than doing this one by one, can't we change the common class ActorLogReceive?
[GitHub] spark pull request: [SPARK-4923][REPL] Add Developer API to REPL t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4034#issuecomment-70181623 [Test build #25617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25617/consoleFull) for PR 4034 at commit [`6dc1ee2`](https://github.com/apache/spark/commit/6dc1ee2b9ec589ceb2ade3454c3dbaf0697a09b4). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class SparkILoop(` * ` * @param id The id (variable name, method name, class name, etc) whose` * ` * Retrieves the class representing the id (variable name, method name,` * ` * @param id The id (variable name, method name, class name, etc) whose` * ` * @return Some containing term name (id) class if exists, else None` * ` * @param id The id (variable name, method name, class name, etc) whose` * ` * @param id The id (variable name, method name, class name, etc) whose` * ` * Retrieves the runtime class and type representing the id (variable name,` * ` * @param id The id (variable name, method name, class name, etc) whose` * ` * @param id The id (variable name, method name, class name, etc) whose`
[GitHub] spark pull request: [SPARK-5274][SQL] Reconcile Java and Scala UDF...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4056
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4020#issuecomment-70189111 [Test build #25621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25621/consoleFull) for PR 4020 at commit [`e446287`](https://github.com/apache/spark/commit/e446287b866eedeb74e68c2f800acf29250d2a76). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4020#issuecomment-70189173 Had a look over, and this mostly looks good, but it looks like there are many places where the patch replaces assigning with incrementing. It would be good to take a close look and pull all these out.
[GitHub] spark pull request: [SPARK-5274][SQL] Reconcile Java and Scala UDF...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4056#issuecomment-70176785 [Test build #25618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25618/consoleFull) for PR 4056 at commit [`ae9c556`](https://github.com/apache/spark/commit/ae9c556d91a58f41098b40b3e10842570e4b3278). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23051146 --- Diff: core/src/main/scala/org/apache/spark/util/ObjectWalker.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.util + +import java.lang.reflect.{Modifier, Field} + +import com.google.common.collect.Queues + +import scala.collection.mutable + + +/** + * This class permits traversing a generic Object's reference graph. This is useful for debugging + * serialization errors. See SPARK-3694. + * + * This code is based on code written by Josh Rosen found here: + * https://gist.github.com/JoshRosen/d6a8972c2e97d040 + */ +object ObjectWalker { + def isTransient(field: Field): Boolean = Modifier.isTransient(field.getModifiers) + def isStatic(field: Field): Boolean = Modifier.isStatic(field.getModifiers) + def isPrimitive(field: Field): Boolean = field.getType.isPrimitive + + /** + * Traverse the graph representing all references between the provided root object, its + * members, and their references in turn. + * + * What we want to be able to do is readily identify un-serializable components AND the path + * to those components. To do this, store the traversal of the graph as a 2-tuple - the actual + * reference visited and its parent. Then, to get the path to the un-serializable reference + * we can simply follow the parent links. + * + * @param rootObj - The root object for which to generate the reference graph + * @return a new Set containing the 2-tuple of references from the traversal of the + * reference graph along with their parent references. (self, parent) + */ + def buildRefGraph(rootObj: AnyRef): mutable.LinkedList[AnyRef] = { +val visitedRefs = mutable.Set[AnyRef]() +val toVisit = Queues.newArrayDeque[AnyRef]() +var results = mutable.LinkedList[AnyRef]() + +toVisit.add(rootObj) + +while (!toVisit.isEmpty) { + val obj: AnyRef = toVisit.pollFirst() + // Store the last parent reference to enable quick retrieval of the path to a broken node + + if (!visitedRefs.contains(obj)) { +results = mutable.LinkedList(obj).append(results) +visitedRefs.add(obj) + +// Extract all the fields from the object that would be serialized. Transient and +// static references are not serialized, and primitive variables will always be serializable +// and will not contain further references. + +for (field <- getAllFields(obj.getClass)) --- End diff -- could you pull this expression out into its own variable, `val fieldsToTest = getAllFields(...)`? We try not to nest expressions like this, to make the code more readable.
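The reference-graph walk described in the diff above can be sketched with the standard library alone (the review below asks whether Guava's `Queues` is even needed). This is an illustrative sketch, not the patch's actual `ObjectWalker`; the helper names (`serializedFields`, `RefWalkSketch`) are invented here, and a production version would use an identity-based visited set rather than `equals`-based membership.

```scala
import java.lang.reflect.{Field, Modifier}
import scala.collection.mutable

object RefWalkSketch {
  // Fields that would actually be serialized: walk the class hierarchy and
  // drop transient, static, and primitive fields, as the diff's comments describe.
  private def serializedFields(cls: Class[_]): Seq[Field] =
    Iterator.iterate[Class[_]](cls)(_.getSuperclass)
      .takeWhile(_ != null)
      .flatMap(_.getDeclaredFields)
      .filterNot(f => Modifier.isTransient(f.getModifiers) ||
                      Modifier.isStatic(f.getModifiers) ||
                      f.getType.isPrimitive)
      .toSeq

  /** Breadth-first traversal returning (reference, parent) pairs, so the path
    * to an un-serializable reference can be recovered via the parent links. */
  def buildRefGraph(root: AnyRef): Seq[(AnyRef, AnyRef)] = {
    val visited = mutable.Set[AnyRef]()          // note: equals-based, for brevity
    val toVisit = mutable.Queue[(AnyRef, AnyRef)]((root, null))
    val results = mutable.ArrayBuffer[(AnyRef, AnyRef)]()
    while (toVisit.nonEmpty) {
      val (obj, parent) = toVisit.dequeue()
      if (obj != null && visited.add(obj)) {
        results += ((obj, parent))
        // Pulled out into its own variable, per the review comment above.
        val fieldsToTest = serializedFields(obj.getClass)
        for (field <- fieldsToTest) {
          field.setAccessible(true)
          toVisit.enqueue((field.get(obj), obj))
        }
      }
    }
    results.toSeq
  }
}
```

A `mutable.Queue` gives the same FIFO behavior as Guava's `newArrayDeque`, which is the point of the follow-up review comment below.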
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70180151 Hi Patrick - thanks for the feedback. I would love to print out the names of the fields, but I wasn't able to figure out a way to do that - could you suggest how? I wasn't sure if printing the hash code was useful or not; Josh included it in his original example of a traversal, so I figured I'd leave it in. I didn't know if there would be a way to look it up post-facto.
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4020#issuecomment-70180180 [Test build #25619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25619/consoleFull) for PR 4020 at commit [`6444391`](https://github.com/apache/spark/commit/644439144dba2f1a2c0cac29da16a0fc7a52b109). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5249] Added type specific set functions...
Github user AdamGS commented on the pull request: https://github.com/apache/spark/pull/4042#issuecomment-70183233 @pwendell, will just adding the new set (and setIfMissing) methods work?
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user ilganeli commented on a diff in the pull request: https://github.com/apache/spark/pull/4020#discussion_r23056056 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -257,8 +257,8 @@ private[spark] class Executor( val serviceTime = System.currentTimeMillis() - taskStart val metrics = attemptedTask.flatMap(t => t.metrics) for (m <- metrics) { -m.executorRunTime = serviceTime -m.jvmGCTime = gcTime - startGCTime +m.incExecutorRunTime(serviceTime) --- End diff -- I'm not sure whether the original behavior is necessarily correct. If the goal is to track total run time for the task, why does it make sense to do an assignment anywhere instead of an accumulation?
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70189607 good point, how about the current one?
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23050776 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -789,6 +792,44 @@ class DAGScheduler( } } + /** + * Helper function to check whether an RDD and its dependencies are serializable. + * + * This hook is exposed here primarily for testing purposes. + * + * Note: This function is defined separately from the SerializationHelper.isSerializable() + * since DAGScheduler.isSerializable() is passed as a parameter to the RDDWalker class's graph + * traversal, which would otherwise require knowledge of the closureSerializer + * (which was undesirable). + * + * @param rdd - Rdd to attempt to serialize + * @return Array[SerializedRdd] - + * Return an array of Either objects indicating if serialization is successful. + * Each object represents the RDD or a dependency of the RDD + * Success: ByteBuffer - The serialized RDD + * Failure: String - The reason for the failure. + * + */ + def tryToSerializeRddDeps(rdd: RDD[_]): Array[RDDTrace] = { --- End diff -- I think initially it might be good to keep this private and just expose it as an internal utility that is triggered when we actually see serialization issues. Once we get some more experience with it in practice we can open up a debugging API.
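The "Success: ByteBuffer / Failure: String" convention in the doc comment above maps naturally onto Scala's `Either`. The sketch below illustrates that convention only; it is not the patch's `SerializationHelper`, and plain JDK serialization stands in for Spark's closure serializer.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import java.nio.ByteBuffer
import scala.util.control.NonFatal

object SerializeSketch {
  /** Attempt to serialize an object, returning Right(bytes) on success and
    * Left(reason) on failure, mirroring the Either-based result convention. */
  def tryToSerialize(obj: AnyRef): Either[String, ByteBuffer] =
    try {
      val bytes = new ByteArrayOutputStream()
      val out = new ObjectOutputStream(bytes)
      out.writeObject(obj)
      out.close()
      Right(ByteBuffer.wrap(bytes.toByteArray))
    } catch {
      case e: NotSerializableException =>
        Left(s"Failed to serialize ${obj.getClass.getName}: ${e.getMessage}")
      case NonFatal(e) =>
        // Catch unexpected errors from the utility itself, per the review
        // comment further down, instead of letting them escape.
        Left(s"Could not produce debugging output: $e")
    }
}
```

A caller can then branch with `tryToSerialize(obj).fold(reason => logError(reason), buf => buf)`, which is the shape of the `.fold(...)` call visible in the TaskSetManager diff below.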
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70179454 Hey, just took a quick pass with some code style suggestions (more coming) and usability suggestions. One thing: would it be possible to track the name of the fields you are traversing? This would make the debugging output more useful. Also, is there a good reason to print the hash code? How would users use that?
[GitHub] spark pull request: SPARK-4746 make it easy to skip IntegrationTes...
Github user squito commented on the pull request: https://github.com/apache/spark/pull/4048#issuecomment-70181484 so, this doesn't actually work quite the way I wanted it to. It turns out it's skipping all the JUnit tests as well. The JUnit tests are run if you run with `test-only * -- -l`, but as soon as you add a tag like `test-only * -- -l foo`, all the JUnit tests are skipped. From the [junit-interface docs](https://github.com/sbt/junit-interface): "Any parameter not starting with - or + is treated as a glob pattern for matching tests." I will look into a solution for this, but I have a feeling this might mean we can't mix JUnit with the tagging approach, and we have to go to a more standard directory / file-naming approach to separating out integration tests.
[GitHub] spark pull request: [SPARK-4923][REPL] Add Developer API to REPL t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4034#issuecomment-70181635 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25617/ Test PASSed.
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70183535 hmmm... I'm not sure if we really can do that, as Scala doesn't support super.method naturally. I checked the actors in other components (master, worker and CoarseGrainedSchedulerBackend); they are just fine...
[GitHub] spark pull request: [SPARK-5274][SQL] Reconcile Java and Scala UDF...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4056#issuecomment-70184276 [Test build #25616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25616/consoleFull) for PR 4056 at commit [`675a3c9`](https://github.com/apache/spark/commit/675a3c985b9b65f1a818ec6756a215d9ef7b2246). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class UDFRegistration (sqlContext: SQLContext) extends org.apache.spark.Logging `
[GitHub] spark pull request: [SPARK-5274][SQL] Reconcile Java and Scala UDF...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4056#issuecomment-70184289 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25616/ Test PASSed.
[GitHub] spark pull request: [SPARK-5274][SQL] Reconcile Java and Scala UDF...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4056#issuecomment-70187936 Merging in master.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23051221 --- Diff: core/src/main/scala/org/apache/spark/util/ObjectWalker.scala --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.util + +import java.lang.reflect.{Modifier, Field} + +import com.google.common.collect.Queues --- End diff -- Does scala have a queue you can use here instead of using the google libraries?
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23051282 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -459,7 +459,23 @@ private[spark] class TaskSetManager( } // Serialize and return the task val startTime = clock.getTime() + val serializedTask: ByteBuffer = try { +// We rely on the DAGScheduler to catch non-serializable closures and RDDs, so in here +// we assume the task can be serialized without exceptions. + +// Check if serialization debugging is enabled +val debugSerialization: Boolean = sched.sc.getConf. + getBoolean("spark.serializer.debug", false) + +if (debugSerialization) { + SerializationHelper.tryToSerialize(ser, task).fold ( --- End diff -- We should make sure this catches any exceptions thrown by the serialization utility itself, and in that case just say that we couldn't produce debugging output.
[GitHub] spark pull request: [SPARK-4923][REPL] Add Developer API to REPL t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4034#issuecomment-70172402 [Test build #25617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25617/consoleFull) for PR 4034 at commit [`6dc1ee2`](https://github.com/apache/spark/commit/6dc1ee2b9ec589ceb2ade3454c3dbaf0697a09b4). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23051021 --- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala --- @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util + +import java.io.NotSerializableException +import java.nio.ByteBuffer + +import scala.collection.mutable +import scala.collection.mutable.HashMap +import scala.util.control.NonFatal + +import org.apache.spark.rdd.RDD +import org.apache.spark.scheduler.Task +import org.apache.spark.serializer.SerializerInstance + +/** + * This enumeration defines variables use to standardize debugging output + */ +object SerializationState extends Enumeration { --- End diff -- Could you make this and all classes you expose in this pr `private[spark]`?
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user ilganeli commented on a diff in the pull request: https://github.com/apache/spark/pull/3518#discussion_r23052373 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -789,6 +792,44 @@ class DAGScheduler( } } + /** + * Helper function to check whether an RDD and its dependencies are serializable. + * + * This hook is exposed here primarily for testing purposes. + * + * Note: This function is defined separately from the SerializationHelper.isSerializable() + * since DAGScheduler.isSerializable() is passed as a parameter to the RDDWalker class's graph + * traversal, which would otherwise require knowledge of the closureSerializer + * (which was undesirable). + * + * @param rdd - Rdd to attempt to serialize + * @return Array[SerializedRdd] - + * Return an array of Either objects indicating if serialization is successful. + * Each object represents the RDD or a dependency of the RDD + * Success: ByteBuffer - The serialized RDD + * Failure: String - The reason for the failure. + * + */ + def tryToSerializeRddDeps(rdd: RDD[_]): Array[RDDTrace] = { --- End diff -- I can make this private[spark], but when I say testing purposes, I mean that it's used within the DAGSchedulerSuite, so it needs to be public (at least within Spark).
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4020#issuecomment-70186438 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25619/ Test FAILed.
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70186533 Can't we just intercept the message and only call receiveWithLogging on it if it is the proper one?
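The fix under discussion boils down to checking which remote peer a `DisassociatedEvent` refers to before exiting. The sketch below illustrates that guard only; it is not the actual CoarseGrainedExecutorBackend code, and the `Address`/`DisassociatedEvent` case classes are local stand-ins for the Akka types (Akka's `akka.remote.DisassociatedEvent` carries a `remoteAddress`), so the snippet is self-contained.

```scala
// Local stand-ins for akka.actor.Address and akka.remote.DisassociatedEvent,
// so this sketch compiles without an Akka dependency.
final case class Address(system: String, host: String, port: Int)
final case class DisassociatedEvent(remoteAddress: Address)

class ExecutorBackendSketch(driverAddress: Address) {
  @volatile var stopped = false

  // Only react to the DisassociatedEvent that concerns the driver;
  // events from unrelated remote actor systems (e.g. a user's external
  // Akka-based receiver) must not kill the executor.
  val receive: PartialFunction[Any, Unit] = {
    case DisassociatedEvent(remote) if remote == driverAddress =>
      stopped = true // in the real backend: log the error and exit the process
    case DisassociatedEvent(_) =>
      () // irrelevant peer: keep running
  }
}
```

The alternative raised in the thread, filtering in the shared `ActorLogReceive` trait before delegating to `receiveWithLogging`, would centralize the same guard instead of repeating it per actor.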
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4020#issuecomment-70186432 [Test build #25619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25619/consoleFull) for PR 4020 at commit [`6444391`](https://github.com/apache/spark/commit/644439144dba2f1a2c0cac29da16a0fc7a52b109). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5274][SQL] Reconcile Java and Scala UDF...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4056#issuecomment-70188254 [Test build #25618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25618/consoleFull) for PR 4056 at commit [`ae9c556`](https://github.com/apache/spark/commit/ae9c556d91a58f41098b40b3e10842570e4b3278).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class UDFRegistration (sqlContext: SQLContext) extends org.apache.spark.Logging `
[GitHub] spark pull request: [SPARK-5274][SQL] Reconcile Java and Scala UDF...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4056#issuecomment-70188261 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25618/
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70188638 [Test build #25620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25620/consoleFull) for PR 3518 at commit [`1d2d563`](https://github.com/apache/spark/commit/1d2d563c04a7cfb302ccacf42fcfdc8b488a3a61).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70188734 [Test build #25620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25620/consoleFull) for PR 3518 at commit [`1d2d563`](https://github.com/apache/spark/commit/1d2d563c04a7cfb302ccacf42fcfdc8b488a3a61).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-70188737 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25620/
[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4020#discussion_r23055589

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---

    @@ -257,8 +257,8 @@ private[spark] class Executor(
         val serviceTime = System.currentTimeMillis() - taskStart
         val metrics = attemptedTask.flatMap(t => t.metrics)
         for (m <- metrics) {
    -      m.executorRunTime = serviceTime
    -      m.jvmGCTime = gcTime - startGCTime
    +      m.incExecutorRunTime(serviceTime)

--- End diff --

will this replace `=` with `+=`? This applies in a couple places above as well.
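The behavioral difference sryza is asking about is assignment versus accumulation. A hedged sketch with hypothetical names (not the actual TaskMetrics API) makes the distinction concrete:

```scala
// Hypothetical metrics holder illustrating the = vs += distinction.
class TaskMetricsSketch {
  private var _executorRunTime: Long = 0L
  def executorRunTime: Long = _executorRunTime

  // An "inc" accessor conventionally accumulates (+=) ...
  def incExecutorRunTime(value: Long): Unit = { _executorRunTime += value }

  // ... while the old code assigned (=); calling inc twice is not
  // equivalent to assigning the same value twice.
  def setExecutorRunTime(value: Long): Unit = { _executorRunTime = value }
}
```

If the replaced assignments ran more than once per task, swapping `=` for an accumulating `inc` would silently change the reported metric.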
[GitHub] spark pull request: [SPARK-5268] don't stop CoarseGrainedExecutorB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4063#issuecomment-70189998 [Test build #25622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25622/consoleFull) for PR 4063 at commit [`4ed522c`](https://github.com/apache/spark/commit/4ed522c9c101573ee8eac7b8ab3206504cc8aabf).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4859][Core][Streaming] Refactor LiveLis...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4006#issuecomment-70206158 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25631/
[GitHub] spark pull request: [SPARK-4859][Core][Streaming] Refactor LiveLis...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4006#issuecomment-70206157 [Test build #25631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25631/consoleFull) for PR 4006 at commit [`0710364`](https://github.com/apache/spark/commit/0710364818d9c1338188d89fa522316d84482ec4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4066#issuecomment-70206625 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25629/
[GitHub] spark pull request: [SPARK-4879] [WIP] Use driver to coordinate Ha...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4066#discussion_r23062798

--- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala ---

    @@ -105,10 +107,20 @@ class SparkHadoopWriter(@transient jobConf: JobConf)
       def commit() {
         val taCtxt = getTaskContext()
         val cmtr = getOutputCommitter()
    +    val dagSchedulerActor =
    +      AkkaUtils.makeDriverRef("DAGScheduler", SparkEnv.get.conf, SparkEnv.get.actorSystem)
    +    val askTimeout = AkkaUtils.askTimeout(SparkEnv.get.conf)
         if (cmtr.needsTaskCommit(taCtxt)) {
           try {
    -        cmtr.commitTask(taCtxt)
    -        logInfo(taID + ": Committed")
    +        val canCommit: Boolean = AkkaUtils.askWithReply(
    +          AskPermissionToCommitOutput(jobID, splitID, attemptID), dagSchedulerActor, askTimeout)
    +        if (canCommit) {
    +          cmtr.commitTask(taCtxt)
    +          logInfo(s"$taID: Committed")
    +        } else {
    +          logInfo(s"$taID: Not committed because DAGScheduler did not authorize commit")
    +        }
           } catch {
             case e: IOException => {
               logError("Error committing the output of task: " + taID.value, e)

--- End diff --

I guess we need to catch TimeoutException here, too.
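JoshRosen's point is that the ask to the driver introduces a failure mode the old code never had: the reply can time out, and a bare `case e: IOException` will not catch a `TimeoutException`. A simplified, hypothetical sketch of the widened catch (not the PR's actual code):

```scala
import java.io.IOException
import java.util.concurrent.TimeoutException

// Run a commit action, translating both I/O failures and ask timeouts
// into an error result instead of an escaping exception.
def commitSafely(doCommit: () => Unit): Either[String, Unit] =
  try Right(doCommit())
  catch {
    case e: IOException      => Left("Error committing: " + e.getMessage)
    case e: TimeoutException => Left("Driver did not reply: " + e.getMessage)
  }
```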
[GitHub] spark pull request: [SPARK-5193][SQL] Remove Spark SQL Java-specif...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4065#issuecomment-70208071 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25632/
[GitHub] spark pull request: [SPARK-5193][SQL] Remove Spark SQL Java-specif...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4065#issuecomment-70208067 [Test build #25632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25632/consoleFull) for PR 4065 at commit [`500d2c4`](https://github.com/apache/spark/commit/500d2c4ee388dfc508d0c810d0402e1791441cb0).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4777][CORE] Some block memory after unr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3629#issuecomment-70208700 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25633/
[GitHub] spark pull request: [SPARK-4777][CORE] Some block memory after unr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3629#issuecomment-70208695 [Test build #25633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25633/consoleFull) for PR 3629 at commit [`f0e80f2`](https://github.com/apache/spark/commit/f0e80f29713615c60674998bba6cfbc39f120891).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3732#issuecomment-70208781 Adrian - as we spoke offline, it would be simpler (for future datetime related features) to just represent the Date type as a primitive int internally, and convert to java.sql.Date when we give it back to the user. You can create a DateTimeUtils class to implement common functionalities such as conversion between strings and int date. Thanks!
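The representation rxin describes can be sketched as days since the Unix epoch stored in an `Int`, converting to `java.sql.Date` only at the user-facing boundary. This is a hypothetical sketch, not Spark's actual DateTimeUtils: the class name is invented, and the UTC-millis assumption is mine (real code must account for time zones).

```scala
object DateTimeUtilsSketch {
  private val MillisPerDay = 86400000L

  // Internal representation: days since 1970-01-01.
  // Assumes the Date's millis are UTC-aligned; production code needs
  // proper time-zone handling.
  def fromJavaDate(date: java.sql.Date): Int =
    (date.getTime / MillisPerDay).toInt

  def toJavaDate(days: Int): java.sql.Date =
    new java.sql.Date(days * MillisPerDay)
}
```

Keeping the internal type a primitive `Int` makes date arithmetic and comparison cheap and leaves room for later datetime features, which is the simplification rxin is after.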