[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...

2015-02-03 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3779#discussion_r24066157 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -506,13 +506,59 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...

2015-02-03 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3779#discussion_r24066349 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -506,13 +506,59 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...

2015-02-03 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3779#discussion_r24066137 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -506,13 +506,59 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...

2015-02-03 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3779#discussion_r24066698 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -506,13 +506,59 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...

2015-02-03 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3779#discussion_r24067073 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -506,13 +506,63 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...

2015-02-03 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3779#discussion_r24048516 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -506,13 +506,59 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...

2015-02-03 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3779#discussion_r24049293 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -506,13 +506,59 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: Disabling Utils.chmod700 for Windows

2015-01-31 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4299#discussion_r23892404 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -255,12 +255,17 @@ private[spark] object Utils extends Logging { * @return

[GitHub] spark pull request: [Build] Set all Debian package permissions to ...

2015-01-29 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/4277#issuecomment-72114226 This one is fairly innocuous, but in common with other issues with the Debian packaging within Spark, we don't have really good answers as to what is the right thing

[GitHub] spark pull request: Spark 3789

2015-01-26 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/4205#issuecomment-71501211 Yes, +1 to Sandy's request. In general, the JIRA should explain *why* a change is necessary or advisable, the description of the PR should explain *what

[GitHub] spark pull request: [SQL] SPARK-5309: Add support for dictionaries...

2015-01-24 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4187#discussion_r23501411 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetConverter.scala --- @@ -426,6 +423,33 @@ private[parquet] class

[GitHub] spark pull request: [SPARK-5374][CORE] abstract RDD's DAG graph it...

2015-01-23 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/4134#issuecomment-71214324 I'll take a deeper look over the weekend, but on a first pass I had a similar reaction to @rxin -- I'm not seeing a lot of benefit in terms of code clarity

[GitHub] spark pull request: [SPARK-5355] make SparkConf thread-safe

2015-01-21 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4143#discussion_r23334671 --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala --- @@ -46,7 +46,7 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable

[GitHub] spark pull request: [SPARK-5214][Core] Add EventLoop and change DA...

2015-01-13 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/4016#issuecomment-69880382 Yes, of course, we can do this -- it's more-or-less going back to what we had before: https://github.com/apache/spark/commit/2539c0674501432fb62073577db6da52a26db850

[GitHub] spark pull request: [SPARK-5174][SPARK-5175] provide more APIs in ...

2015-01-09 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3984#discussion_r22756612 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ActorReceiver.scala --- @@ -149,43 +181,61 @@ private[streaming] class

[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...

2015-01-05 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68721665 @liyezhang556520 That's been done already in the DAGScheduler. If we need another level of supervision for Master or other actors, we should consider whether

[GitHub] spark pull request: SPARK-5059 - create README.md

2015-01-04 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3892#issuecomment-68641442 Duplicating documentation is a good way for things to get out of sync. If we really need a README.md here, I'd prefer that it be little more than a link to where

[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...

2015-01-04 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68650323 @JoshRosen your thinking is that Master will be in good shape even though an exception has been thrown? If you can guarantee that, then resuming the actor while

[GitHub] spark pull request: SPARK-3655 GroupByKeyAndSortValues

2015-01-02 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3632#discussion_r22420650 --- Diff: core/src/main/scala/org/apache/spark/util/Ordering.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: SPARK-3655 GroupByKeyAndSortValues

2015-01-02 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3632#discussion_r22422723 --- Diff: core/src/main/scala/org/apache/spark/util/Ordering.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: SPARK-3655 GroupByKeyAndSortValues

2015-01-02 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3632#discussion_r22421802 --- Diff: core/src/main/scala/org/apache/spark/util/Ordering.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: SPARK-3655 GroupByKeyAndSortValues

2015-01-02 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3632#discussion_r22422719 --- Diff: core/src/main/scala/org/apache/spark/util/Ordering.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...

2014-12-31 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68445280 It doesn't seem to me that usage of the newer Akka persistence API is called for, but it does seem that wrapping the `receive` in a try-catch is trying to do the job

[GitHub] spark pull request: [SPARK-4961] [CORE] Put HadoopRDD.getPartition...

2014-12-30 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3794#issuecomment-68385773 @JoshRosen If you've given it a look over and don't see a conflict, then you are probably right about the orthogonality. I was asking out of caution rather than

[GitHub] spark pull request: [SPARK-4961] [CORE] Put HadoopRDD.getPartition...

2014-12-29 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3794#discussion_r22336168 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -203,9 +204,27 @@ class HadoopRDD[K, V]( for (i <- 0 until

[GitHub] spark pull request: [SPARK-4417] New API: sample RDD to fixed numb...

2014-12-28 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3723#issuecomment-68216759 My biggest problem with this is that, while the existing `sample` is an action, `sampleByCount` is another one of those unholy beasts that is neither an action nor

[GitHub] spark pull request: [SPARK-3955] Different versions between jackso...

2014-12-27 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3716#issuecomment-68180192 Whoa there! You've now got the dependency for jackson-mapper-asl twice and for jackson-core-asl not at all. --- If your project is set up for it, you can reply

[GitHub] spark pull request: [SPARK-4723] [CORE] To abort the stages which ...

2014-12-24 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3786#issuecomment-68051186 I don't like the approach of saying "for some reason, something happens" and then putting in a patch to address what happens instead of identifying and correcting

[GitHub] spark pull request: SPARK-4454 Fix race condition in DAGScheduler

2014-12-24 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3345#issuecomment-68068199 Ah yes, I see now. Thanks for coming back to this one, Josh. `DAGScheduler#getPreferredLocs` is definitely broken. You're correct that the access

[GitHub] spark pull request: SPARK-3655 GroupByKeyAndSortValues

2014-12-24 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3632#issuecomment-68069421 The reason for separate classes is to cleanly segregate the available/supportable functionality. Not every `PairRDD` has keys that can be ordered, so `sortByKey

[GitHub] spark pull request: SPARK-3655 GroupByKeyAndSortValues

2014-12-24 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3632#issuecomment-68082299 @koertkuipers Don't get me wrong, I'm not arguing that the way that `PairRDDFunctions` and `OrderedRDDFunctions` work is objectively the best and unquestionably

[GitHub] spark pull request: [SPARK-4834] [standalone] Clean up application...

2014-12-22 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3705#issuecomment-67912545 @vanzin You need to read some more about Akka actors. The fundamental abstraction is that an actor has a message queue, the only way to communicate

[GitHub] spark pull request: [SPARK-4871][SQL] Show sql statement in spark ...

2014-12-20 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3718#issuecomment-67750241 That wrapping in the UI is barely tolerable for what are still fairly modest-length queries. We're really going to need some kind of elided query in the main Jobs

[GitHub] spark pull request: SPARK-3655 GroupByKeyAndSortValues

2014-12-20 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3632#issuecomment-67757353 On a first pass, this doesn't look right. If you are providing additional methods that should be available for `RDD[(K, V)]` where there is an `Ordering` available

[GitHub] spark pull request: [SPARK-4498][core] Don't transition ExecutorIn...

2014-12-02 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3550#issuecomment-65194915 The application won't be killed if an executor has been recognized by master as RUNNING (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache

[GitHub] spark pull request: [SPARK-4498][core] Don't transition ExecutorIn...

2014-12-02 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3550#issuecomment-65269257 It's worth spending a little time checking that any executors that are RUNNING for an application definitely will transition to a Finished state and be removed from

[GitHub] spark pull request: [SPARK-4498][SPARK-2424] [WIP] Add driver - m...

2014-12-02 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3548#issuecomment-65304935 https://github.com/apache/spark/pull/3550 doesn't address SPARK-2424; so if we want to handle that issue in 1.2, then we still need a PR for it.

[GitHub] spark pull request: [SPARK-4498][core] Don't transition ExecutorIn...

2014-12-01 Thread markhamstra
GitHub user markhamstra opened a pull request: https://github.com/apache/spark/pull/3550 [SPARK-4498][core] Don't transition ExecutorInfo to RUNNING until Driver adds Executor The ExecutorInfo only reaches the RUNNING state if the Driver is alive to send the ExecutorStateChanged

[GitHub] spark pull request: [SPARK-4498][SPARK-2424] [WIP] Add driver - m...

2014-12-01 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3548#issuecomment-65193703 Seems needlessly complicated to me. I'm still doing tests, but it seems to me that all that is required is https://github.com/apache/spark/pull/3550

[GitHub] spark pull request: [SPARK-4654] Clean up DAGScheduler getMissingP...

2014-11-29 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3515#discussion_r21056495 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -401,7 +370,7 @@ class DAGScheduler( val s = stages.head

[GitHub] spark pull request: [SPARK-4626] Kill a task only if the executorI...

2014-11-27 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3483#discussion_r21007543 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -127,7 +127,13 @@ class

[GitHub] spark pull request: [SPARK-4626] Kill a task only if the executorI...

2014-11-27 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3483#discussion_r21009879 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -127,7 +127,13 @@ class

[GitHub] spark pull request: [SPARK-4626] Kill a task only if the executorI...

2014-11-27 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3483#issuecomment-64837465 Separating style from substance and making purely stylistic PRs is not really a good idea. That only serves to complicate the revision history and make

[GitHub] spark pull request: [BUILD] Maven with zinc support, this script w...

2014-11-19 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3364#issuecomment-63665832 @srowen Looks like it would install the tarball in spark/mvn/target, but not actually run anything if it detects an already running instance of zinc; so it shouldn't

[GitHub] spark pull request: SPARK-4454 Fix race condition in DAGScheduler

2014-11-18 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3345#issuecomment-63517904 Where is the race condition? `cacheLocs` is state local to the DAGScheduler and should only be used within the actions of DAGSchedulerEventProcessActor, which

[GitHub] spark pull request: [SPARK-4145] Web UI job pages

2014-11-17 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3009#issuecomment-63426293 There are several alternative approaches, and they largely differentiate on who is the intended consumer of the progress bar information. For example, a system

[GitHub] spark pull request: [SPARK-4436][SPARK-3624][BUILD] Debian packagi...

2014-11-16 Thread markhamstra
GitHub user markhamstra opened a pull request: https://github.com/apache/spark/pull/3297 [SPARK-4436][SPARK-3624][BUILD] Debian packaging fixes This makes the Spark jar as well as the datanucleus jars (and anything else in lib_managed/jars) available in the debian package at spark

[GitHub] spark pull request: [WIP][SPARK-4428][BUILD] Use ${scala.binary.ve...

2014-11-15 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3285#issuecomment-63175889 @srowen is correct, this kind of parameterized artifactId has already been considered but is not permissible. Please close this PR.

[GitHub] spark pull request: [WIP][SPARK-4428][BUILD] Use ${scala.binary.ve...

2014-11-15 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/3285#issuecomment-63181339 That was quite a while back and would require some searching of the mail archives and repo from when Spark was in the Apache incubator. I did what you did soon after

[GitHub] spark pull request: [Build] SPARK-3624: Failed to find Spark assem...

2014-11-11 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/2477#issuecomment-62570870 No, we should definitely do something to make the Debian package functional again. The question is whether we are concerned enough about backward compatibility

[GitHub] spark pull request: [SPARK-4145] Web UI job pages

2014-11-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3009#discussion_r20174342 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -0,0 +1,144 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-4318][SQL] Fix empty sum distinct.

2014-11-10 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/3184#discussion_r20125624 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DslQuerySuite.scala --- @@ -195,6 +195,18 @@ class DslQuerySuite extends QueryTest

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-20 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/2828#issuecomment-59810037 @CodingCat, Worker is private[spark], so what is the nature of your concern? In fact, I'm wondering whether we really want the changes in this PR that make some

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-20 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/2828#issuecomment-59811803 A legitimate concern, and certainly something that could be worked up into a JIRA issue and separate pull request. But it's not a very pressing issue since nothing

[GitHub] spark pull request: [SPARK-3944][Core] Code re-factored as suggest...

2014-10-16 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/2810#issuecomment-59400293 Yup, LGTM. And as a general rule for the future, avoid pattern matching on Some and None. In most cases you should instead use a map, flatMap or foreach over
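
The rule of thumb in the comment above can be illustrated with a small, self-contained Scala sketch. It is not taken from the pull request: the object and method names (`OptionIdioms`, `portLabel`, `hostOf`, `logIfSet`) are hypothetical and chosen only to contrast a `Some`/`None` pattern match with `map`, `flatMap` and `foreach` on an `Option`.

```scala
// Hypothetical helpers for illustration only; none of these names come from Spark.
object OptionIdioms {
  // Pattern-matching style that the comment advises against.
  def portLabelMatch(port: Option[Int]): String = port match {
    case Some(p) => s"port $p"
    case None    => "no port"
  }

  // Same result using map plus getOrElse.
  def portLabel(port: Option[Int]): String =
    port.map(p => s"port $p").getOrElse("no port")

  // flatMap chains a second step that may itself yield no value;
  // an empty host part is treated as absent.
  def hostOf(conf: Map[String, String], key: String): Option[String] =
    conf.get(key).flatMap(v => v.split(":").headOption).filter(_.nonEmpty)

  // foreach runs a side effect only when a value is present.
  def logIfSet(value: Option[String], sink: StringBuilder): Unit =
    value.foreach(v => sink.append(v))
}
```

Under these assumptions, `portLabel` and `portLabelMatch` return the same string for any input; the combinator form simply avoids spelling out the `None` branch at every call site.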

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-16 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/2828#discussion_r18985880 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -243,6 +249,10 @@ private[spark] class Worker( System.exit

[GitHub] spark pull request: [SPARK-3944][Core] Code re-factored as suggest...

2014-10-15 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/2810#issuecomment-59223588 ok to test

[GitHub] spark pull request: [SPARK-3944][Core] Using Option[String] where ...

2014-10-14 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/2795#discussion_r18861574 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -340,8 +340,8 @@ private[spark] object Utils extends Logging { val

[GitHub] spark pull request: [Build] SPARK-3624: Failed to find Spark assem...

2014-10-07 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/2477#issuecomment-58293963 This certainly works, but I'm not sure that we need to maintain the complexity of having both a `jars` directory and a `lib` symlink to it. What we want

[GitHub] spark pull request: [SPARK-732][SPARK-3628][CORE][RESUBMIT] make i...

2014-09-29 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/2524#discussion_r18193968 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -112,6 +112,10 @@ class DAGScheduler( // stray messages

[GitHub] spark pull request: [SPARK-3613] Record only average block size in...

2014-09-28 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/2470#discussion_r18136843 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -24,22 +24,123 @@ import org.apache.spark.storage.BlockManagerId

[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...

2014-09-27 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-57043930 Have either of you thought about how to coordinate this with Josh's work on SPARK-3626? https://github.com/apache/spark/pull/2482

[GitHub] spark pull request: [SPARK-3626] [WIP] Replace AsyncRDDActions wit...

2014-09-21 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/2482#issuecomment-56310103 +1 @rxin Just scanned through the code quickly, and I didn't immediately see anything that would preclude retaining and deprecating the old code while

[GitHub] spark pull request: SPARK-3580: New public method for RDD's to hav...

2014-09-18 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/2447#discussion_r17741897 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -208,6 +208,23 @@ abstract class RDD[T: ClassTag

[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...

2014-09-15 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/2014#issuecomment-55609759 There really should be at least some mention of zinc (https://github.com/typesafehub/zinc) in our maven build instructions, since using zinc greatly improves

[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...

2014-09-15 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/2014#issuecomment-55611558 Yes, I know that the scala-maven-plugin will throw warnings if zinc isn't being used. I also know that many users are either confused by those warnings or ignore

[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...

2014-09-15 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/2014#discussion_r17570971 --- Diff: docs/building-spark.md --- @@ -159,4 +160,21 @@ then ship it over to the cluster. We are investigating the exact cause

[GitHub] spark pull request: [SPARK-3411] Improve load-balancing of concurr...

2014-09-09 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1106#discussion_r17331939 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -481,13 +481,26 @@ private[spark] class Master( if (state

[GitHub] spark pull request: [SPARK-3446] Expose underlying job ids in Futu...

2014-09-09 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/2337#issuecomment-55047494 I don't understand this claim: "...for job IDs to be useful they need to be exposed there." Could you clarify, please?

[GitHub] spark pull request: [SPARK-3411] Improve load-balancing of concurr...

2014-09-08 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1106#discussion_r17253072 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -481,13 +481,23 @@ private[spark] class Master( if (state

[GitHub] spark pull request: [SPARK-3411] Improve load-balancing of concurr...

2014-09-08 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1106#discussion_r17253575 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -481,13 +481,23 @@ private[spark] class Master( if (state

[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1360#discussion_r17254222 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -295,28 +295,34 @@ private[spark] class Master( val

[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1360#discussion_r17255027 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala --- @@ -234,7 +234,7 @@ private[spark] class Worker( try

[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-09-08 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1360#discussion_r17255465 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -295,28 +295,34 @@ private[spark] class Master( val

[GitHub] spark pull request: [SPARK-3411] Improve load-balancing of concurr...

2014-09-08 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1106#discussion_r17283247 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -481,13 +481,27 @@ private[spark] class Master( if (state

[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-08-23 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1360#issuecomment-53163415 I'm not sure I'm following, @mridulm. The problem is not one of removing Executors, but rather of removing Applications that could and should still be left running

[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-08-23 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1360#issuecomment-53164183 @mridulm Is this blacklisting behavior a customization that you have made to Spark? If not, could you point me to where and how it is implemented? What you

[GitHub] spark pull request: [SPARK-3010] fix redundant conditional

2014-08-16 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1992#issuecomment-52413400 Something seems to be amiss in this PR: Several of your changes are neither in `master` already nor are they showing up in `Files changed`. Mismerged

[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...

2014-08-15 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-52339006 Excellent! I'll try to find some time to review this soon.

[GitHub] spark pull request: [WIP][SPARK-1720] Add the value of LD_LIBRARY_...

2014-08-12 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1031#issuecomment-51945400 It's definitely not cross-platform.

[GitHub] spark pull request: SPARK-2425 Don't kill a still-running Applicat...

2014-08-12 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1360#issuecomment-51959659 @pwendell Still should go into 1.1.0... The change is fairly small, and the unpatched behavior is pretty nasty for long-running applications.

[GitHub] spark pull request: SPARK-2830

2014-08-12 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1908#issuecomment-51962565 nit: That's not really an adequate title for this PR, Ameet. It should include enough description so that we can tell what it is about in the corresponding subject

[GitHub] spark pull request: [SPARK-2991] Implement RDD lazy transforms for...

2014-08-12 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1909#issuecomment-51965072 Erik, you've been doing some great work on making non-lazy transforms lazy! I haven't had time to thoroughly review your recent PRs, but can you do some checks

[GitHub] spark pull request: [SPARK-2886] Use more specific actor system na...

2014-08-09 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1810#discussion_r16028829 --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala --- @@ -146,9 +146,9 @@ object SparkEnv extends Logging { } val

[GitHub] spark pull request: [SPARK-2897][SPARK-2920]TorrentBroadcast does ...

2014-08-08 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1836#discussion_r16006978 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -17,14 +17,14 @@ package org.apache.spark.broadcast

[GitHub] spark pull request: [SPARK-1022][Streaming] Add Kafka real unit te...

2014-08-05 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1751#issuecomment-51257040 @tdas @pwendell This broke the Maven build: ``` ~/Apache/spark(branch-1.1|✔) ➤ mvn -U -DskipTests clean install . . . [error] Apache/spark

[GitHub] spark pull request: [SPARK-1812] Enable cross build for scala 2.11...

2014-08-04 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/996#discussion_r15743162 --- Diff: assembly/pom.xml --- @@ -26,7 +26,7 @@ </parent> <groupId>org.apache.spark</groupId> - <artifactId>spark-assembly_2.10

[GitHub] spark pull request: [SPARK-1812] Enable cross build for scala 2.11...

2014-08-04 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/996#discussion_r15770288 --- Diff: assembly/pom.xml --- @@ -26,7 +26,7 @@ </parent> <groupId>org.apache.spark</groupId> - <artifactId>spark-assembly_2.10

[GitHub] spark pull request: [SPARK-1812][wip]

2014-08-01 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/996#discussion_r15702768 --- Diff: assembly/pom.xml --- @@ -26,7 +26,7 @@ </parent> <groupId>org.apache.spark</groupId> - <artifactId>spark-assembly_2.10

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15538656 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2638 MapOutputTracker concurrency improv...

2014-07-29 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1542#issuecomment-50511022 I dunno, merging a PR with no changed files doesn't sound too scary to me. Something is definitely messed up in this PR, with both `Commits` and `Files

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15542781 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -348,4 +353,48 @@ private[spark] class Executor

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15543587 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -38,8 +37,10 @@ import org.apache.spark._ import

[GitHub] spark pull request: SPARK-2380: Support displaying accumulator val...

2014-07-29 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1309#discussion_r15557746 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -809,12 +810,25 @@ class DAGScheduler( listenerBus.post

[GitHub] spark pull request: [SPARK-2714] DAGScheduler logs jobid when runJ...

2014-07-29 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1617#issuecomment-50552553 What is the need to expose the jobId after the job is finished?

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15559948 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -348,4 +353,48 @@ private[spark] class Executor

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15560393 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -348,4 +353,48 @@ private[spark] class Executor

[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-28 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1498#discussion_r15498156 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -17,7 +17,7 @@ package org.apache.spark.scheduler

[GitHub] spark pull request: [SPARK-2521] Broadcast RDD object (instead of ...

2014-07-28 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1498#discussion_r15499105 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -691,25 +689,41 @@ class DAGScheduler

[GitHub] spark pull request: SPARK-2684: Update ExternalAppendOnlyMap to ta...

2014-07-27 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1607#discussion_r15439805 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -110,42 +110,56 @@ class ExternalAppendOnlyMap[K, V, C
