[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...

2017-11-09 Thread nkronenfeld
Github user nkronenfeld commented on the issue: https://github.com/apache/spark/pull/19705 ok, now I question my own testing... does maven not run scalastyle tests? Or did I not run the tests properly somehow? I just ran mvn test from root, and it all seemed to work on my machine

[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...

2017-11-08 Thread nkronenfeld
Github user nkronenfeld commented on the issue: https://github.com/apache/spark/pull/19705 @gatorsmile @srowen I think this is set now.

[GitHub] spark pull request #19705: [SPARK-22308][test-maven] Support alternative uni...

2017-11-08 Thread nkronenfeld
GitHub user nkronenfeld opened a pull request: https://github.com/apache/spark/pull/19705 [SPARK-22308][test-maven] Support alternative unit testing styles in external applications Continuation of PR#19528 (https://github.com/apache/spark/pull/19529#issuecomment-340252119

[GitHub] spark issue #19529: [SPARK-22308] Support alternative unit testing styles in...

2017-10-30 Thread nkronenfeld
Github user nkronenfeld commented on the issue: https://github.com/apache/spark/pull/19529 hm... I was always testing with sbt, because maven was so slow to do anything. Will do

[GitHub] spark pull request #19529: [SPARK-22308] Support alternative unit testing st...

2017-10-25 Thread nkronenfeld
Github user nkronenfeld commented on a diff in the pull request: https://github.com/apache/spark/pull/19529#discussion_r147038647 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/SharedSQLContext.scala --- @@ -17,86 +17,8 @@ package org.apache.spark.sql.test

[GitHub] spark pull request #19529: [SPARK-22308] Support alternative unit testing st...

2017-10-25 Thread nkronenfeld
Github user nkronenfeld commented on a diff in the pull request: https://github.com/apache/spark/pull/19529#discussion_r147038487 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/PlanTest.scala --- @@ -29,7 +31,14 @@ import

[GitHub] spark pull request #19529: [SPARK-22308] Support alternative unit testing st...

2017-10-25 Thread nkronenfeld
Github user nkronenfeld commented on a diff in the pull request: https://github.com/apache/spark/pull/19529#discussion_r147038504 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/SharedSQLContext.scala --- @@ -17,86 +17,8 @@ package org.apache.spark.sql.test

[GitHub] spark pull request #19529: [SPARK-22308] Support alternative unit testing st...

2017-10-25 Thread nkronenfeld
Github user nkronenfeld commented on a diff in the pull request: https://github.com/apache/spark/pull/19529#discussion_r146887435 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/SharedSQLContext.scala --- @@ -17,86 +17,8 @@ package org.apache.spark.sql.test

[GitHub] spark pull request #19529: [SPARK-22308] Support alternative unit testing st...

2017-10-25 Thread nkronenfeld
Github user nkronenfeld commented on a diff in the pull request: https://github.com/apache/spark/pull/19529#discussion_r146885975 --- Diff: core/src/test/scala/org/apache/spark/SharedSparkContext.scala --- @@ -29,10 +29,25 @@ trait SharedSparkContext extends BeforeAndAfterAll

[GitHub] spark pull request #19529: [SPARK-22308] Support alternative unit testing st...

2017-10-25 Thread nkronenfeld
Github user nkronenfeld commented on a diff in the pull request: https://github.com/apache/spark/pull/19529#discussion_r146885926 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala --- @@ -52,17 +55,142 @@ import org.apache.spark.util

[GitHub] spark issue #19529: [SPARK-22308] Support alternative unit testing styles in...

2017-10-24 Thread nkronenfeld
Github user nkronenfeld commented on the issue: https://github.com/apache/spark/pull/19529 I assume the "unknown error code, -9" is: [error] running /home/jenkins/workspace/SparkPullRequestBuilder/build/sbt -Phadoop-2.6 -Pflume -Phive-thriftserver -Pyarn -

[GitHub] spark issue #19529: [SPARK-22308] Support alternative unit testing styles in...

2017-10-23 Thread nkronenfeld
Github user nkronenfeld commented on the issue: https://github.com/apache/spark/pull/19529 Yeah, as predicted, that made PlanTest very easy to review, but didn't do as well with SQLTestUtils. I suspect I reordered functions and what-not when I was moving stuff around

[GitHub] spark issue #19529: [SPARK-22308] Support alternative unit testing styles in...

2017-10-23 Thread nkronenfeld
Github user nkronenfeld commented on the issue: https://github.com/apache/spark/pull/19529 @gatorsmile sounds good, giving that a try now... assuming tests pass, I'll check it in and see if it's any better. I've so far done this for PlanTest and SQLTestUtils PlanTest I

[GitHub] spark issue #19529: [SPARK-22308] Support alternative unit testing styles in...

2017-10-23 Thread nkronenfeld
Github user nkronenfeld commented on the issue: https://github.com/apache/spark/pull/19529 @gatorsmile the code changes aren't huge - there's almost no new code here, it's all just moving code around from one file to another in order to expose a SharedSparkSession with no dependence

[GitHub] spark pull request #19529: [SPARK-22308] Support alternative unit testing st...

2017-10-23 Thread nkronenfeld
Github user nkronenfeld commented on a diff in the pull request: https://github.com/apache/spark/pull/19529#discussion_r146312166 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/GenericFlatSpecSuite.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19529: [SPARK-22308] Support alternative unit testing st...

2017-10-23 Thread nkronenfeld
Github user nkronenfeld commented on a diff in the pull request: https://github.com/apache/spark/pull/19529#discussion_r146311684 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala --- @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19529: [SPARK-22308] Support alternative unit testing st...

2017-10-23 Thread nkronenfeld
Github user nkronenfeld commented on a diff in the pull request: https://github.com/apache/spark/pull/19529#discussion_r146311180 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala --- @@ -52,249 +36,23 @@ import org.apache.spark.util

[GitHub] spark pull request #19529: [SPARK-22308] Support alternative unit testing st...

2017-10-23 Thread nkronenfeld
Github user nkronenfeld commented on a diff in the pull request: https://github.com/apache/spark/pull/19529#discussion_r146308234 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/PlanTest.scala --- @@ -18,158 +18,9 @@ package

[GitHub] spark issue #19529: [SPARK-22308] Support alternative unit testing styles in...

2017-10-22 Thread nkronenfeld
Github user nkronenfeld commented on the issue: https://github.com/apache/spark/pull/19529 Documentation removed as per @srowen 's request in the associated JIRA issue [SPARK-22308]

[GitHub] spark issue #19529: [SPARK-22308] Support alternative unit testing styles in...

2017-10-18 Thread nkronenfeld
Github user nkronenfeld commented on the issue: https://github.com/apache/spark/pull/19529 nope, using lazy val initialization won't work - at the very least, UnsafeKryoSerializerSuite modifies conf before context construction
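
A minimal sketch of the eager-initialization pattern being described, assuming a ScalaTest setup; the trait and member names here are illustrative rather than the ones in Spark's test code:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, Suite}

// Sketch only: the context is built in an explicit hook rather than a lazy val,
// so anything a suite does to `conf` before beforeAll reaches this trait
// (e.g. in the suite's constructor) is guaranteed to take effect.
trait SharedContextSketch extends BeforeAndAfterAll { self: Suite =>
  protected val conf: SparkConf = new SparkConf(false)
  @transient private var _sc: SparkContext = _
  protected def sc: SparkContext = _sc

  protected def initializeContext(): Unit = {
    if (_sc == null) {
      _sc = new SparkContext("local[4]", getClass.getSimpleName, conf)
    }
  }

  override def beforeAll(): Unit = {
    super.beforeAll()
    initializeContext()
  }

  override def afterAll(): Unit = {
    try {
      if (_sc != null) {
        _sc.stop()
        _sc = null
      }
    } finally {
      super.afterAll()
    }
  }
}
```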

[GitHub] spark issue #19529: Support alternative unit testing styles in external appl...

2017-10-18 Thread nkronenfeld
Github user nkronenfeld commented on the issue: https://github.com/apache/spark/pull/19529 There is one small hack in the way this was done, which is documented - see the comments and documentation on SharedSparkSession.initializeSession and SharedSparkContext.initializeContext. I

[GitHub] spark issue #19529: Support alternative unit testing styles in external appl...

2017-10-18 Thread nkronenfeld
Github user nkronenfeld commented on the issue: https://github.com/apache/spark/pull/19529 I made my original changes here by using: git mv PlanTest.scala PlanTestBase.scala; git mv SQLTestUnit.scala SQLTestUnitBase.scala; git mv SharedSQLContext.scala

[GitHub] spark pull request #19529: Support alternative unit testing styles in extern...

2017-10-18 Thread nkronenfeld
GitHub user nkronenfeld opened a pull request: https://github.com/apache/spark/pull/19529 Support alternative unit testing styles in external applications ## What changes were proposed in this pull request? Support unit tests of external code (i.e., applications that use spark
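
As an illustration of the kind of external test the PR is meant to enable, here is a hedged sketch in ScalaTest's FlatSpec style; it manages its own SparkSession directly, since the exact trait names added by the PR are not reproduced here:

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.{BeforeAndAfterAll, FlatSpec}

// Hypothetical external-application suite written in FlatSpec style rather than
// the FunSuite style used inside Spark itself.
class WordCountSpec extends FlatSpec with BeforeAndAfterAll {
  @transient private var spark: SparkSession = _

  override def beforeAll(): Unit = {
    super.beforeAll()
    spark = SparkSession.builder()
      .master("local[2]")
      .appName("flatspec-example")
      .getOrCreate()
  }

  override def afterAll(): Unit = {
    try {
      if (spark != null) spark.stop()
    } finally {
      super.afterAll()
    }
  }

  "a Dataset" should "count its rows" in {
    val session = spark
    import session.implicits._
    assert(Seq(1, 2, 3).toDS().count() === 3L)
  }
}
```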

[GitHub] spark pull request: [WIP] Common interfaces between RDD, DStream, ...

2015-04-27 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/5565#issuecomment-96656969 I'll see, I guess I can always reopen. If I can get some work towards this done, some deprecation with new versions of the necessary functions, I'll resubmit

[GitHub] spark pull request: [WIP] Common interfaces between RDD, DStream, ...

2015-04-27 Thread nkronenfeld
Github user nkronenfeld closed the pull request at: https://github.com/apache/spark/pull/5565

[GitHub] spark pull request: Common interfaces between RDD, DStream, and Da...

2015-04-22 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/5565#issuecomment-95221601 Could you leave it open a little bit longer? I want to see if I can shorten the list of differences, and having it active here makes it a little easier to see

[GitHub] spark pull request: Common interfaces between RDD, DStream, and Da...

2015-04-20 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/5565#issuecomment-94472545 @srowen - the examples are pretty simple - it happens as soon as you are passing an RDD to something, rather than passing something to an RDD. For instance
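
A hedged illustration of the friction being described (the helper and its logic are invented for this example, not taken from the PR): a function written against RDD cannot accept a DStream, so the streaming caller has to unwrap every batch itself.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

object TopWordsExample {
  // Written once, against RDD...
  def topWords(words: RDD[String], n: Int): Array[(String, Long)] =
    words.map(w => (w, 1L))
      .reduceByKey(_ + _)
      .top(n)(Ordering.by[(String, Long), Long](_._2))

  // ...but a DStream cannot be passed in directly; each batch must be unwrapped.
  def topWordsPerBatch(words: DStream[String], n: Int): Unit =
    words.foreachRDD { rdd =>
      println(topWords(rdd, n).mkString(", "))
    }
}
```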

[GitHub] spark pull request: Common interfaces between RDD, DStream, and Da...

2015-04-20 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/5565#issuecomment-94471584 @pwendell - something like reduceByKey having the opposite argument order of everything else in the code seems like an obvious mistake in the API

[GitHub] spark pull request: Common interfaces between RDD, DStream, and Da...

2015-04-20 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/5565#issuecomment-94519103 @srowen - If I understand your question, you're asking why we can't take our application, written for an RDD, and just put the whole thing inside a call

[GitHub] spark pull request: Common interfaces between RDD, DStream, and Da...

2015-04-20 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/5565#issuecomment-94507802 @pwendell I get all that... does that mean we are stuck with those deprecated versions forever then? Or is there some timeframe over which deprecated functions

[GitHub] spark pull request: Common interfaces between RDD, DStream, and Da...

2015-04-20 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/5565#issuecomment-94510832 @srowen are you suggesting putting the entire application inside the foreachRDD call?

[GitHub] spark pull request: Common interfaces between RDD, DStream, and Da...

2015-04-20 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/5565#issuecomment-94554558 @srowen I'm sure, for what it's worth, that part works out - it's whether the surrounding application structure can fit in that paradigm that is the issue

[GitHub] spark pull request: Common interfaces between RDD, DStream, and Da...

2015-04-19 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/5565#issuecomment-94330557 Thanks for taking a look at it. I'll take a look and write up here exactly what API changes this makes - while I don't think there are many, there aren't

[GitHub] spark pull request: Common interfaces between RDD, DStream, and Da...

2015-04-19 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/5565#issuecomment-94332668 Ok, I think there are only 3 real API changes, though one of them appears in multiple places. * `RDD` * `flatMap[U](f: T => TraversableOnce[U])` changed

[GitHub] spark pull request: Common interfaces between RDD, DStream, and Da...

2015-04-19 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/5565#issuecomment-94351472 No, we do that at the moment. But doing it that way results in a few rather ugly constructs in the application code that can be rather painful, as soon

[GitHub] spark pull request: Common interfaces between RDD, DStream, and Da...

2015-04-17 Thread nkronenfeld
GitHub user nkronenfeld opened a pull request: https://github.com/apache/spark/pull/5565 Common interfaces between RDD, DStream, and DataFrame This PR is the beginning of an attempt to put a common interface between the main distributed collection types in Spark. I've
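
Purely to illustrate the idea under discussion (this is not the interface the PR actually proposed): a minimal common trait that RDD- and DStream-backed wrappers could both implement, so application logic is written once against the abstraction.

```scala
// Hypothetical sketch; real RDD signatures also carry ClassTag evidence and
// other details that a shared abstraction would have to reconcile.
trait DistributedCollection[T] {
  def map[U](f: T => U): DistributedCollection[U]
  def flatMap[U](f: T => TraversableOnce[U]): DistributedCollection[U]
  def filter(f: T => Boolean): DistributedCollection[T]
  def foreach(f: T => Unit): Unit
}
```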

[GitHub] spark pull request: [Spark-4848] Allow different Worker configurat...

2015-04-13 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/5140#issuecomment-92497626 @andrewor14 - is there anything else that needs to get done here?

[GitHub] spark pull request: [Spark-4848] Stand-alone cluster: Allow differ...

2015-03-23 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/5140#issuecomment-85117276 I'm not sure how mesos and yarn clusters are started/stopped (nor do I have such clusters on which to test), so I'm not sure how this will affect them. I think

[GitHub] spark pull request: [SPARK-4848] Stand-alone cluster: Allow differ...

2015-03-23 Thread nkronenfeld
Github user nkronenfeld closed the pull request at: https://github.com/apache/spark/pull/3699

[GitHub] spark pull request: [SPARK-4848] Stand-alone cluster: Allow differ...

2015-03-23 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/3699#issuecomment-85112967 I'm redoing this in the latest code, remaking the PR from scratch, to alleviate merge issues. I'll post the new PR here when it's made.

[GitHub] spark pull request: [Spark-4848] Stand-alone cluster: Allow differ...

2015-03-23 Thread nkronenfeld
GitHub user nkronenfeld opened a pull request: https://github.com/apache/spark/pull/5140 [Spark-4848] Stand-alone cluster: Allow differences between workers with multiple instances This refixes #3699 with the latest code. This fixes SPARK-4848. I've changed the stand

[GitHub] spark pull request: [SPARK-4848] Stand-alone cluster: Allow differ...

2014-12-15 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/3699#issuecomment-66998011 Jenkins, test this please

[GitHub] spark pull request: [SPARK-4848] Stand-alone cluster: Allow differ...

2014-12-14 Thread nkronenfeld
GitHub user nkronenfeld opened a pull request: https://github.com/apache/spark/pull/3699 [SPARK-4848] Stand-alone cluster: Allow differences between workers with multiple instances I've changed the stand-alone cluster run scripts to allow different workers to have different

[GitHub] spark pull request: [SPARK-4848] Stand-alone cluster: Allow differ...

2014-12-14 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/3699#issuecomment-66957137 I'm not sure how mesos and yarn clusters are started/stopped (nor do I have such clusters on which to test), so I'm not sure how this will affect them. I think

[GitHub] spark pull request: [SPARK-4772] Clear local copies of accumulator...

2014-12-09 Thread nkronenfeld
Github user nkronenfeld commented on a diff in the pull request: https://github.com/apache/spark/pull/3570#discussion_r21572834 --- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala --- @@ -281,7 +282,9 @@ object AccumulatorParam { private object Accumulators

[GitHub] spark pull request: [SPARK-4772] Clear local copies of accumulator...

2014-12-09 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/3570#issuecomment-66380241 comment fixed. I'm trying to test the MiMa related changes to see if they work, and having problems running mima on my machine. I'll probably just push them

[GitHub] spark pull request: [SPARK-4772] Clear local copies of accumulator...

2014-12-09 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/3570#issuecomment-66398560 I thought I'd done so; it looks like it lost my changes. I'll fix that asap.

[GitHub] spark pull request: [SPARK-4772] Clear local copies of accumulator...

2014-12-09 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/3570#issuecomment-66399588 sorry, must have accidentally hit cancel instead of comment the first time. Should be set now.

[GitHub] spark pull request: [SPARK-4772] Clear local copies of accumulator...

2014-12-08 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/3570#issuecomment-66238332 Any word on this?

[GitHub] spark pull request: [SPARK-4772] Clear local copies of accumulator...

2014-12-05 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/3570#issuecomment-65883687 Should I back out the correction to the mima failure?

[GitHub] spark pull request: [SPARK-4772] Clear local copies of accumulator...

2014-12-05 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/3570#issuecomment-65883762 great... I think outside the mima issue, it should be all set, unless I can figure out a way to unit test it. So far, my best methods of testing it involve

[GitHub] spark pull request: [SPARK-4772] Clear local copies of accumulator...

2014-12-05 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/3570#issuecomment-65883848 oh, a note for when you're reviewing - I didn't move the clear call, I just added a second one; I saw no particular harm in leaving the old one there too, just

[GitHub] spark pull request: Clear local copies of accumulators as soon as ...

2014-12-02 Thread nkronenfeld
GitHub user nkronenfeld opened a pull request: https://github.com/apache/spark/pull/3570 Clear local copies of accumulators as soon as we're done with them I suspect this PR is not really quite ready for merging, but this seemed the best way of getting a few comments

[GitHub] spark pull request: Rectify generic parameter names between Spar...

2014-10-02 Thread nkronenfeld
GitHub user nkronenfeld opened a pull request: https://github.com/apache/spark/pull/2637 Rectify generic parameter names between SparkContext and AccumulablePa... AccumulableParam gave its generic parameters as 'R, T', whereas SparkContext labeled them 'T, R'. Trivial

[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-31 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/1535#issuecomment-50836302 OK, @pwendell, I think it's set now. Let me know if there are merge problems, I can resubmit on a clean branch if necessary.

[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-30 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/1535#issuecomment-50621396 Thanks, @pwendell. I can revert it back if you want - is that preferable to the way it is now, with the option to include the memory info or not? I'll start

[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-29 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/1535#issuecomment-50487526 If I'm reading that correctly, that test failure is from an MLlib change that has nothing to do with what I've done? Perhaps I'll just try it again, maybe it's a bad

[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-28 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/1535#issuecomment-50388139 I just parameterized the memory so one can display it or not as desired (with not displaying it the default) - is that sufficient? I forgot to put

[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-23 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/1535#issuecomment-49870919 I'm not sure what to do about this test failure; all I've changed is toDebugString, and this is in a spark streaming test which never calls that, so I'm pretty sure

[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-23 Thread nkronenfeld
Github user nkronenfeld commented on a diff in the pull request: https://github.com/apache/spark/pull/1535#discussion_r15308845 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1269,6 +1269,19 @@ abstract class RDD[T: ClassTag]( /** A description

[GitHub] spark pull request: Add caching information to RDD.toDebugString

2014-07-22 Thread nkronenfeld
GitHub user nkronenfeld opened a pull request: https://github.com/apache/spark/pull/1532 Add caching information to RDD.toDebugString I find it useful to see where in an RDD's DAG data is cached, so I figured others might too. I've added both the caching level

[GitHub] spark pull request: Add caching information to RDD.toDebugString

2014-07-22 Thread nkronenfeld
Github user nkronenfeld closed the pull request at: https://github.com/apache/spark/pull/1532

[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread nkronenfeld
GitHub user nkronenfeld opened a pull request: https://github.com/apache/spark/pull/1535 Add caching information to rdd.toDebugString I find it useful to see where in an RDD's DAG data is cached, so I figured others might too. I've added both the caching level
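
A hedged usage sketch of the feature being discussed: after persisting an RDD partway through a lineage, toDebugString is where the caching information this PR adds would appear (the exact output format is not reproduced here).

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object DebugStringExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "debug-string-example")
    val cached = sc.parallelize(1 to 1000)
      .map(_ * 2)
      .persist(StorageLevel.MEMORY_ONLY)
    val result = cached.filter(_ % 3 == 0)
    result.count()                 // materialize so the cache is actually populated
    println(result.toDebugString)  // lineage printout, one line per RDD in the DAG
    sc.stop()
  }
}
```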

[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/1535#issuecomment-49783708 Done, and I also left a comment on Greg Owen's PR from yesterday asking him for formatting comments

[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/1535#issuecomment-49797427 Sorry, forgot to move one small formatting issue over from the old branch, I'll check that in as soon as I test it.

[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/1535#issuecomment-49825329 Thanks Mark, I had no idea that existed.

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-24 Thread nkronenfeld
Github user nkronenfeld commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-38506453 I think my initial sliding method PR addresses @pwendell 's concerns here - but runs afoul of some other concerns raised by @rxin about code complexity