[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-08-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1535


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-08-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-52275155
  
Hey, this looks good. Merging it now into master. Sorry about the delay.





[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-31 Thread nkronenfeld
Github user nkronenfeld commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50836302
  
OK, @pwendell, I think it's set now. Let me know if there are merge
problems; I can resubmit on a clean branch if necessary.




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50832292
  
QA results for PR 1535:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17612/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50828214
  
QA tests have started for PR 1535. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17612/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-30 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50660639
  
Yeah, to keep it simple, let's just always have it show memory. I'd rather
not add a new public API for this `showMemory` thing at the moment.




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-30 Thread nkronenfeld
Github user nkronenfeld commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50621396
  
Thanks, @pwendell. I can revert it if you want; is that preferable to
the current version, with the option to include the memory info or not?

I'll start by taking out the DeveloperAPI marking and adjusting the docs; I'll
hold off on removing the optional memory parameter until I hear from you again.




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-29 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50579659
  
Hey @nkronenfeld, I traced through the exact function call more closely
and I actually think it's fine. The issue I pointed out in the JIRA is
orthogonal, so I'm fine with reverting this to always showing the status.
However, we should not mark this as a developer API. This is a stable API we
are happy to support forever.

Still, this will cause a significant amount of object allocation because of the
way other internal function calls happen (it is basically O(all blocks) for an
application). It might be nice to add a note to the docs that the operation
might be expensive and should not be called inside a critical code path.
Though we could likely optimize those things down the road.
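The O(all blocks) concern can be illustrated with a small, self-contained Scala sketch. `StorageSummary`, `lookupByScan`, and `buildIndex` are hypothetical stand-ins, not Spark's `RDDInfo` or `getRDDStorageInfo`. The sketch only shows why a `filter(_.id == id)` over the full collection, as in the diff, costs time proportional to everything tracked rather than to the one RDD being described, and how a one-time index would amortize repeated lookups under that assumption.

```scala
// Hypothetical stand-in for per-RDD storage information; not Spark's RDDInfo.
final case class StorageSummary(id: Int, memSize: Long, diskSize: Long)

object StorageLookup {
  // Mirrors the filter in the diff: every call walks all entries, so the cost
  // grows with the total number of tracked entries, not with the one RDD asked for.
  def lookupByScan(all: Seq[StorageSummary], id: Int): Option[StorageSummary] =
    all.filter(_.id == id).headOption

  // Build an index once; later lookups are hash-map lookups instead of scans.
  def buildIndex(all: Seq[StorageSummary]): Map[Int, StorageSummary] =
    all.map(s => s.id -> s).toMap

  def main(args: Array[String]): Unit = {
    val all = (0 until 1000).map(i => StorageSummary(i, i * 1024L, 0L))
    val index = buildIndex(all)
    println(lookupByScan(all, 42))
    println(index.get(42))
  }
}
```

Both lookups return the same entry; the difference is only in how much work each call does when it is repeated, for example inside a loop.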




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50507822
  
QA results for PR 1535:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17363/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50501244
  
QA tests have started for PR 1535. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17363/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-29 Thread nkronenfeld
Github user nkronenfeld commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50487526
  
If I'm reading that correctly, that test failure comes from an MLlib change
that has nothing to do with what I've done? Perhaps I'll just try it again;
maybe it's a bad sync with master:

Jenkins, please test this




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50419441
  
QA results for PR 1535:
- This patch FAILED unit tests.

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17310/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50410853
  
QA tests have started for PR 1535. This patch DID NOT merge cleanly! 
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17310/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50407668
  
QA results for PR 1535:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17304/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50401197
  
QA tests have started for PR 1535. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17304/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50388630
  
QA results for PR 1535:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17302/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50388544
  
QA tests have started for PR 1535. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17302/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-28 Thread nkronenfeld
Github user nkronenfeld commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-50388139
  
I just parameterized the memory display so one can show it or not as desired
(not showing it is the default). Is that sufficient?
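A minimal Scala sketch of the kind of defaulted flag described here. `DebugFlagSketch.describeRdd` and its parameters are hypothetical and far simpler than the PR's code; the point is only that a default of `false` keeps the cheaper output for existing callers, who must opt in to see memory. (Elsewhere in the thread, pwendell prefers always showing memory over adding a public `showMemory` parameter.)

```scala
object DebugFlagSketch {
  // Hypothetical, simplified per-RDD description. With showMemory defaulting
  // to false, callers that do not pass the flag get only the header line.
  def describeRdd(
      name: String,
      storageLevel: String,
      memSizeBytes: Long,
      showMemory: Boolean = false): Seq[String] = {
    val header = s"$name [$storageLevel]"
    if (showMemory) Seq(header, s"MemorySize: $memSizeBytes bytes")
    else Seq(header)
  }

  def main(args: Array[String]): Unit = {
    describeRdd("rdd-1", "MEMORY_ONLY", 2048L).foreach(println)
    describeRdd("rdd-1", "MEMORY_ONLY", 2048L, showMemory = true).foreach(println)
  }
}
```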

I forgot to put the note about the JIRA into the code; I'll definitely add that
too. Alternatively, I can back out the optional parameter and just leave in the
code comment about the JIRA.

Let me know which you want, please.

Thanks,
 -Nathan





[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-23 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1535#discussion_r15324282
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1269,6 +1269,19 @@ abstract class RDD[T: ClassTag](
 
   /** A description of this RDD and its recursive dependencies for debugging. */
   def toDebugString: String = {
+// Get a debug description of an rdd without its children
+def debugSelf (rdd: RDD[_]): Seq[String] = {
+  import Utils.bytesToString
+
+  val persistence = storageLevel.description
+  val storageInfo = rdd.context.getRDDStorageInfo.filter(_.id == rdd.id).map(info =>
--- End diff --

BTW, we could create a JIRA to add this back once SPARK-2316 is fixed.




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-23 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1535#discussion_r15324267
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1269,6 +1269,19 @@ abstract class RDD[T: ClassTag](
 
   /** A description of this RDD and its recursive dependencies for debugging. */
   def toDebugString: String = {
+// Get a debug description of an rdd without its children
+def debugSelf (rdd: RDD[_]): Seq[String] = {
+  import Utils.bytesToString
+
+  val persistence = storageLevel.description
+  val storageInfo = rdd.context.getRDDStorageInfo.filter(_.id == rdd.id).map(info =>
--- End diff --

Ah sorry, yes, I mean this is very costly. I'd rather not do this in a debug
function, because people will do things like print debug statements inside
loops, and in that case the debugging will significantly alter the performance
of their application. There is a separate JIRA to make this function faster
(it's also used in the UI), but until that's fixed I'd rather not call it
here:

https://issues.apache.org/jira/browse/SPARK-2316
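The worry about debug output inside loops can be sketched without Spark: if the expensive string is passed to the logging call by name, it is only built when debugging is actually enabled. `LazyDebugSketch`, `debugLog`, `expensiveDescription`, and the `builds` counter are hypothetical stand-ins for something like `toDebugString` plus a logger; they are not Spark APIs, and this does not make the underlying call cheaper when it does run.

```scala
object LazyDebugSketch {
  // Counts how many times the expensive description was actually built.
  var builds = 0

  // Hypothetical stand-in for an expensive call such as a full debug dump.
  def expensiveDescription(): String = {
    builds += 1
    "(expensive description)"
  }

  // `msg` is a by-name parameter: it is evaluated only if `enabled` is true,
  // so a disabled debug statement inside a loop never builds the string.
  def debugLog(enabled: Boolean)(msg: => String): Option[String] =
    if (enabled) Some(msg) else None

  def main(args: Array[String]): Unit = {
    for (_ <- 1 to 1000) debugLog(enabled = false)(expensiveDescription())
    println(s"builds with debugging off: $builds") // builds is still 0 here
    for (_ <- 1 to 3) debugLog(enabled = true)(expensiveDescription())
    println(s"builds with debugging on: $builds")  // builds is now 3
  }
}
```

The guard only helps callers who use it; a plain `println(rdd.toDebugString)` in a loop would still pay the full cost on every iteration, which is why a documentation note was suggested.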




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-23 Thread GregOwen
Github user GregOwen commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49926584
  
Looks good to me.




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-23 Thread nkronenfeld
Github user nkronenfeld commented on a diff in the pull request:

https://github.com/apache/spark/pull/1535#discussion_r15308845
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1269,6 +1269,19 @@ abstract class RDD[T: ClassTag](
 
   /** A description of this RDD and its recursive dependencies for debugging. */
   def toDebugString: String = {
+// Get a debug description of an rdd without its children
+def debugSelf (rdd: RDD[_]): Seq[String] = {
+  import Utils.bytesToString
+
+  val persistence = storageLevel.description
+  val storageInfo = rdd.context.getRDDStorageInfo.filter(_.id == rdd.id).map(info =>
--- End diff --

I'm not sure what you mean - do you mean "an extremely costly operation"?

Assuming that to be the case, two comments:

 * I thought about attaching flags to the function so one could specify the
type of debug information desired; I think that makes the function too complex,
but I'm hardly firm on that.
 * This whole function is specifically meant to help a developer with
debugging. I don't _think_ having it be costly is all that bad.







[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-23 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1535#discussion_r15307173
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1269,6 +1269,19 @@ abstract class RDD[T: ClassTag](
 
   /** A description of this RDD and its recursive dependencies for debugging. */
   def toDebugString: String = {
+// Get a debug description of an rdd without its children
+def debugSelf (rdd: RDD[_]): Seq[String] = {
+  import Utils.bytesToString
+
+  val persistence = storageLevel.description
+  val storageInfo = rdd.context.getRDDStorageInfo.filter(_.id == rdd.id).map(info =>
--- End diff --

Hey on this one, this is actually an extremely operation... I wonder if 
maybe for now it's better to not put this in there and only put the storage 
level.




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49890189
  
QA results for PR 1535:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17035/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49875467
  
QA tests have started for PR 1535. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17035/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-23 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49874874
  
Jenkins, test this please




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-23 Thread nkronenfeld
Github user nkronenfeld commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49870919
  
I'm not sure what to do about this test failure. All I've changed is
toDebugString, and the failure is in a Spark Streaming test that never calls it,
so I'm pretty sure it has nothing to do with my change.




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49831636
  
QA results for PR 1535:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17008/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49827241
  
QA tests have started for PR 1535. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17008/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49825506
  
QA results for PR 1535:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17005/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49825462
  
QA tests have started for PR 1535. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17005/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread nkronenfeld
Github user nkronenfeld commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49825329
  
Thanks, Mark, I had no idea that existed.




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/1535#discussion_r15259034
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1294,7 +1307,11 @@ abstract class RDD[T: ClassTag](
   val partitionStr = "(" + rdd.partitions.size + ")"
   val leftOffset = (partitionStr.length - 1) / 2
   val nextPrefix = (" " * leftOffset) + "|" + (" " * (partitionStr.length - leftOffset))
-  Seq(partitionStr + " " + rdd) ++ debugChildren(rdd, nextPrefix)
+
+  debugSelf(rdd).zipWithIndex.map{
+case (desc: String, 0) => partitionStr+" "+desc
+case (desc: String, _) => nextPrefix+" "+desc
--- End diff --

And elsewhere in this PR, avoid string concatenation with `+` when string 
interpolation would be equally clear or clearer. 




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/1535#discussion_r15258957
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1294,7 +1307,11 @@ abstract class RDD[T: ClassTag](
   val partitionStr = "(" + rdd.partitions.size + ")"
   val leftOffset = (partitionStr.length - 1) / 2
   val nextPrefix = (" " * leftOffset) + "|" + (" " * (partitionStr.length - leftOffset))
-  Seq(partitionStr + " " + rdd) ++ debugChildren(rdd, nextPrefix)
+
+  debugSelf(rdd).zipWithIndex.map{
+case (desc: String, 0) => partitionStr+" "+desc
+case (desc: String, _) => nextPrefix+" "+desc
--- End diff --

s"$partitionStr $desc"
s"$nextPrefix $desc"
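Putting the interpolation suggestion together with the hunk it comments on, here is a self-contained sketch of just the prefixing step. It reproduces the `partitionStr` / `leftOffset` / `nextPrefix` arithmetic from the diff; `PrefixSketch.prefixLines` is a hypothetical helper, `numPartitions` stands in for `rdd.partitions.size`, and the description lines are passed in rather than produced by `debugSelf`.

```scala
object PrefixSketch {
  // First description line gets "(N)"; later lines get a "|" roughly centred
  // under the partition count, following the arithmetic in the diff.
  def prefixLines(numPartitions: Int, descs: Seq[String]): Seq[String] = {
    val partitionStr = s"($numPartitions)"
    val leftOffset = (partitionStr.length - 1) / 2
    val left = " " * leftOffset
    val right = " " * (partitionStr.length - leftOffset)
    val nextPrefix = s"$left|$right"
    descs.zipWithIndex.map {
      case (desc, 0) => s"$partitionStr $desc"
      case (desc, _) => s"$nextPrefix $desc"
    }
  }

  def main(args: Array[String]): Unit =
    prefixLines(8, Seq("first description line", "second description line")).foreach(println)
}
```

For 8 partitions, `partitionStr` is `(8)`, `leftOffset` is 1, and continuation lines start with ` |   `, so the bar sits under the digit.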




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49799385
  
@gowen mind taking a look?




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49798677
  
QA results for PR 1535:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information, see the test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16987/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49798572
  
QA tests have started for PR 1535. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16987/consoleFull




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread nkronenfeld
Github user nkronenfeld commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49797427
  
Sorry, I forgot to move one small formatting fix over from the old branch;
I'll check it in as soon as I've tested it.




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread nkronenfeld
Github user nkronenfeld commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49783708
  
Done. I also left a comment on Greg Owen's PR from yesterday asking him
for formatting comments.




[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1535#issuecomment-49782942
  
Hey, do you mind putting an example of what the output looks like in the PR 
description?


[GitHub] spark pull request: Add caching information to rdd.toDebugString

2014-07-22 Thread nkronenfeld
GitHub user nkronenfeld opened a pull request:

https://github.com/apache/spark/pull/1535

Add caching information to rdd.toDebugString

I find it useful to see where in an RDD's DAG the data is cached, so I figured 
others might too.

I've added both the caching level and the actual memory state of the RDD.

Some of this is redundant with the web UI (notably the actual memory 
state), but (a) that is temporary, and (b) putting it in the DAG tree shows 
some context that can help a lot.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nkronenfeld/spark-1 feature/debug-caching2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1535.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1535


commit 8fbecb6eb47505e7e56949c00107b917c6c5e945
Author: Nathan Kronenfeld 
Date:   2014-07-22T18:44:58Z

Add caching information to rdd.toDebugString


[GitHub] spark pull request: Add caching information to RDD.toDebugString

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1532#issuecomment-49749243
  
QA results for PR 1532:
- This patch FAILED unit tests.
For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16966/consoleFull


[GitHub] spark pull request: Add caching information to RDD.toDebugString

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1532#issuecomment-49749136
  
QA tests have started for PR 1532. This patch DID NOT merge cleanly! 
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16966/consoleFull


[GitHub] spark pull request: Add caching information to RDD.toDebugString

2014-07-22 Thread nkronenfeld
Github user nkronenfeld closed the pull request at:

https://github.com/apache/spark/pull/1532


[GitHub] spark pull request: Add caching information to RDD.toDebugString

2014-07-22 Thread nkronenfeld
GitHub user nkronenfeld opened a pull request:

https://github.com/apache/spark/pull/1532

Add caching information to RDD.toDebugString

I find it useful to see where in an RDD's DAG the data is cached, so I figured 
others might too.

I've added both the caching level and the actual memory state of the RDD.

Some of this is redundant with the web UI (notably the actual memory 
state), but (a) that is temporary, and (b) putting it in the DAG tree shows 
some context that can help a lot.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nkronenfeld/spark-1 feature/debug-caching

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1532.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1532


commit 06c76ab961afc42e8305d6a3f186361c1e20e04d
Author: Nathan Kronenfeld 
Date:   2014-07-22T14:39:41Z

Add caching information to RDD.toDebugString
