[GitHub] spark issue #21096: [SPARK-24011][CORE] cache rdd's immediate parent Shuffle...

2018-04-18 Thread Ngone51
Github user Ngone51 commented on the issue:

https://github.com/apache/spark/pull/21096
  
Thank you for your comments @markhamstra .
Yeah, I'm considering adding a UT to support this change. And thank for 
reminding me of DAGScheduler's basic principle.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21096: [SPARK-24011][CORE] cache rdd's immediate parent Shuffle...

2018-04-18 Thread markhamstra
Github user markhamstra commented on the issue:

https://github.com/apache/spark/pull/21096
  
As is, this PR isn't acceptable on multiple levels. Even if I were 
convinced (which I am not presently) that the sequence of 
`getShuffleDependencies` calls covered in this PR is the only one possible and 
therefore can be cached/memoized as this PR does, a complete lack of test 
coverage for the new code is not something we want in something as critical as 
the DAGScheduler. And I can tell you right now that there is at least one 
missing test that this PR will fail. You are adding a new data structure to the 
DAGScheduler, so at a bare minimum you also need to add that HashMap to 
`assertDataStructuresEmpty` in the DAGSchedulerSuite. Never deleting unneeded 
elements from a DAGScheduler data structure is an unacceptable resource leak.   


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21096: [SPARK-24011][CORE] cache rdd's immediate parent Shuffle...

2018-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21096
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org