[GitHub] spark issue #21096: [SPARK-24011][CORE][WIP] cache rdd's immediate parent Sh...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21096 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/21096 Thanks for your opinions, @squito @markhamstra. Maybe I should leave it for now.
Github user squito commented on the issue: https://github.com/apache/spark/pull/21096 It's not a bad idea, but as @markhamstra mentions, we can't have an `rddToImmediateShuffleDependency` data structure that keeps growing. You could keep it local to one job submission, though that would slightly diminish its utility. Did you observe this as a bottleneck from some profiling? Otherwise I'm inclined to say it's not worth the complexity right now. I'd normally expect to only have to walk through a very small number of RDDs, so it'll be quick.
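For illustration, a minimal sketch of the per-submission variant suggested above. It uses a toy lineage model in Python rather than Spark's own Scala classes, so every name here (`Rdd`, `Dep`, `immediate_shuffle_deps`, `submit_job`) is hypothetical and not part of Spark's API. The memo is created inside one job submission and dropped afterwards, so it cannot keep growing, but it also cannot be reused by later jobs.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(eq=False)
class Rdd:
    """Toy stand-in for a node in an RDD lineage graph (hypothetical)."""
    name: str
    deps: List["Dep"] = field(default_factory=list)


@dataclass(eq=False)
class Dep:
    """Edge to a parent RDD; shuffle=True marks a stage boundary."""
    parent: Rdd
    shuffle: bool = False


def immediate_shuffle_deps(rdd: Rdd, memo: Dict[Rdd, FrozenSet[Dep]]) -> FrozenSet[Dep]:
    """Collect shuffle deps reachable from `rdd` through narrow deps only."""
    if rdd in memo:
        return memo[rdd]
    found = set()
    for dep in rdd.deps:
        if dep.shuffle:
            # Stop at the boundary: ancestors behind a shuffle are not walked.
            found.add(dep)
        else:
            found |= immediate_shuffle_deps(dep.parent, memo)
    memo[rdd] = frozenset(found)
    return memo[rdd]


def submit_job(final_rdd: Rdd) -> FrozenSet[Dep]:
    """Assumed entry point: the memo lives only for this one submission."""
    memo: Dict[Rdd, FrozenSet[Dep]] = {}
    return immediate_shuffle_deps(final_rdd, memo)


if __name__ == "__main__":
    a, d = Rdd("a"), Rdd("d")
    b = Rdd("b", [Dep(a, shuffle=True)])   # b reads a through a shuffle
    c = Rdd("c", [Dep(b), Dep(d, shuffle=True)])  # narrow on b, shuffle on d
    print(sorted(dep.parent.name for dep in submit_job(c)))
```

Because the walk stops at the first shuffle boundary on each path, the memo only ever holds the RDDs of the stage being examined, which matches the expectation that only a small number of RDDs need to be visited.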
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/21096 ping @jiangxb1987 @squito Would you please have a look at this PR? What are your opinions on the cache strategy?