Fwd: the life cycle shuffle Dependency

2023-12-27 Thread yang chen
Hi, I'm learning Spark and wondering when shuffle data gets deleted. I found the ContextCleaner class, which cleans up the shuffle data when the shuffle dependency is GC-ed. Based on the source code, the shuffle dependency is GC-ed only when the active job finishes, but I'm not sure. Could you explain the life cycle of

Re: Combining Many RDDs

2015-03-27 Thread Yang Chen
= sc.parallelize(data)   // Create and partition the 0.5M items in a single RDD.
  .flatMap(compute(_))   // You still have only one RDD, with each item already joined with the external data.
Hope this helps.
Kelvin
On Thu, Mar 26, 2015 at 2:37 PM, Yang Chen y...@yang-cs.com wrote:
Hi Mark
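
For readers following the thread, the single-RDD approach suggested in this reply might look roughly like the sketch below. It is only an illustration under assumptions: the body of `compute`, the placeholder `data` range, the object name, and the local master setting are all hypothetical and do not come from the original messages.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SingleRddSketch {
  // Hypothetical stand-in for the thread's compute(): turns one item into
  // zero or more records already joined with external data.
  def compute(item: Int): Seq[(Int, String)] =
    Seq((item, s"external-$item"))

  def main(args: Array[String]): Unit = {
    // Local master is an assumption made so the sketch runs on one machine.
    val conf = new SparkConf().setAppName("single-rdd-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val data = 1 to 1000 // placeholder for the 0.5M items mentioned in the thread
      // One parallelize call yields one RDD; flatMap expands each item in place,
      // so no per-item RDDs are created and nothing has to be unioned afterwards.
      val joined = sc.parallelize(data).flatMap(compute(_))
      println(joined.count())
    } finally {
      sc.stop()
    }
  }
}
```

The point of the design is that the expansion happens inside a single RDD's partitions, which sidesteps the need to combine many small RDDs at all.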

Re: Combining Many RDDs

2015-03-26 Thread Yang Chen
Hi Mark,
That's true, but in neither way can I combine the RDDs, so I have to avoid unions.
Thanks,
Yang
On Thu, Mar 26, 2015 at 5:31 PM, Mark Hamstra m...@clearstorydata.com wrote:
RDD#union is not the same thing as SparkContext#union
On Thu, Mar 26, 2015 at 2:27 PM, Yang Chen y...@yang
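
To make the distinction Mark points out concrete, here is a minimal sketch contrasting the two calls. The sample RDDs, the object name, and the local master setting are assumptions for illustration only; `RDD#union` and `SparkContext#union` are the actual Spark methods named in the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object UnionSketch {
  def main(args: Array[String]): Unit = {
    // Local master is an assumption made so the sketch runs on one machine.
    val conf = new SparkConf().setAppName("union-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical inputs: five small RDDs of two elements each.
      val rdds: Seq[RDD[Int]] = (1 to 5).map(i => sc.parallelize(Seq(i, i * 10)))

      // RDD#union combines two RDDs at a time, so reducing over a sequence
      // builds a chain of nested unions, one level per input.
      val chained: RDD[Int] = rdds.reduce(_ union _)

      // SparkContext#union takes the whole sequence at once and builds
      // a single union over all inputs.
      val flat: RDD[Int] = sc.union(rdds)

      // Both contain the same elements; only the lineage shape differs.
      println(chained.count())
      println(flat.count())
    } finally {
      sc.stop()
    }
  }
}
```

With only a handful of inputs the difference is negligible, but when many RDDs are combined, the pairwise form produces a lineage whose depth grows with the number of inputs.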