Okay, having looked more closely at some of the code, I'm not sure that what I'm asking for in terms of adaptive execution makes much sense, since it can only happen between stages, i.e. optimising future /stages/ based on the results of previous stages. An "on-demand" adaptive coalesce therefore doesn't make much sense, as it wouldn't necessarily occur at a stage boundary.
However, I think my original question still stands: how to /dynamically/ deal with poorly partitioned data without incurring a shuffle or extra computation. The only thing that's changed is that I no longer have any good ideas on how to do it :/
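To make the trade-off concrete, here is a minimal pure-Python sketch of what a shuffle-free coalesce does: it only merges whole (here: contiguous) input partitions into fewer output partitions, so no individual record is redistributed. This is an illustration of the general idea behind RDD `coalesce(n, shuffle = false)`, not Spark's actual implementation (which also weighs data locality); the function name and data are made up.

```python
def coalesce(partitions, n):
    """Merge contiguous partitions into n groups without moving
    individual records between partitions (i.e. no shuffle)."""
    if n >= len(partitions):
        return [list(p) for p in partitions]
    out = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Whole partitions are assigned to an output group;
        # records never leave the partition they arrived in.
        out[i * n // len(partitions)].extend(part)
    return out

# Skewed input: one fat partition, one empty, two small ones.
parts = [[1, 2, 3], [4], [], [5, 6]]
print(coalesce(parts, 2))  # [[1, 2, 3, 4], [5, 6]]
```

Note how this also illustrates the limitation in question: coalesce can combine small or empty partitions cheaply, but it can never /split/ an oversized one, so skew within a single partition survives unless you pay for a full shuffle (repartition).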