You don’t need to know the RDD dependencies to maximize concurrency. Internally 
the scheduler will construct the DAG and trigger the execution; if there are no 
shuffle dependencies between the RDDs, it pipelines their operations into a single stage.
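
(If you do want to check programmatically, the public RDD.dependencies API exposes 
the direct parents of an RDD, so a recursive walk over the lineage can answer the 
question. The helper below is only a sketch, not something in Spark itself; the 
name dependsOn is made up.)

  import org.apache.spark.rdd.RDD

  // Walk the lineage graph of `candidate` and report whether `target`
  // appears among its ancestors. RDD.dependencies lists the direct
  // parent dependencies; each Dependency exposes its parent via .rdd.
  def dependsOn(candidate: RDD[_], target: RDD[_]): Boolean =
    candidate.id == target.id ||
      candidate.dependencies.exists(dep => dependsOn(dep.rdd, target))

  // With the rdd1/rdd2 from the question below, dependsOn(rdd2, rdd1)
  // would return true, because rdd2 was derived from rdd1.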

Thanks.

Zhan Zhang 
On Feb 26, 2015, at 1:28 PM, Corey Nolet <cjno...@gmail.com> wrote:

> Let's say I'm given 2 RDDs and told to store them in a sequence file and they 
> have the following dependency:
> 
> val rdd1 = sparkContext.sequenceFile().....cache()
> val rdd2 = rdd1.map(....)....
> 
> 
> How would I tell, programmatically and without being the one who built rdd1 and 
> rdd2, whether or not rdd2 depends on rdd1?
> 
> I'm working on a concurrency model for my application and I won't necessarily 
> know how the two RDDs are constructed. What I will know is whether or not 
> rdd1 is cached, but I want to maximize concurrency and run rdd1 and rdd2 
> together if rdd2 does not depend on rdd1.
> 

