The RDD id is immutable: it is generated once, when the RDD object is created, and never changes afterwards. So why would there be a race condition on the RDD id?
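To make the point concrete, here is a minimal Python sketch of the pattern I have in mind. It is not Spark code: `IdSource` and `FakeRdd` are hypothetical stand-ins, and the lock-guarded counter reflects my assumption that the id is taken from an atomic counter on the context at construction time. It only shows that an id assigned eagerly in the constructor and never reassigned is unique per object and looks the same from every thread.

```python
import itertools
import threading


class IdSource:
    """Hypothetical stand-in for the context that hands out ids."""

    def __init__(self):
        self._counter = itertools.count()
        self._lock = threading.Lock()

    def new_id(self):
        # The lock makes "read the counter and advance it" one atomic step.
        with self._lock:
            return next(self._counter)


class FakeRdd:
    """Hypothetical stand-in for an RDD: the id is fixed in the constructor."""

    def __init__(self, source):
        self._id = source.new_id()  # assigned eagerly, never reassigned

    @property
    def id(self):
        return self._id


def create_many(source, n, out):
    # Each worker creates n objects and records their ids.
    for _ in range(n):
        out.append(FakeRdd(source).id)


source = IdSource()
ids = []
workers = [
    threading.Thread(target=create_many, args=(source, 1000, ids))
    for _ in range(8)
]
for t in workers:
    t.start()
for t in workers:
    t.join()

# One shared object read from several threads: every reader sees the same id.
shared = FakeRdd(source)
seen = []
readers = [
    threading.Thread(target=lambda: seen.append(shared.id)) for _ in range(8)
]
for t in readers:
    t.start()
for t in readers:
    t.join()
```

Concurrent creation yields distinct ids, and concurrent reads of one object's id all agree, so the id by itself gives nothing to race on. Whether two queries can both start computing the same not-yet-cached partitions is a separate question that this sketch does not address.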
On Mon, Nov 25, 2019 at 11:31 AM Chang Chen <baibaic...@gmail.com> wrote:

> I am wondering about the concurrency semantics, so that I can reason about
> correctness. If two queries simultaneously run DAGs that use the same
> cached DF/RDD, but before the data has actually been cached, what will
> happen?
>
> From looking into the code a little, I suspect they get different BlockIDs
> for the same Dataset, which is unexpected behavior, but there is no race
> condition.
>
> However, the RDD id is not lazy, so there is a race condition.
>
> Thanks
> Chang
>
> Weichen Xu <weichen...@databricks.com> wrote on Tue, Nov 12, 2019 at 1:22 PM:
>
>> Hi Chang,
>>
>> RDDs and DataFrames are immutable and lazily computed. They are thread safe.
>>
>> Thanks!
>>
>> On Tue, Nov 12, 2019 at 12:31 PM Chang Chen <baibaic...@gmail.com> wrote:
>>
>>> Hi all
>>>
>>> I have a case where I need to cache a source RDD and then create different
>>> DataFrames from it in different threads to accelerate queries.
>>>
>>> I know that SparkSession is thread safe
>>> (https://issues.apache.org/jira/browse/SPARK-15135), but I am not sure
>>> whether RDD is thread safe or not.
>>>
>>> Thanks