An RDD's id is immutable: it is generated once, when the RDD object is created.
So why would there be a race condition on the RDD id?
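
To make the question concrete, here is a toy model in plain Python. It is not Spark code: ToyContext, ToyRDD, new_rdd_id and block_id are hypothetical names used only for illustration. It assumes the id comes from an atomic counter and is assigned once in the constructor, and that a block name follows the "rdd_<rddId>_<splitIndex>" pattern. Under those assumptions, threads that share one RDD object always see the same id and block name, while every newly created RDD object gets its own id.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyContext:
    """Hypothetical stand-in for the context that hands out RDD ids."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = itertools.count()

    def new_rdd_id(self):
        # Get-and-increment under a lock, so no two callers share an id.
        with self._lock:
            return next(self._next_id)


class ToyRDD:
    """Hypothetical stand-in for an RDD; only the id handling is modeled."""

    def __init__(self, ctx):
        # Assigned eagerly, exactly once, and never changed afterwards.
        self._id = ctx.new_rdd_id()

    @property
    def id(self):
        return self._id

    def block_id(self, split_index):
        # Assumed naming pattern: rdd_<rddId>_<splitIndex>.
        return f"rdd_{self.id}_{split_index}"


def demo(num_threads=8):
    ctx = ToyContext()
    source = ToyRDD(ctx)  # created once, before any thread starts
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Every thread reads the block name of the same shared object.
        shared = list(pool.map(lambda _: source.block_id(0), range(num_threads)))
        # Every thread creates its own new object and reads its id.
        derived = list(pool.map(lambda _: ToyRDD(ctx).id, range(num_threads)))
    return source.id, shared, derived
```

Calling demo() returns the source id, the block names seen by each thread for the shared object, and the ids of the objects created inside the threads. Whether two queries end up with the same block name therefore depends on whether they hold the same RDD object or two separately created ones.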

On Mon, Nov 25, 2019 at 11:31 AM Chang Chen <baibaic...@gmail.com> wrote:

> I am wondering about the concurrency semantics, in order to reason about
> correctness. If two queries simultaneously run DAGs that use the same cached
> DataFrame/RDD, but before the data has actually been cached, what will happen?
>
> After looking into the code a little, I suspect they get different BlockIds for
> the same Dataset, which is unexpected behavior, but there is no race condition.
>
> However, the RDD id is not lazy, so there is a race condition.
>
> Thanks
> Chang
>
>
> On Tue, Nov 12, 2019 at 1:22 PM Weichen Xu <weichen...@databricks.com> wrote:
>
>> Hi Chang,
>>
>> RDDs and DataFrames are immutable and lazily computed. They are thread safe.
>>
>> Thanks!
>>
>> On Tue, Nov 12, 2019 at 12:31 PM Chang Chen <baibaic...@gmail.com> wrote:
>>
>>> Hi all
>>>
>>> I have a case where I need to cache a source RDD and then create different
>>> DataFrames from it in different threads to accelerate queries.
>>>
>>> I know that SparkSession is thread safe (
>>> https://issues.apache.org/jira/browse/SPARK-15135), but I am not sure
>>> whether RDD is thread safe or not.
>>>
>>> Thanks
>>>
>>
