Re: Is RDD thread safe?

Weichen Xu Sun, 24 Nov 2019 19:53:30 -0800

The RDD id is immutable: it is generated once, when the RDD object is created.
So why would there be a race condition on the RDD id?
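
To make the point concrete, here is a toy model in plain Python. It is not
Spark code: ToyContext, ToyRDD, new_rdd_id and the two helper functions are
hypothetical names made up for illustration. The only thing it assumes about
Spark is what is said above, namely that the id is assigned eagerly at
construction time and never changes, plus the assumption that ids are handed
out by an atomic counter on the context.

```python
import threading


class ToyContext:
    """Hypothetical stand-in for a context that hands out unique ids."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 0

    def new_rdd_id(self):
        # Get-and-increment under a lock, so two threads never get the same id.
        with self._lock:
            rdd_id = self._next_id
            self._next_id += 1
            return rdd_id


class ToyRDD:
    """Hypothetical stand-in for an RDD: the id is fixed at construction."""

    def __init__(self, ctx):
        self._id = ctx.new_rdd_id()  # assigned once, eagerly

    @property
    def id(self):
        return self._id  # read-only: there is no setter


def ids_seen_by_threads(rdd, n_threads=8):
    """Read the id of one shared ToyRDD from several threads."""
    seen = []

    def worker():
        seen.append(rdd.id)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


def ids_created_by_threads(ctx, n_threads=8):
    """Create one ToyRDD per thread and collect the ids they were given."""
    created = []

    def worker():
        created.append(ToyRDD(ctx).id)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return created
```

In this model every thread that reads the shared object sees the same id,
and objects created concurrently still get distinct ids. Any surprise would
have to come from mutable state elsewhere (for example the cache), not from
the id itself.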


On Mon, Nov 25, 2019 at 11:31 AM Chang Chen <[email protected]> wrote:

> I am wondering about the concurrency semantics, so that I can reason about
> correctness. If two queries simultaneously run DAGs that use the same cached
> DF/RDD, but before the data has actually been cached, what will happen?
>
> From looking into the code a little, I suspect they get different BlockIDs
> for the same Dataset, which is unexpected behavior, but there is no race
> condition.
>
> However, the RDD id is not lazy, so there is a race condition.
>
> Thanks
> Chang
>
>
> On Tue, Nov 12, 2019 at 1:22 PM Weichen Xu <[email protected]> wrote:
>
>> Hi Chang,
>>
>> RDDs and DataFrames are immutable and lazily computed. They are thread safe.
>>
>> Thanks!
>>
>> On Tue, Nov 12, 2019 at 12:31 PM Chang Chen <[email protected]> wrote:
>>
>>> Hi all
>>>
>>> I have a case where I need to cache a source RDD and then create different
>>> DataFrames from it in different threads to speed up queries.
>>>
>>> I know that SparkSession is thread safe
>>> (https://issues.apache.org/jira/browse/SPARK-15135), but I am not sure
>>> whether an RDD is thread safe or not.
>>>
>>> Thanks
>>>
>>
