Re: Is RDD thread safe?

2019-11-24 Thread Chang Chen
I need to cache the DataFrame to speed up queries. In that case, two
queries may run the DAG simultaneously before the data has actually been
cached.
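
For what it's worth, one common way to avoid this race is to fill the cache
with a single action before any worker thread touches the DataFrame, since
cache() by itself is lazy. Below is a minimal PySpark sketch of that idea,
not a definitive recipe: the helper names run_queries_on_cached and demo, the
thread-pool size, and the sample filters are all made up for illustration,
and count() is just one action that scans every partition.

```python
from concurrent.futures import ThreadPoolExecutor


def run_queries_on_cached(df, queries, max_workers=4):
    """Cache df, materialize it once, then run each query in its own thread.

    `queries` is a list of callables that take the cached DataFrame and
    return a result (hypothetical helper, not part of Spark).
    """
    cached = df.cache()  # lazy: only marks the DataFrame for caching
    cached.count()       # action: runs the DAG once and fills the cache
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(query, cached) for query in queries]
        # Collect results in the same order the queries were given.
        return [future.result() for future in futures]


def demo():
    """Example wiring; assumes PySpark is installed and runs in local mode."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[4]")
        .appName("cache-then-query")
        .getOrCreate()
    )
    try:
        source = spark.range(1_000_000)  # single column named "id"
        queries = [
            lambda d: d.filter("id % 2 = 0").count(),
            lambda d: d.filter("id >= 500000").count(),
        ]
        return run_queries_on_cached(source, queries)
    finally:
        spark.stop()
```

With PySpark installed, calling demo() runs both filters against the already
cached data. Materializing first costs one full pass over the source, which
only pays off if the later queries reuse most of it.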

On Tue, Nov 19, 2019 at 9:46 PM Sonal Goyal wrote:

> The RDD or the DataFrame is distributed and partitioned by Spark so as to
> leverage all your workers (CPUs) effectively. So the DataFrame operations
> are already running in parallel, each on its own partition of the data.
> Why do you want to use threading here?
>
> Thanks,
> Sonal
> Nube Technologies 
>
> On Tue, Nov 12, 2019 at 7:18 AM Chang Chen  wrote:
>
>>
>> Hi all
>>
>> I have a case where I need to cache a source RDD and then create
>> different DataFrames from it in different threads to speed up queries.
>>
>> I know that SparkSession is thread safe (
>> https://issues.apache.org/jira/browse/SPARK-15135), but I am not sure
>> whether an RDD is thread safe or not.
>>
>> Thanks
>> Chang
>>
>


Re: Is RDD thread safe?

2019-11-19 Thread Sonal Goyal
The RDD or the DataFrame is distributed and partitioned by Spark so as to
leverage all your workers (CPUs) effectively. So the DataFrame operations
are already running in parallel, each on its own partition of the data.
Why do you want to use threading here?

Thanks,
Sonal
Nube Technologies 

On Tue, Nov 12, 2019 at 7:18 AM Chang Chen  wrote:

>
> Hi all
>
> I have a case where I need to cache a source RDD and then create
> different DataFrames from it in different threads to speed up queries.
>
> I know that SparkSession is thread safe (
> https://issues.apache.org/jira/browse/SPARK-15135), but I am not sure
> whether an RDD is thread safe or not.
>
> Thanks
> Chang
>