Re: Caching tables in spark

2019-08-28 Thread Tzahi File
I mean two separate Spark jobs.
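
For two separate jobs, note that cache() and persist() keep data inside a single Spark application, so a second application cannot see it. One possible workaround, sketched here under assumptions, is a small preparatory job that scans the raw table once and writes only the needed columns to shared storage as Parquet; both jobs then read that smaller copy. The table name, column names, and output path are placeholders that do not come from this thread, and the helper function names are hypothetical.

```python
"""Sketch: share one reduced copy of a large table between two Spark jobs."""


def write_shared_subset(spark, source_table, columns, out_path):
    """Preparatory job: scan the raw table once, keep only the needed columns."""
    (spark.table(source_table)
          .select(*columns)
          .write.mode("overwrite")
          .parquet(out_path))


def read_shared_subset(spark, path):
    """Downstream jobs: read the smaller Parquet copy, not the raw table."""
    return spark.read.parquet(path)


def main():
    # Imported lazily so the helpers above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("prepare-shared-subset").getOrCreate()
    # Placeholder table, columns, and path (assumptions, not from the thread).
    write_shared_subset(
        spark,
        "raw_events",
        ["user_id", "event_time"],
        "/shared/raw_events_subset",
    )
    spark.stop()
```

Each downstream job would then call read_shared_subset(spark, "/shared/raw_events_subset") in its own session. If both computations could instead run inside one application, calling cache() on the DataFrame before the two actions would avoid the second scan, memory and disk permitting.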



On Wed, Aug 28, 2019 at 2:25 PM Subash Prabakar wrote:

> By "process", do you mean two separate Spark jobs, or two stages
> within the same Spark code?
>
> Thanks
> Subash
>
> On Wed, 28 Aug 2019 at 19:06,  wrote:
>
>> Take a look at this article:
>>
>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-caching.html
>>
>> *From:* Tzahi File 
>> *Sent:* Wednesday, August 28, 2019 5:18 AM
>> *To:* user 
>> *Subject:* Caching tables in spark
>>
>>
>>
>> Hi,
>>
>> I have a question and would appreciate your input.
>>
>> I have 2 different processes that read from the same raw data table
>> (around 1.5 TB).
>>
>> Is there a way to read this data once, cache it, and use it in both
>> processes?
>>
>> Thanks
>>
>> --
>>
>> *Tzahi File*
>> Data Engineer
>>
>> *email* tzahi.f...@ironsrc.com
>>
>> *mobile* +972-546864835
>>
>> *fax* +972-77-5448273
>>
>> ironSource HQ - 121 Derech Menachem Begin st. Tel Aviv
>> 
>>
>> *ironsrc.com* 
>>
>> This email (including any attachments) is for the sole use of the
>> intended recipient and may contain confidential information which may be
>> protected by legal privilege. If you are not the intended recipient, or the
>> employee or agent responsible for delivering it to the intended recipient,
>> you are hereby notified that any use, dissemination, distribution or
>> copying of this communication and/or its content is strictly prohibited. If
>> you are not the intended recipient, please immediately notify us by reply
>> email or by telephone, delete this email and destroy any copies. Thank you.
>>
>>
>>
>



Re: Caching tables in spark

2019-08-28 Thread Subash Prabakar
By "process", do you mean two separate Spark jobs, or two stages
within the same Spark code?

Thanks
Subash

On Wed, 28 Aug 2019 at 19:06,  wrote:

> Take a look at this article:
>
> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-caching.html
>
> *From:* Tzahi File 
> *Sent:* Wednesday, August 28, 2019 5:18 AM
> *To:* user 
> *Subject:* Caching tables in spark
>
>
>
> Hi,
>
> I have a question and would appreciate your input.
>
> I have 2 different processes that read from the same raw data table
> (around 1.5 TB).
>
> Is there a way to read this data once, cache it, and use it in both
> processes?
>
> Thanks
>


RE: Caching tables in spark

2019-08-28 Thread email
Take a look at this article:

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-caching.html

From: Tzahi File  
Sent: Wednesday, August 28, 2019 5:18 AM
To: user 
Subject: Caching tables in spark

Hi,

I have a question and would appreciate your input.

I have 2 different processes that read from the same raw data table (around
1.5 TB).

Is there a way to read this data once, cache it, and use it in both processes?

Thanks
