It is already partitioned by timestamp. But is stopping the streaming job,
trimming the Parquet files, and then restarting the streaming job the right
retention process? Thanks.
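
For my own understanding, here is a rough sketch (in Scala) of how I read that
suggestion: partition the sink by day, then drop old partition directories from
a separate batch/cron job rather than from inside the stream. The Kafka source,
the /data/events path, the timestamp/event_date column names and the 90-day
cutoff are just placeholders, not my actual setup:

// Sketch only: source, paths, column names and the 90-day cutoff are placeholders.
import java.time.LocalDate
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}

val spark = SparkSession.builder.appName("events-stream").getOrCreate()

// Streaming side: derive a date column and partition the Parquet sink by it.
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()

val query = events
  .withColumn("event_date", to_date(col("timestamp")))
  .writeStream
  .format("parquet")
  .option("path", "/data/events")
  .option("checkpointLocation", "/data/events_checkpoint")
  .partitionBy("event_date")
  .start()

// Retention side, run as a separate batch/cron job (not part of the stream):
// delete partition directories whose event_date is older than 90 days.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val cutoff = LocalDate.now().minusDays(90)
fs.listStatus(new Path("/data/events"))
  .filter(_.isDirectory)
  .map(_.getPath)
  .filter { p =>
    val name = p.getName // e.g. "event_date=2018-03-14"
    name.startsWith("event_date=") &&
      LocalDate.parse(name.stripPrefix("event_date=")).isBefore(cutoff)
  }
  .foreach(p => fs.delete(p, true))

One thing I am not sure about is the _spark_metadata log that the file sink
keeps in the output directory: deleting partition directories behind its back
may confuse batch reads of that path, so the cleanup might need to account for it.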

On Wed, Mar 14, 2018 at 12:51 PM, Sunil Parmar <sunilosu...@gmail.com>
wrote:

> Can you use partitioning (by day)? That will make it easier to drop
> data older than x days outside the streaming job.
>
> Sunil Parmar
>
> On Wed, Mar 14, 2018 at 11:36 AM, Lian Jiang <jiangok2...@gmail.com>
> wrote:
>
>> I have a Spark Structured Streaming job that dumps data into a Parquet
>> file. To keep the Parquet output from growing indefinitely, I want to discard
>> data older than 3 months. Does Spark streaming support this? Or do I need to
>> stop the streaming job, trim the Parquet file, and restart the streaming job?
>> Thanks for any hints.
>>
>
>
