Re: retention policy for spark structured streaming dataset

2018-03-14 Thread Lian Jiang
It is already partitioned by timestamp. But is the right retention process to stop the streaming job, trim the Parquet file, and restart the streaming job? Thanks. On Wed, Mar 14, 2018 at 12:51 PM, Sunil Parmar wrote: > Can you use partitioning (by day)? That will

Re: retention policy for spark structured streaming dataset

2018-03-14 Thread Sunil Parmar
Can you use partitioning (by day)? That will make it easier to drop data older than x days outside the streaming job. Sunil Parmar On Wed, Mar 14, 2018 at 11:36 AM, Lian Jiang wrote: > I have a Spark Structured Streaming job which dumps data into a Parquet > file. To
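
A rough sketch of what dropping old day partitions outside the streaming job could look like. This is not from the thread; it assumes the output sits on a local filesystem and is laid out as one directory per day named like date=2018-03-14 (the column name "date", the helper names and the 90-day default are all hypothetical). On HDFS or an object store the same logic would need that store's own delete call. It may also be worth checking how readers of the sink's _spark_metadata log react to files removed this way before relying on it.

```python
import shutil
from datetime import date, timedelta
from pathlib import Path

# Hypothetical partition column name; adjust to the real layout.
PARTITION_PREFIX = "date="


def expired_partitions(root, today, retention_days=90):
    """Return day-partition directories under root older than the cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for child in sorted(Path(root).iterdir()):
        if not (child.is_dir() and child.name.startswith(PARTITION_PREFIX)):
            continue  # skips _spark_metadata and any other entry
        try:
            day = date.fromisoformat(child.name[len(PARTITION_PREFIX):])
        except ValueError:
            continue  # not a day partition, leave it alone
        if day < cutoff:
            expired.append(child)
    return expired


def drop_expired(root, today, retention_days=90):
    """Delete expired partition directories and return their names."""
    dropped = []
    for path in expired_partitions(root, today, retention_days):
        shutil.rmtree(path)
        dropped.append(path.name)
    return dropped
```

Something like drop_expired("/data/events", date.today()) could then run from a daily cron job, where /data/events stands in for the real output path. Running expired_partitions first gives a dry run of what would be deleted.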

retention policy for spark structured streaming dataset

2018-03-14 Thread Lian Jiang
I have a Spark Structured Streaming job which dumps data into a Parquet file. To keep the Parquet file from growing indefinitely, I want to discard data older than 3 months. Does Spark streaming support this? Or do I need to stop the streaming job, trim the Parquet file, and restart the streaming job? Thanks for any