Re: How save streaming aggregations on 'Structured Streams' in parquet format ?

2017-06-19 Thread kaniska Mandal
Thanks Tathagata for the pointer. On Mon, Jun 19, 2017 at 8:24 PM, Tathagata Das wrote: > That is not the right way to use watermark + append output mode. The > `withWatermark` must be before the aggregation. Something like this. > > df.withWatermark("timestamp", "1

Re: How save streaming aggregations on 'Structured Streams' in parquet format ?

2017-06-19 Thread Felix Cheung
And perhaps the error message can be improved here? From: Tathagata Das <tathagata.das1...@gmail.com> Sent: Monday, June 19, 2017 8:24:01 PM To: kaniska Mandal Cc: Burak Yavuz; user Subject: Re: How save streaming aggregations on 'Structured Streams' in p

Re: How save streaming aggregations on 'Structured Streams' in parquet format ?

2017-06-19 Thread Tathagata Das
That is not the right way to use watermark + append output mode. The `withWatermark` must be before the aggregation. Something like this. df.withWatermark("timestamp", "1 hour") .groupBy(window("timestamp", "30 seconds")) .agg(...) Read more here -
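[Editor's note] The snippet above can be expanded into a fuller sketch. This assumes a streaming DataFrame `df` with an event-time column named `timestamp`, as in the snippet; everything else is illustrative, not from the original post:

```scala
import org.apache.spark.sql.functions.window

// The watermark must be declared on the event-time column *before* the
// aggregation, so Spark knows when a window can be considered final and
// can emit it under append output mode.
val windowedCounts = df
  .withWatermark("timestamp", "1 hour")
  .groupBy(window(df("timestamp"), "30 seconds"))
  .count()
```

Applying `withWatermark` to the result of the aggregation instead would leave the aggregation itself unbounded, which is why Spark rejects it in append mode.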

Re: How save streaming aggregations on 'Structured Streams' in parquet format ?

2017-06-19 Thread kaniska Mandal
Hi Burak, Per your suggestion, I have specified > deviceBasicAgg.withWatermark("eventtime", "30 seconds"); before invoking deviceBasicAgg.writeStream()... But I am still facing ~ org.apache.spark.sql.AnalysisException: Append output mode not supported when there are streaming aggregations on

Re: How save streaming aggregations on 'Structured Streams' in parquet format ?

2017-06-19 Thread Burak Yavuz
Hi Kaniska, In order to use append mode with aggregations, you need to set an event time watermark (using `withWatermark`). Otherwise, Spark doesn't know when to output an aggregation result as "final". Best, Burak On Mon, Jun 19, 2017 at 11:03 AM, kaniska Mandal

How save streaming aggregations on 'Structured Streams' in parquet format ?

2017-06-19 Thread kaniska Mandal
Hi, My goal is to ~ (1) either chain streaming aggregations in a single query OR (2) run multiple streaming aggregations and save data in some meaningful format to execute low latency / failsafe OLAP queries. So my first choice is parquet format, but I failed to make it work! I am using
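[Editor's note] Putting the advice from this thread together, a minimal sketch of a watermarked streaming aggregation written to parquet in append mode might look like the following. Column names (`eventtime`, `deviceId`), paths, and window sizes are illustrative assumptions, not taken from the original posts:

```scala
import org.apache.spark.sql.functions.window

// Watermark first, then aggregate: this ordering is what append mode
// requires for streaming aggregations.
val agg = events
  .withWatermark("eventtime", "30 seconds")
  .groupBy(window(events("eventtime"), "1 minute"), events("deviceId"))
  .count()

// The parquet (file) sink supports only append output mode, and a
// checkpoint location is mandatory for exactly-once file output.
val query = agg.writeStream
  .format("parquet")
  .option("path", "/data/device_counts")
  .option("checkpointLocation", "/checkpoints/device_counts")
  .outputMode("append")
  .start()
```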