Thanks Tathagata for the pointer.
On Mon, Jun 19, 2017 at 8:24 PM, Tathagata Das wrote:
> That is not the right way to use watermark + append output mode. The
> `withWatermark` must be before the aggregation. Something like this:
>
> df.withWatermark("timestamp", "1 hour")
>   .groupBy(window("timestamp", "30 seconds"))
>   .agg(...)
And perhaps the error message can be improved here?
From: Tathagata Das <tathagata.das1...@gmail.com>
Sent: Monday, June 19, 2017 8:24:01 PM
To: kaniska Mandal
Cc: Burak Yavuz; user
Subject: Re: How save streaming aggregations on 'Structured Streams' in p
That is not the right way to use watermark + append output mode. The
`withWatermark` must be before the aggregation. Something like this:

df.withWatermark("timestamp", "1 hour")
  .groupBy(window("timestamp", "30 seconds"))
  .agg(...)
Read more here -
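To make the placement concrete, here is a fuller sketch of the pattern Tathagata describes, wired up to the parquet sink asked about in this thread. The source, paths, and column names are hypothetical stand-ins; the essential point is only that `withWatermark` is called on the input stream, before `groupBy`:

```scala
import org.apache.spark.sql.functions.window

// Hypothetical streaming source with an event-time column "timestamp".
val events = spark.readStream
  .format("kafka")                        // assumed source, for illustration
  .option("kafka.bootstrap.servers", "host:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(value AS STRING) AS value", "timestamp")

import spark.implicits._

// Watermark is set BEFORE the aggregation, so append mode can
// finalize each window once the watermark passes window.end.
val counts = events
  .withWatermark("timestamp", "1 hour")
  .groupBy(window($"timestamp", "30 seconds"))
  .count()

counts.writeStream
  .outputMode("append")                   // legal now: watermark is set upstream
  .format("parquet")                      // the sink kaniska wants
  .option("path", "/data/agg")            // hypothetical output path
  .option("checkpointLocation", "/data/agg/_checkpoint")
  .start()
```

With this ordering, each 30-second window is written to parquet exactly once, after the watermark guarantees no more late data can arrive for it.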
Hi Burak,
Per your suggestion, I have specified
> deviceBasicAgg.withWatermark("eventtime", "30 seconds");
before invoking deviceBasicAgg.writeStream()...
But I am still facing ~
org.apache.spark.sql.AnalysisException: Append output mode not supported
when there are streaming aggregations on streaming DataFrames/DataSets
without watermark;;
Hi Kaniska,
In order to use append mode with aggregations, you need to set an event
time watermark (using `withWatermark`). Otherwise, Spark doesn't know when
to output an aggregation result as "final".
Best,
Burak
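In code terms, the rule Burak states looks like this (a minimal sketch; `events` stands for any streaming DataFrame with an event-time column `eventtime`, as in kaniska's code, and the names are hypothetical):

```scala
import org.apache.spark.sql.functions.window
import spark.implicits._

// Without a watermark, Spark cannot know when a window's count is
// "final", so append mode is rejected at analysis time:
val bad = events
  .groupBy(window($"eventtime", "30 seconds"))
  .count()
// bad.writeStream.outputMode("append").start()
//   => AnalysisException: Append output mode not supported ...

// With a watermark on the INPUT stream, each window is emitted once,
// as final, after the watermark moves past the window's end:
val good = events
  .withWatermark("eventtime", "30 seconds")
  .groupBy(window($"eventtime", "30 seconds"))
  .count()
// good.writeStream.outputMode("append").start()   // OK
```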
On Mon, Jun 19, 2017 at 11:03 AM, kaniska Mandal wrote:
Hi,
My goal is to ~
(1) either chain streaming aggregations in a single query OR
(2) run multiple streaming aggregations and save data in some meaningful
format to execute low latency / failsafe OLAP queries
So my first choice is the parquet format, but I failed to make it work!
I am using