Re: Handling of watermark in structured streaming

2019-05-14 Thread Joe Ammann
Hi Suket, Anastasios, Many thanks for your time and your suggestions! I tried again with various settings for the watermarks and the trigger time: watermark 20 sec / trigger 2 sec, watermark 10 sec / trigger 1 sec, watermark 20 sec / trigger 0 sec. I also tried continuous processing mode, but since I

Re: Handling of watermark in structured streaming

2019-05-14 Thread Suket Arora
df = inputStream.withWatermark("eventtime", "20 seconds").groupBy("sharedId", window("eventtime", "20 seconds", "10 seconds")) // ProcessingTime trigger with a two-second micro-batch interval df.writeStream .format("console") .trigger(Trigger.ProcessingTime("2 seconds")) .start() On Tue, 14 May
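Reconstructed as a self-contained Scala sketch, the fragment above might look like the following. The Kafka broker address, topic name, and the `sharedId`/`eventtime` column derivations are assumptions for illustration; only the watermark, window, and trigger settings come from the thread.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder.appName("watermark-demo").getOrCreate()

// Assumed Kafka source; the real schema of the input topic is not shown in the thread
val inputStream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
  .option("subscribe", "input-topic")                  // assumed topic
  .load()
  .selectExpr("CAST(value AS STRING) AS sharedId", "timestamp AS eventtime")

// Watermark on the same event-time column that drives the window (see the
// documentation quote later in the thread: they must match in Append mode)
val df = inputStream
  .withWatermark("eventtime", "20 seconds")
  .groupBy(col("sharedId"), window(col("eventtime"), "20 seconds", "10 seconds"))
  .count()

// ProcessingTime trigger: a micro-batch every 2 seconds, results to the console
df.writeStream
  .outputMode("append")
  .format("console")
  .trigger(Trigger.ProcessingTime("2 seconds"))
  .start()
```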

Re: Handling of watermark in structured streaming

2019-05-14 Thread Joe Ammann
Hi Anastasios On 5/14/19 4:15 PM, Anastasios Zouzias wrote: > Hi Joe, > > How often do you trigger your mini-batch? Maybe you can specify the trigger > time explicitly with a low value or, even better, leave it off. > > See: >

Re: Handling of watermark in structured streaming

2019-05-14 Thread Joe Ammann
Hi Suket, Sorry, that was a typo in the pseudo-code I sent. Of course, what you suggested (using the same eventtime attribute for the watermark and the window) is what my code does in reality. Sorry to confuse people. On 5/14/19 4:14 PM, suket arora wrote: > Hi Joe, > As per the spark

Re: Handling of watermark in structured streaming

2019-05-14 Thread Anastasios Zouzias
Hi Joe, How often do you trigger your mini-batch? Maybe you can specify the trigger time explicitly with a low value or, even better, leave it off. See: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers Best, Anastasios On Tue, May 14, 2019 at 3:49 PM Joe
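The trigger options referenced by the linked guide can be sketched as follows. The query shape (`df`, console sink) is assumed; only the trigger settings are from the thread.

```scala
import org.apache.spark.sql.streaming.Trigger

// Default (trigger left off): a new micro-batch starts as soon as
// the previous one finishes
df.writeStream
  .format("console")
  .start()

// Explicit low trigger interval: a micro-batch at most once per second
df.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("1 second"))
  .start()
```

With the default trigger, watermark advancement is bounded only by batch completion time, which is why a short or unset interval can make late-data handling appear more responsive.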

Re: Handling of watermark in structured streaming

2019-05-14 Thread suket arora
Hi Joe, As per the spark structured streaming documentation and I quote "withWatermark must be called on the same column as the timestamp column used in the aggregate. For example, df.withWatermark("time", "1 min").groupBy("time2").count() is invalid in Append output mode, as watermark is defined
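The rule quoted from the documentation can be made concrete with a minimal sketch. The 5-minute window duration is an assumption for illustration; the column names `time` and `time2` are from the quoted documentation example.

```scala
import org.apache.spark.sql.functions.{col, window}

// Invalid in Append output mode: watermark is defined on "time",
// but the aggregation is keyed on a different column, "time2"
df.withWatermark("time", "1 min")
  .groupBy(window(col("time2"), "5 minutes"))
  .count()

// Valid: the watermark column is the same event-time column
// used inside the window expression
df.withWatermark("time", "1 min")
  .groupBy(window(col("time"), "5 minutes"))
  .count()
```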

Handling of watermark in structured streaming

2019-05-14 Thread Joe Ammann
Hi all, I'm fairly new to Spark structured streaming and only starting to develop an understanding of the watermark handling. Our application reads data from a Kafka input topic, and as one of the first steps it has to group incoming messages. Those messages come in bursts, e.g. 5 messages