Re: Spark 2.2 streaming with append mode: empty output

2017-08-15 Thread Ashwin Raju
The input dataset has multiple days worth of data, so I thought the watermark should have been crossed. To debug, I changed the query to the code below. My expectation was that since I am doing 1 day windows with late arrivals permitted for 1 second, when it sees records for the next day, it would

Re: Spark 2.2 streaming with append mode: empty output

2017-08-14 Thread Tathagata Das
In append mode, the aggregation outputs a row only when the watermark has been crossed and the corresponding aggregate is *final*, that is, will not be updated any more. See http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking On Mon,

Spark 2.2 streaming with append mode: empty output

2017-08-14 Thread Ashwin Raju
Hi, I am running Spark 2.2 and trying out structured streaming. I have the following code: from pyspark.sql import functions as F df=frame \ .withWatermark("timestamp","1 minute") \ .groupby(F.window("timestamp","1 day"),*groupby_cols) \ .agg(f.sum('bytes')) query =