Re: Structured Streaming: distinct (Spark 2.2)

2018-03-19 Thread Burak Yavuz
I believe the docs are out of date regarding distinct. The behavior should be as follows:
- Distinct should be applied across triggers
- In order to prevent the state from growing indefinitely, you need to add a watermark
- If you don't have a watermark, but your key space is small, that's
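
To make the behavior described in this reply concrete, here is a toy model in plain Python. It is not Spark code and not how Spark implements deduplication; the class name StreamingDistinct, its fields, and the eviction rule are all made up for this sketch. It only shows why state that is kept across triggers grows without bound unless a watermark lets old keys be dropped. In Spark itself the corresponding calls would be withWatermark and dropDuplicates on the streaming DataFrame.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Optional, Tuple

Event = Tuple[Hashable, int]  # (key, event time in seconds)


@dataclass
class StreamingDistinct:
    """Hypothetical toy model of distinct across triggers (not Spark's code)."""

    # None means "no watermark": remembered keys are never evicted.
    watermark_delay: Optional[int] = None
    # State carried across triggers: key -> event time of its first sighting.
    seen: Dict[Hashable, int] = field(default_factory=dict)
    # Largest event time observed in any earlier trigger.
    max_event_time: Optional[int] = None

    def _watermark(self) -> Optional[int]:
        # The watermark trails the largest event time seen so far.
        if self.watermark_delay is None or self.max_event_time is None:
            return None
        return self.max_event_time - self.watermark_delay

    def process_trigger(self, batch: List[Event]) -> List[Event]:
        """Return the events of this batch whose key is not already in state."""
        watermark = self._watermark()
        out: List[Event] = []
        for key, ts in batch:
            if watermark is not None and ts < watermark:
                continue  # older than the watermark: treated as too late
            if key not in self.seen:
                self.seen[key] = ts
                out.append((key, ts))

        # Advance the largest event time using this batch.
        for _, ts in batch:
            if self.max_event_time is None or ts > self.max_event_time:
                self.max_event_time = ts

        # With a watermark, forget keys that fell behind it.
        new_watermark = self._watermark()
        if new_watermark is not None:
            self.seen = {
                k: t for k, t in self.seen.items() if t >= new_watermark
            }
        return out


# Without a watermark: duplicates are removed across triggers,
# but every key ever seen stays in state.
no_wm = StreamingDistinct()
no_wm.process_trigger([("a", 1), ("b", 2), ("a", 3)])
no_wm.process_trigger([("a", 100), ("c", 101)])

# With a 10 second watermark delay: old keys are evicted once
# event time has moved far enough ahead.
wm = StreamingDistinct(watermark_delay=10)
wm.process_trigger([("a", 1), ("b", 2)])
wm.process_trigger([("a", 5), ("c", 100)])
```

One consequence visible in the sketch: once a key has been evicted, a later event with the same key is emitted again, which is the trade-off for bounded state.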

Structured Streaming: distinct (Spark 2.2)

2018-03-19 Thread Geoff Von Allmen
I see in the documentation that the distinct operation is not supported in Structured Streaming. That being said, I have noticed that you are able to successfully call distinct() on a data