On Thu, 27 Feb 2020, 9:30 pm lec ssmi, <shicheng31...@gmail.com> wrote:

> Hi:
>     I'm new to structured streaming. Because the built-in API cannot
> perform the Count Distinct operation of Window, I want to use
> dropDuplicates first, and then perform the window count.
>    But in the process of using, there are two problems:
>            1. Because it is streaming computing, in the process of
> deduplication, the state needs to be cleared in time, which requires the
> cooperation of watermark. Assuming my event time field is consistently
>               increasing, and I set the watermark to 1 hour, does it mean
> that the data at 10 o'clock will only be compared in these data from 9
> o'clock to 10 o'clock, and the data before 9 o'clock will be cleared ?
>            2. Because it is window deduplication, I set the watermark
> before deduplication to the window size.But after deduplication, I need to
> call withWatermark () again to set the watermark to the real
>                watermark. Will setting the watermark again take effect?
>
>      Thanks a lot !
>

Reply via email to