Hi Seth,
Sorry for taking so long to get back to you on this. I think my watermark
question might have been misleading; I don't even remember what I
was thinking back then.
Were you able to confirm that the results were in fact correct for the runs
with the different parallelism? I know
Watermarks are generated by the PeriodicWatermarkAssigner from a timestamp
field within the records. We are processing log data from an S3 bucket, and
logs are always processed in chronological order using a custom
ContinuousFileMonitoringFunction but the standard
Hi,
How are you generating your watermarks? Could it be that they advance
faster when the job is processing more data?
Cheers,
Aljoscha
On Fri, 16 Dec 2016 at 21:01 Seth Wiesman wrote:
> Hi,
>
> I’ve noticed something peculiar about the relationship between state
Hi,
I’ve noticed something peculiar about the relationship between state size and
cluster size and was wondering if anyone here knows the reason. I am running
a job with 1-hour tumbling event-time windows which have an allowed lateness of
7 days. When I run on a 20-node cluster with FsState