Re: state size in relation to cluster size and processing speed

2017-02-21 Thread Aljoscha Krettek
Hi Seth, sorry for taking so long to get back to you on this. I think my comment about the watermarks may have been misleading; I don't even remember what I was thinking back then. Were you able to confirm that the results were in fact correct for the runs with the different parallelism? I know

Re: state size in relation to cluster size and processing speed

2016-12-23 Thread Seth Wiesman
Watermarks are generated by the PeriodicWatermarkAssigner from a timestamp field within the records. We are processing log data from an S3 bucket, and logs are always processed in chronological order using a custom ContinuousFileMonitoringFunction but the standard

Re: state size in relation to cluster size and processing speed

2016-12-23 Thread Aljoscha Krettek
Hi, how are you generating your watermarks? Could it be that they advance faster when the job is processing more data? Cheers, Aljoscha

On Fri, 16 Dec 2016 at 21:01 Seth Wiesman wrote:
> Hi,
>
> I’ve noticed something peculiar about the relationship between state

state size in relation to cluster size and processing speed

2016-12-16 Thread Seth Wiesman
Hi, I’ve noticed something peculiar about the relationship between state size and cluster size and was wondering if anyone here knows the reason. I am running a job with 1 hour tumbling event time windows which have an allowed lateness of 7 days. When I run on a 20-node cluster with FsState
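
For a rough sense of why this window configuration can hold a lot of state, here is a back-of-the-envelope sketch (not taken from the thread). It assumes that a window's state is kept until the watermark passes the window end plus the allowed lateness, that the watermark closely tracks event time, and that every key receives data in every window. The function name `retained_windows` is hypothetical and used only for illustration.

```python
from datetime import timedelta


def retained_windows(window_size: timedelta, allowed_lateness: timedelta) -> int:
    """Approximate upper bound on tumbling windows per key held in state at once.

    Assumption: a window is only cleaned up once the watermark passes
    (window end + allowed lateness), and the watermark tracks event time.
    """
    # Closed windows still waiting for late data: ceil(lateness / size).
    # timedelta // timedelta yields an int; negating twice gives a ceiling.
    waiting_for_late_data = -(-allowed_lateness // window_size)
    # Plus the one window that is currently being filled.
    return waiting_for_late_data + 1


if __name__ == "__main__":
    n = retained_windows(timedelta(hours=1), timedelta(days=7))
    print(f"windows retained per key (approx.): {n}")
```

Under these assumptions, 1 hour windows with 7 days of allowed lateness keep roughly 7 × 24 closed windows per key alive, plus the window currently in progress. Actual state size also depends on the number of keys and what each window stores.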