Hi Gregory,

The easiest solution would be to include the site in your key so that the rows from each site can be aggregated together at query time.
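To make the idea concrete, here is a minimal Python sketch of what the query-time merge does. It is not Flink or Cassandra code: the device names, site names, and values are made up for illustration, and it assumes each site writes one partial sum per key.

```python
from collections import defaultdict

# Hypothetical hourly partial sums, one row per (key, site), as each
# site's job might write them. All names and values are illustrative.
rows = [
    ("device-x", "azure-east", 3),
    ("device-x", "azure-west", 4),
    ("device-y", "azure-east", 5),
]


def sum_by_key(rows):
    """Combine the per-site partial sums into one total per key."""
    totals = defaultdict(int)
    for key, _site, value in rows:
        totals[key] += value
    return dict(totals)


totals = sum_by_key(rows)
print(totals)
```

Because the site is part of the key, the two sites never overwrite each other's rows, and the total for a device is only computed when the rows are read.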
Instead of <Key, Value>, the table would be <Key, Site, Value>, and your query would become SELECT sum(value) FROM table GROUP BY key;

Otherwise, you will need to get all of that data into a single site to perform a final aggregation before writing to Cassandra.

On Wed, May 15, 2019 at 3:45 AM Melekh, Gregory <gregory.mel...@intl.att.com> wrote:

> Hello Flink Experts.
>
> We have a Flink job that consumes data from Kafka and ingests it into a
> multi-site (Azure-east – Azure-west) replicated Cassandra.
>
> Now we have to aggregate data hourly. The problem is that device X can
> report once on site A and once on site B. This means that some messages for
> that device will be processed by Flink in site A and some messages will be
> processed in site B.
>
> I want an aggregation result that reflects all messages transmitted by
> a specific device X.
>
> Are there any best practices for handling multi-site ingestion?
>
> Any idea how to handle the scenario above?
>
> Thanks in advance.

--
Seth Wiesman | Solutions Architect
<https://www.ververica.com/>