Hi Gregory,

The easiest solution would be to include the site in your key so that at
query time the rows from each site can be aggregated together.

Instead of <Key, Value>, the table would be <Key, Site, Value>, with each
site writing only its own rows, and your query would become
SELECT key, SUM(value) FROM table GROUP BY key;
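
To make the idea concrete, here is a minimal sketch in plain Python (no
Flink or Cassandra involved). It only simulates the table as a dict keyed
by (key, site); the names write_partial, total_by_key, "device-X" and
"site-A" are made up for illustration and are not part of any real API.

```python
from collections import defaultdict


def write_partial(table, key, site, value):
    """Upsert one site's partial aggregate; the site is part of the row key."""
    table[(key, site)] = value


def total_by_key(table):
    """Mimics: SELECT key, SUM(value) FROM table GROUP BY key."""
    totals = defaultdict(int)
    for (key, _site), value in table.items():
        totals[key] += value
    return dict(totals)


# Simulated table: each site writes only rows carrying its own site name.
table = {}
write_partial(table, "device-X", "site-A", 3)  # messages seen in site A
write_partial(table, "device-X", "site-B", 4)  # messages seen in site B
write_partial(table, "device-Y", "site-A", 5)

totals = total_by_key(table)
print(totals)
```

Because the site is part of the key, the two sites never overwrite each
other's partial result, and the sum at read time covers all messages for
the device.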

Otherwise, you will need to get all that data into a single site to perform
a final aggregation prior to writing to Cassandra.

On Wed, May 15, 2019 at 3:45 AM Melekh, Gregory <gregory.mel...@intl.att.com>
wrote:

> Hello Flink Experts.
>
>
>
> We have a Flink job consuming data from Kafka and ingesting it into a
> multi-site (Azure-east – Azure-west) replicated Cassandra.
>
> Now we have to aggregate data hourly. The problem is that device X can
> report once on site A and once on site B. This means that some messages for
> that device will be processed by Flink in site A and some messages will be
> processed in site B.
>
> I want an aggregation result that will reflect all messages transmitted by
> a specific device X.
>
> Are there any best practices to handle multi-site ingestion?
>
> Any idea how to handle the scenario above?
>
> Thanks in advance.
>
>

-- 

Seth Wiesman | Solutions Architect

+1 314 387 1463

<https://www.ververica.com/>

