Re: Flink solution to active - active Multi site cloud data ingestion

Seth Wiesman Wed, 15 May 2019 10:07:03 -0700

Hi Gregory,

The easiest solution would be to include the site in your key so that at
query time the rows from each site can be aggregated together.


Instead of <Key, Value>, the table would be <Key, Site, Value>, and your
query would become SELECT key, SUM(value) FROM table GROUP BY key;
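
As a rough illustration of that layout, here is a minimal Python sketch that
uses the standard-library sqlite3 module as a stand-in for Cassandra. The
table name (hourly_agg), the column names, the hour bucket, and the helper
functions are all hypothetical and not taken from the actual job. The point
is only that each site writes its own partial sum under a key that includes
the site, and that the partial sums are combined when the table is queried.

```python
import sqlite3

# In-memory SQLite stands in for the replicated Cassandra table.
# Table and column names here are assumptions made for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hourly_agg ("
    " device_key TEXT, hour TEXT, site TEXT, total INTEGER,"
    " PRIMARY KEY (device_key, hour, site))"
)

def write_partial(device_key, hour, site, total):
    """Upsert one site's partial sum; the site is part of the key,
    so the two sites never overwrite each other's rows."""
    conn.execute(
        "INSERT OR REPLACE INTO hourly_agg VALUES (?, ?, ?, ?)",
        (device_key, hour, site, total),
    )

def query_totals(hour):
    """Combine the per-site partial sums at query time."""
    rows = conn.execute(
        "SELECT device_key, SUM(total) FROM hourly_agg"
        " WHERE hour = ? GROUP BY device_key ORDER BY device_key",
        (hour,),
    )
    return dict(rows.fetchall())

# Device X reported to both sites during the same hour.
write_partial("device-X", "2019-05-15T10", "azure-east", 3)
write_partial("device-X", "2019-05-15T10", "azure-west", 5)
write_partial("device-Y", "2019-05-15T10", "azure-east", 2)

print(query_totals("2019-05-15T10"))
```

Because every site only ever writes rows carrying its own site value, no
cross-site coordination is needed on the write path; the cost moves to the
read side, which has to sum one row per site for each key.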

Otherwise, you will need to get all that data into a single site to perform
a final aggregation prior to writing to Cassandra.

On Wed, May 15, 2019 at 3:45 AM Melekh, Gregory <gregory.mel...@intl.att.com>
wrote:

> Hello Flink Experts.
>
>
>
> We have a Flink job that consumes data from Kafka and ingests it into
> multi-site (Azure-east – Azure-west) replicated Cassandra.
>
> Now we have to aggregate data hourly. The problem is that device X can
> report once on site A and once on site B. This means that some messages for
> that device will be processed by Flink in site A and some messages will be
> processed in site B.
>
> I want an aggregation result that will reflect all messages transmitted by
> specific device X.
>
> Are there any best practices to handle multi-site ingestion?
>
> Any idea how to handle the scenario above?
>
> Thanks in advance.
>
>

-- 

Seth Wiesman | Solutions Architect

