Hello Team,

Here is a snapshot of the existing application:

TechStack: Postgres DB, Hive, Tableau UI
Postgres Plugin: DataSketches

Flow in brief:

  *   A Hadoop data pipeline job pushes pre-aggregated active card data (built 
with the Hive DataSketches algorithm), along with other details, to Hive.
  *   Another job loads that data into the Postgres DB, which ends up holding 3 
years of data for 4 regions across multiple countries.
  *   The Tableau dashboard has a live connection to the Postgres DB.
  *   The Tableau query calls the Postgres DB to aggregate the 
binary/pre-aggregated data into a distinct card count (using the DataSketches 
algorithm) and fetches data based on multiple filter conditions.
  *   A typical query covers a 2-month span in each of 3 years, i.e. a total 
of 6 months of data to aggregate for a country under multiple conditions.
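
For context, the merge step in the flow above can be illustrated with a toy 
example in Python. This is a simplified K-Minimum-Values sketch, not the 
DataSketches implementation and not our actual query; the function names, the 
sketch size K, and the sample card IDs are all assumptions made up for 
illustration. It only shows why pre-aggregated sketches can be unioned at query 
time into an approximate distinct card count without double-counting cards 
that appear in more than one period.

```python
import hashlib

K = 1024  # hypothetical sketch size; larger K gives a tighter estimate


def hash01(item: str) -> float:
    """Map an item to a deterministic pseudo-uniform value in [0, 1)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def build_sketch(items, k=K):
    """Pre-aggregate: keep only the k smallest distinct hash values."""
    return sorted({hash01(i) for i in items})[:k]


def merge_sketches(sketches, k=K):
    """Union step: pool all retained hashes and keep the k smallest."""
    pooled = set()
    for sketch in sketches:
        pooled.update(sketch)
    return sorted(pooled)[:k]


def estimate_distinct(sketch, k=K):
    """Estimate the distinct count from a sketch."""
    if len(sketch) < k:
        # Fewer than k distinct items seen: exact, barring hash collisions.
        return float(len(sketch))
    return (k - 1) / sketch[k - 1]  # standard KMV estimator


# Made-up card IDs for two periods that share 2,000 cards.
period_a = [f"card-{n}" for n in range(0, 6000)]
period_b = [f"card-{n}" for n in range(4000, 10000)]

merged = merge_sketches([build_sketch(period_a), build_sketch(period_b)])
approx = estimate_distinct(merged)  # approximates the 10,000 distinct cards
```

In this toy version the cost of a query grows with the number of stored 
sketches that have to be pooled, which is the same step that dominates in our 
case.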

This aggregation query is usually quite slow. We have tried many different 
ways to resolve this, but the DataSketches part takes up most of the 
execution time.

Thanks & Regards,
Rima Bhowmick
Marketing Brand Analytics