Re: [E] Postgres HLL is very slow

Alexander Saydakov Fri, 14 Apr 2023 11:35:24 -0700

I am not sure about the date. I think the development should take a few
days. A formal Apache release will take substantially more time, because it
has to go through the required voting steps for the core library release (not
strictly necessary for parallel execution, but necessary to bring the latest
speed improvements into the PostgreSQL extension), and then through the same
procedure to release the extension itself.
Of course, you don't have to wait for the formal release to start testing.
Could you please describe the issues you ran into building the latest
version? I believe the datasketches-postgresql code in the master branch is
compatible with the latest datasketches-cpp code.
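For illustration only, here is a toy HyperLogLog in Python. It is a sketch under stated assumptions, not the DataSketches implementation: the class name `ToyHLL`, the precision `p=12` and the SHA-1 based hash are all hypothetical choices. It shows why parallel execution is possible at all. A union takes the per-register maximum, which is associative, commutative and idempotent. Partial sketches built by separate workers, or pre-aggregated per period as in the pipeline described below, can therefore be combined in any order and give exactly the sketch a single pass over all the data would produce.

```python
import hashlib
import math


class ToyHLL:
    """Toy HyperLogLog with 2**p registers (illustrative; not DataSketches)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p  # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item):
        # Hypothetical choice: first 8 bytes of SHA-1 used as a 64-bit hash.
        digest = hashlib.sha1(str(item).encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def update(self, item):
        h = self._hash64(item)
        width = 64 - self.p
        idx = h >> width  # top p bits select a register
        rest = h & ((1 << width) - 1)  # remaining bits
        rank = width - rest.bit_length() + 1  # 1-based position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Union = per-register maximum, so merge order does not matter.
        if self.p != other.p:
            raise ValueError("sketches must have the same precision")
        out = ToyHLL(self.p)
        out.registers = [max(x, y) for x, y in zip(self.registers, other.registers)]
        return out

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # small-range (linear counting) correction
        return raw


# Two hypothetical "workers" see overlapping slices: 20,000 distinct ids in total.
a, b, full = ToyHLL(), ToyHLL(), ToyHLL()
for i in range(12000):
    a.update(i)
for i in range(8000, 20000):
    b.update(i)
for i in range(20000):
    full.update(i)

merged = a.merge(b)
print(round(merged.estimate()))  # should be close to 20000
```

Under these assumptions, each parallel worker would produce a partial sketch, and the final combine step is just more `merge` calls. This is why the cost of merging matters so much for queries that union many pre-aggregated sketches.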


On Fri, Apr 14, 2023 at 11:22 AM Bhowmick, Rima <[email protected]>
wrote:

> Hello Alexander,
>
> Do you have a date in mind for releasing the parallel execution support?
>
> Also, we tried upgrading DataSketches to the latest version by following
> the documentation, but we are getting a lot of C++ version issues.
>
> It's very hard to install the new version. Any thoughts?
>
> Thanks,
> Rima Bhowmick
>
> *From: *Alexander Saydakov <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Friday, 14 April 2023 at 10:58 PM
> *To: *"[email protected]" <[email protected]>
> *Subject: *Re: [E] Postgres HLL is very slow
>
> Hi Rima,
>
> I am working on the DataSketches extension to support parallel queries
> (distributed aggregation).
>
> I expect to get this done in a matter of days.
>
> Also, we have just made some improvements to HLL merge speed in the core
> library. These changes have not been released yet, but they are available
> in the master branch.
>
> We have another HLL performance improvement in mind. I will work on it
> once I finish the parallel query support.
>
> On Fri, Apr 14, 2023 at 3:33 AM Bhowmick, Rima <[email protected]>
> wrote:
>
> Hello Team,
>
> Here is a snapshot of the existing application:
>
> Tech stack: Postgres DB, Hive, Tableau UI
>
> Postgres plugin: DataSketches
>
> Flow in brief:
>
>    - A Hadoop data pipeline job pushes active card data, pre-aggregated
>    with DataSketches in Hive, along with other details to Hive.
>    - Another job loads that data into the Postgres DB, which ends up holding
>    3 years of data for 4 regions across multiple countries.
>    - A Tableau dashboard has a live connection to the Postgres DB.
>    - Tableau queries the Postgres DB to merge the binary (pre-aggregated)
>    sketches into a distinct card count (using DataSketches) and to fetch
>    data filtered on multiple conditions.
>    - A typical query covers the same 2-month span in each of the 3 years,
>    i.e. 6 months of data in total to aggregate for a country under multiple
>    conditions.
>
> This aggregation query is usually quite slow to respond. We have tried a lot
> of different ways to resolve this.
>
> The DataSketches part accounts for most of the execution time.
>
> Thanks & Regards,
>
> Rima Bhowmick
>
> Marketing Brand Analytics
>
