Thanks for making DataSketches version 1.6 live; it will help us a lot.
Today we downloaded the package from the PGXN <https://pgxn.org/dist/datasketches/>
website. Is it mandatory to install the Boost package too?
When we installed version 1.3 of the Postgres DataSketches plugin earlier, we
didn't use Boost.

Also, are the steps below, as mentioned in the documentation, sufficient to
install it?
Building and installing

  *   make
  *   sudo make install
Thanks in advance!

Regards,
Rima Bhowmick.

From: Alexander Saydakov <[email protected]>
Reply to: "[email protected]" <[email protected]>
Date: Thursday, 27 April 2023 at 1:25 AM
To: "[email protected]" <[email protected]>
Subject: Re: [E] Postgres HLL is very slow

The changes in question have been merged to the master branch.
We have just started the release process for datasketches-cpp (version 4.1.0). 
Once this is done, we will start the release process for datasketches-postgresql 
1.6.0. In the meantime you may want to try the latest code with the latest 
datasketches-cpp from the master branch.

On Wed, Apr 19, 2023 at 12:58 AM Jon Malkin 
<[email protected]<mailto:[email protected]>> wrote:
As noted in the linked issue, the postgresql 1.5 package is compatible with the 
cpp 3.x line, not 4.x. It should work fine with the last datasketches-cpp 3.x 
release.

In the meantime, as noted, we are actively trying to work on speed improvements 
for HLL as requested at the start of this thread.

Additionally, one thing that can help speed releases is to vote whenever 
there's a vote announcement -- even a non-binding vote is valuable!

  jon

On Wed, Apr 19, 2023, 12:13 AM Bhowmick, Rima <[email protected]> wrote:

Hello All,

We are trying to install a new version of datasketches in our postgres instance.
I have downloaded datasketches-postgresql 1.5.0
(apache-datasketches-postgresql-1.5.0-src.zip) and datasketches-cpp 4.0.1
(apache-datasketches-cpp-4.0.1-src.zip) from the Apache website, plus boost
1.81.0. I followed the steps in the readme file. While executing the make
command, I got an error:

g++ -Wall -Wpointer-arith -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -O2 -std=c++11 -fPIC -fPIC 
-I/usr/local/include -Iboost -Idatasketches-cpp/common/include 
-Idatasketches-cpp/kll/include -Idatasketches-cpp/cpc/include 
-Idatasketches-cpp/theta/include -Idatasketches-cpp/fi/include 
-Idatasketches-cpp/hll/include -Idatasketches-cpp/tuple/include 
-Idatasketches-cpp/req/include -I. -I./ 
-I/pgbin/mbi1d/12.x/include/postgresql/server 
-I/pgbin/mbi1d/12.x/include/postgresql/internal  -D_GNU_SOURCE 
-I/pgbin/mbi1d/12.x//include/libxml2   -c -o src/kll_float_sketch_c_adapter.o 
src/kll_float_sketch_c_adapter.cpp
src/kll_float_sketch_c_adapter.cpp:26:109: error: wrong number of template arguments (4, should be 3)
typedef datasketches::kll_sketch<float, std::less<float>, datasketches::serde<float>, palloc_allocator<float>> kll_float_sketch;
                                                                                                             ^
In file included from src/kll_float_sketch_c_adapter.cpp:24:0:
datasketches-cpp/kll/include/kll_sketch.hpp:158:7: error: provided for ‘template<class T, class C, class A> class datasketches::kll_sketch’
class kll_sketch {

It looks like there is a mismatch in the number of template arguments between
kll_float_sketch_c_adapter.cpp and kll_sketch.hpp.
Could you please suggest a solution? Thank you!
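
For readers following the error above, here is a minimal sketch of how this
kind of mismatch arises. The namespaces old_api and new_api, toy_serde, and
the arity helper are hypothetical stand-ins, not the real datasketches-cpp
headers. The four-parameter shape is an assumption based on what the 1.5
adapter passes; the three-parameter shape is the one the compiler reported for
datasketches-cpp 4.0.1.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <type_traits>

// Hypothetical stand-ins for two library versions; NOT the real headers.
namespace old_api {
// Assumed shape the 1.5 adapter was written against: T, comparator, serde, allocator.
template <class T, class C, class S, class A> class kll_sketch {};
}  // namespace old_api

namespace new_api {
// Shape reported by the compiler for 4.0.1: T, comparator, allocator.
template <class T, class C, class A> class kll_sketch {};
}  // namespace new_api

// Hypothetical placeholder for a serializer/deserializer type.
template <class T> struct toy_serde {};

// Helper (illustrative only): counts the template arguments of an instantiation.
template <class X> struct arity;
template <template <class...> class TT, class... Args>
struct arity<TT<Args...>> : std::integral_constant<std::size_t, sizeof...(Args)> {};

// An adapter typedef with four arguments compiles against the old shape.
using kll_float_old =
    old_api::kll_sketch<float, std::less<float>, toy_serde<float>, std::allocator<float>>;

// The same four-argument spelling against the new shape does not compile:
//   using kll_float_broken =
//       new_api::kll_sketch<float, std::less<float>, toy_serde<float>, std::allocator<float>>;
//   error: wrong number of template arguments (4, should be 3)

// Dropping the serde argument matches the three-parameter shape.
using kll_float_new =
    new_api::kll_sketch<float, std::less<float>, std::allocator<float>>;

static_assert(arity<kll_float_old>::value == 4, "old shape takes four arguments");
static_assert(arity<kll_float_new>::value == 3, "new shape takes three arguments");
```

The point of the sketch is only that the adapter source and the library
headers have to come from matching release lines; it does not suggest editing
the adapter by hand.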

https://github.com/apache/datasketches-postgresql/issues/62
The DataSketches distinct-count Postgres extension is used in our applications
to deliver significant business value; therefore, if we cannot upgrade the
versions, it would be a big loss for us.
Could you please advise us on the best approach to overcome this?

Thanks,
Rima Bhowmick.

From: Alexander Saydakov <[email protected]>
Reply to: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Date: Saturday, 15 April 2023 at 12:05 AM
To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [E] Postgres HLL is very slow

I am not sure about the date. I think the development should take a few days. A 
formal Apache release will take substantially more time just to go through the 
required steps of voting for the core library release (not really necessary for 
the parallel execution, but necessary to bring the latest speed improvements 
into the PostgreSQL extension), and then going through the same procedure to 
release the extension.
Of course, you don't have to wait for the formal release to start testing.
Could you clarify your issues building the latest version please? I believe 
that the datasketches-postgresql code in the master branch is compatible with 
the latest datasketches-cpp code.

On Fri, Apr 14, 2023 at 11:22 AM Bhowmick, Rima <[email protected]> 
wrote:
Hello Alexander,

Do you have a date in mind for the release that adds parallel execution?
Also, we tried upgrading the datasketches version following the latest
documentation, and we are getting a lot of C++ version issues.
It's very tough to install the new version. Any thoughts?

Thanks,
Rima Bhowmick.

From: Alexander Saydakov <[email protected]>
Reply-To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Date: Friday, 14 April 2023 at 10:58 PM
To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [E] Postgres HLL is very slow

Hi Rima,
I am working on the datasketches extension to support parallel queries 
(distributed aggregation).
I expect to get this done in a matter of days.
Also, we have just made some improvements to HLL merge speed in the core
library. These changes have not been released yet, but they are available in
the master branch.
We have another HLL performance improvement in mind. I will work on it once I 
finish the parallel query support.


On Fri, Apr 14, 2023 at 3:33 AM Bhowmick, Rima <[email protected]> 
wrote:
Hello Team,

Here is the snapshot of the existing application:

TechStack: Postgres DB, Hive, Tableau UI
Postgres Plugin: DataSketches

Flow in brief:

  *   A Hadoop data pipeline job pushes pre-aggregated (using the Hive
DataSketches algorithm) active card data, along with other details, to Hive.
  *   Another job loads that data into the Postgres DB, which ends up holding
3 years of data for 4 regions across multiple countries.
  *   A Tableau dashboard has a live connection to the Postgres DB.
  *   A Tableau query calls the Postgres DB to aggregate the
binary/pre-aggregated data into a distinct card count (using the DataSketches
algorithm) and fetches data based on multiple filter conditions.
  *   Usually a query covers a 2-month span in each of the 3 years, i.e. 6
months of data in total to aggregate for a country under multiple conditions.
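
To illustrate why pre-aggregated sketches can be combined at query time, here
is a minimal toy HyperLogLog in standard C++. It is only a sketch under stated
assumptions: ToyHll, sketch_range, the hash mixer, and lg_k = 12 are
illustrative choices and are NOT the datasketches implementation or its binary
format. It shows the general property that merging two partial sketches
(register-wise maximum) gives the same state as sketching all the items
directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy HyperLogLog for illustration only; NOT the datasketches implementation.
class ToyHll {
 public:
  // lg_k is log2 of the register count; this toy assumes 7 <= lg_k <= 16.
  explicit ToyHll(int lg_k) : lg_k_(lg_k), regs_(std::size_t{1} << lg_k, 0) {}

  // Feed one item (for example a card id) into the sketch.
  void update(std::uint64_t item) {
    const std::uint64_t h = mix(item);
    const std::size_t idx = static_cast<std::size_t>(h >> (64 - lg_k_));
    std::uint64_t rest = h << lg_k_;  // remaining bits, left-aligned
    std::uint8_t rank = 1;            // 1 + number of leading zeros in rest
    while (rank <= 64 - lg_k_ && (rest >> 63) == 0) {
      ++rank;
      rest <<= 1;
    }
    regs_[idx] = std::max(regs_[idx], rank);
  }

  // Combine another partial sketch into this one: register-wise maximum.
  void merge(const ToyHll& other) {
    assert(regs_.size() == other.regs_.size());  // same lg_k required
    for (std::size_t i = 0; i < regs_.size(); ++i) {
      regs_[i] = std::max(regs_[i], other.regs_[i]);
    }
  }

  // Classic HLL estimate with the small-range (linear counting) correction.
  double estimate() const {
    const double m = static_cast<double>(regs_.size());
    double sum = 0.0;
    int zeros = 0;
    for (std::uint8_t r : regs_) {
      sum += std::ldexp(1.0, -static_cast<int>(r));  // 2^-r
      if (r == 0) ++zeros;
    }
    const double alpha = 0.7213 / (1.0 + 1.079 / m);
    const double raw = alpha * m * m / sum;
    if (raw <= 2.5 * m && zeros > 0) return m * std::log(m / zeros);
    return raw;
  }

  const std::vector<std::uint8_t>& registers() const { return regs_; }

 private:
  // splitmix64 finalizer, used here as a simple 64-bit hash.
  static std::uint64_t mix(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
  }

  int lg_k_;
  std::vector<std::uint8_t> regs_;
};

// Hypothetical helper: sketch the ids in [begin, end), like one stored row.
ToyHll sketch_range(std::uint64_t begin, std::uint64_t end, int lg_k = 12) {
  ToyHll s(lg_k);
  for (std::uint64_t id = begin; id < end; ++id) s.update(id);
  return s;
}
```

For example, merging sketch_range(0, 60000) with sketch_range(40000, 100000)
yields the same registers as sketch_range(0, 100000), so overlapping ids are
not double-counted. The cost of a query like the one described then depends
mainly on how many stored sketches have to be merged.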

Usually the response to this aggregation query is quite slow. We have tried a
lot of different ways to resolve this.

The datasketches part accounts for most of the execution time.

Thanks & Regards,
Rima Bhowmick
Marketing Brand Analytics
