Thanks for making the DataSketches 1.6 version live; it will help us a lot. Today we downloaded the package from the PGXN website (https://pgxn.org/dist/datasketches/). Is it mandatory to install the Boost package too? When we installed version 1.3 of the Postgres DataSketches plugin earlier, we did not use Boost.

Also, are the steps below, as mentioned in the documentation, sufficient for the installation?

Building and installing
* make
* sudo make install

Thanks in advance!

Regards,
Rima Bhowmick.

From: Alexander Saydakov <[email protected]>
Reply to: "[email protected]" <[email protected]>
Date: Thursday, 27 April 2023 at 1:25 AM
To: "[email protected]" <[email protected]>
Subject: Re: [E] Postgres HLL is very slow

The changes in question have been merged to the master branch. We have just started the release process for datasketches-cpp (version 4.1.0). Once this is done, we will start the release process for datasketches-postgresql 1.6.0. In the meantime, you may want to try the latest code with the latest datasketches-cpp from the master branch.

On Wed, Apr 19, 2023 at 12:58 AM Jon Malkin <[email protected]> wrote:

As noted in the linked issue, the postgresql 1.5 package is compatible with the cpp 3.x line, not 4.x. It should work fine with the last datasketches-cpp 3.x release. In the meantime, as noted, we are actively working on speed improvements for HLL, as requested at the start of this thread.

Additionally, one thing that can help speed up releases is to vote whenever there's a vote announcement; even a non-binding vote is valuable!

jon

On Wed, Apr 19, 2023, 12:13 AM Bhowmick, Rima <[email protected]> wrote:

Hello All,

We are trying to install a new version of datasketches in our Postgres instance. I downloaded datasketches-postgresql 1.5.0 (apache-datasketches-postgresql-1.5.0-src.zip) and datasketches-cpp 4.0.1 (apache-datasketches-cpp-4.0.1-src.zip) from the Apache website, plus boost 1.81.0, and followed the steps in the readme file. While executing the make command, I got this error:

g++ -Wall -Wpointer-arith -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -O2 -std=c++11 -fPIC -fPIC -I/usr/local/include -Iboost -Idatasketches-cpp/common/include -Idatasketches-cpp/kll/include -Idatasketches-cpp/cpc/include -Idatasketches-cpp/theta/include -Idatasketches-cpp/fi/include -Idatasketches-cpp/hll/include -Idatasketches-cpp/tuple/include -Idatasketches-cpp/req/include -I. -I./ -I/pgbin/mbi1d/12.x/include/postgresql/server -I/pgbin/mbi1d/12.x/include/postgresql/internal -D_GNU_SOURCE -I/pgbin/mbi1d/12.x//include/libxml2 -c -o src/kll_float_sketch_c_adapter.o src/kll_float_sketch_c_adapter.cpp
src/kll_float_sketch_c_adapter.cpp:26:109: error: wrong number of template arguments (4, should be 3)
 typedef datasketches::kll_sketch<float, std::less<float>, datasketches::serde<float>, palloc_allocator<float>> kll_float_sketch;
 ^
In file included from src/kll_float_sketch_c_adapter.cpp:24:0:
datasketches-cpp/kll/include/kll_sketch.hpp:158:7: error: provided for ‘template<class T, class C, class A> class datasketches::kll_sketch’
 class kll_sketch {

It looks like there is a mismatch in the number of template arguments between kll_float_sketch_c_adapter.cpp and kll_sketch.hpp. Could you please suggest a solution? Thank you!

https://github.com/apache/datasketches-postgresql/issues/62

The DataSketches distinct count algorithm in the Postgres extension is used in our applications and delivers significant business value, so if we cannot upgrade, it would be a big loss for us. Could you please advise on the best approach to overcome this?

Thanks,
Rima Bhowmick.
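The compiler output pins down the cause: the adapter in datasketches-postgresql 1.5.0 passes four template arguments (value type, comparator, serde, allocator), while kll_sketch in datasketches-cpp 4.0.1 is declared with three (T, C, A). That is consistent with Jon's note that 1.5 pairs with the cpp 3.x line. The self-contained C++ sketch below reproduces only the shape of that mismatch. It uses hypothetical toy templates (`kll_shape_4`, `kll_shape_3`, `toy_serde`, `toy_palloc_allocator`), so it is not the real datasketches code and not a patch for the adapter.

```cpp
#include <functional>   // std::less
#include <memory>       // std::allocator
#include <type_traits>  // std::is_same

// Hypothetical stand-ins only; these are NOT the real datasketches types.
template <typename T> struct toy_serde {};                                    // plays datasketches::serde<T>
template <typename T> struct toy_palloc_allocator { using value_type = T; };  // plays palloc_allocator<T>

// Shape the 1.5.0 adapter expects (4 parameters: T, C, S, A):
// a serde type sits between the comparator and the allocator.
template <typename T, typename C = std::less<T>, typename S = toy_serde<T>,
          typename A = std::allocator<T>>
struct kll_shape_4 { using value_type = T; using allocator_type = A; };

// Shape reported by the compiler for datasketches-cpp 4.0.1 (3 parameters: T, C, A).
template <typename T, typename C = std::less<T>, typename A = std::allocator<T>>
struct kll_shape_3 { using value_type = T; using allocator_type = A; };

// The adapter's typedef (toy names substituted) is valid for the 4-parameter shape:
typedef kll_shape_4<float, std::less<float>, toy_serde<float>, toy_palloc_allocator<float>>
    kll_float_sketch_4;

// For the 3-parameter shape, the same four arguments would fail with
// "wrong number of template arguments (4, should be 3)"; only a 3-argument form fits:
typedef kll_shape_3<float, std::less<float>, toy_palloc_allocator<float>> kll_float_sketch_3;

static_assert(std::is_same<kll_float_sketch_4::allocator_type,
                           toy_palloc_allocator<float>>::value, "allocator is the 4th argument");
static_assert(std::is_same<kll_float_sketch_3::allocator_type,
                           toy_palloc_allocator<float>>::value, "allocator is the 3rd argument");
```

Rather than editing the typedef by hand, the options given in this thread are to build datasketches-postgresql 1.5.0 against the last datasketches-cpp 3.x release, or to use the datasketches-postgresql master branch with the latest datasketches-cpp master.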
From: Alexander Saydakov <[email protected]>
Reply to: "[email protected]" <[email protected]>
Date: Saturday, 15 April 2023 at 12:05 AM
To: "[email protected]" <[email protected]>
Subject: Re: [E] Postgres HLL is very slow

I am not sure about the date. I think the development should take a few days. A formal Apache release will take substantially more time: we have to go through the required voting steps for the core library release (not really necessary for the parallel execution, but necessary to bring the latest speed improvements into the PostgreSQL extension), and then go through the same procedure to release the extension. Of course, you don't have to wait for the formal release to start testing.

Could you please clarify the issues you had building the latest version? I believe that the datasketches-postgresql code in the master branch is compatible with the latest datasketches-cpp code.

On Fri, Apr 14, 2023 at 11:22 AM Bhowmick, Rima <[email protected]> wrote:

Hello Alexander,

Do you have a date in mind for releasing the parallel execution support?

Also, we tried upgrading the datasketches version following the latest documentation, and we are getting a lot of C++ version issues. It's very tough to install the new version. Any thoughts?

Thanks,
Rima Bhowmick.

From: Alexander Saydakov <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, 14 April 2023 at 10:58 PM
To: "[email protected]" <[email protected]>
Subject: Re: [E] Postgres HLL is very slow

Hi Rima,

I am working on the datasketches extension to support parallel queries (distributed aggregation). I expect to get this done in a matter of days. Also, we have just made some improvements to HLL merge speed in the core library. These changes have not been released yet, but they are available in the master branch. We have another HLL performance improvement in mind; I will work on it once I finish the parallel query support.

On Fri, Apr 14, 2023 at 3:33 AM Bhowmick, Rima <[email protected]> wrote:

Hello Team,

Here is a snapshot of the existing application:

TechStack: Postgres DB, Hive, Tableau UI
Postgres Plugin: DataSketches

Flow in brief:
* A Hadoop data pipeline job pushes pre-aggregated active card data (aggregated with the Hive DataSketches algorithm), along with other details, to Hive.
* Another job loads that data into Postgres DB, which ends up holding 3 years of data for 4 regions covering multiple countries.
* A Tableau dashboard has a live connection to Postgres DB.
* Tableau queries call Postgres DB to aggregate the binary (pre-aggregated) data into a distinct card count (using the DataSketches algorithm) and to fetch data based on multiple filter conditions.
* Typically a query covers a 2-month span in each of 3 years, i.e. 6 months of data in total to aggregate for a country under multiple conditions.

This aggregation query is usually quite slow. We have tried many different ways to resolve this; the DataSketches part takes most of the execution time.

Thanks & Regards,
Rima Bhowmick
Marketing Brand Analytics
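The flow above stores pre-aggregated sketches in Postgres and merges them at query time, and Alexander's reply mentions HLL merge speed and parallel (distributed) aggregation. A minimal sketch of why merging works, assuming a textbook HyperLogLog rather than the DataSketches one: each sketch is a vector of registers, and a union is a register-wise max. Partial sketches (one per pre-aggregated row, or one per parallel worker) can therefore be combined in any order, and the result has exactly the same registers as a sketch built over all the raw items. The names `toy_hll` and `union_all`, the hash mixing and the `lg_k` parameter are illustrative assumptions, not the DataSketches API; the real library uses different hashing, register encodings and estimators.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy HyperLogLog for illustration only; NOT the DataSketches hll_sketch.
class toy_hll {
 public:
  // lg_k: log2 of the number of registers (e.g. 12 -> 4096 registers).
  explicit toy_hll(int lg_k) : lg_k_(lg_k), regs_(std::size_t{1} << lg_k, 0) {}

  void update(const std::string& item) {
    const std::uint64_t h = mix(static_cast<std::uint64_t>(std::hash<std::string>{}(item)));
    const std::size_t idx = static_cast<std::size_t>(h >> (64 - lg_k_));  // top lg_k bits pick a register
    const std::uint64_t rest = h << lg_k_;                                 // remaining bits, left-aligned
    // rank = 1-based position of the first 1-bit in the remaining bits
    const int rank = (rest == 0) ? (64 - lg_k_ + 1) : (leading_zeros(rest) + 1);
    regs_[idx] = std::max(regs_[idx], static_cast<std::uint8_t>(rank));
  }

  // Union = register-wise max: commutative, associative and idempotent.
  void merge(const toy_hll& other) {
    assert(other.regs_.size() == regs_.size());  // this toy version requires equal lg_k
    for (std::size_t i = 0; i < regs_.size(); ++i) regs_[i] = std::max(regs_[i], other.regs_[i]);
  }

  double estimate() const {
    const double m = static_cast<double>(regs_.size());
    double sum = 0.0;
    std::size_t zeros = 0;
    for (std::uint8_t r : regs_) {
      sum += std::ldexp(1.0, -static_cast<int>(r));  // 2^-r
      if (r == 0) ++zeros;
    }
    const double alpha = 0.7213 / (1.0 + 1.079 / m);  // bias constant for m >= 128
    double e = alpha * m * m / sum;
    if (e <= 2.5 * m && zeros > 0) e = m * std::log(m / static_cast<double>(zeros));  // linear counting
    return e;
  }

 private:
  static std::uint64_t mix(std::uint64_t x) {  // splitmix64 finalizer to spread the hash bits
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
  }
  static int leading_zeros(std::uint64_t x) {  // x must be non-zero
    int n = 0;
    while ((x & (std::uint64_t{1} << 63)) == 0) { x <<= 1; ++n; }
    return n;
  }

  int lg_k_;
  std::vector<std::uint8_t> regs_;
};

// Combine step: fold partial sketches (per worker or per pre-aggregated row) into one.
inline toy_hll union_all(const std::vector<toy_hll>& partials, int lg_k) {
  toy_hll result(lg_k);
  for (const toy_hll& p : partials) result.merge(p);
  return result;
}
```

With lg_k = 12 (4096 registers), the relative standard error of this toy estimator is about 1.04/sqrt(4096), roughly 1.6%. Each merge touches every register of the input sketch, so the cost of a query grows with the number of merged rows times 2^lg_k. That is plausibly why merge speed matters for the multi-month Tableau queries described above.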
