Hi Alexander,
We are trying to upgrade the datasketches-postgresql extension on one of our
Linux servers.
We are getting this error:
sr/include/libxml2 -c -o src/aod_sketch_c_adapter.o
src/aod_sketch_c_adapter.cpp
In file included from
boost/boost/math/special_functions/detail/round_fwd.hpp:11:0,
from boost/boost/math/special_functions/math_fwd.hpp:29,
from boost/boost/math/special_functions/beta.hpp:13,
from boost/boost/math/distributions/students_t.hpp:16,
from src/aod_sketch_c_adapter.cpp:34:
boost/boost/math/tools/config.hpp:23:6: warning: #warning "The minimum language
standard to use Boost.Math will be C++14 starting in July 2023 (Boost 1.82
release)" [-Wcpp]
# warning "The minimum language standard to use Boost.Math will be C++14
starting in July 2023 (Boost 1.82 release)"
^
In file included from boost/boost/math/tools/fraction.hpp:14:0,
from boost/boost/math/special_functions/gamma.hpp:18,
from boost/boost/math/special_functions/beta.hpp:15,
from boost/boost/math/distributions/students_t.hpp:16,
from src/aod_sketch_c_adapter.cpp:34:
boost/boost/math/tools/complex.hpp: In instantiation of ‘struct
boost::math::tools::integer_scalar_type<long double, true>’:
boost/boost/math/tools/fraction.hpp:115:72: required from ‘typename
boost::math::tools::detail::fraction_traits<Gen>::result_type
boost::math::tools::continued_fraction_b(Gen&, const U&, uintmax_t&) [with Gen
= boost::math::detail::ibeta_fraction2_t<long double>; U = long double;
typename boost::math::tools::detail::fraction_traits<Gen>::result_type = long
double; uintmax_t = long unsigned int]’
boost/boost/math/tools/fraction.hpp:156:52: required from ‘typename
boost::math::tools::detail::fraction_traits<Gen>::result_type
boost::math::tools::continued_fraction_b(Gen&, const U&) [with Gen =
boost::math::detail::ibeta_fraction2_t<long double>; U = long double; typename
boost::math::tools::detail::fraction_traits<Gen>::result_type = long double]’
We are using Boost version 1.80.0 (boost_1_80_0). Please let us know if you
have any idea what could be wrong.
Thanks,
Rima Bhowmick.
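The pasted log contains a Boost.Math warning about a future C++14 minimum, and the build command quoted further down in this thread passes -std=c++11. The log is cut off before any actual error, so the cause is not established here. As a diagnostic aid only, the following minimal sketch reports which language standard a given set of compiler flags selects. The helper names (cxx_standard_name, meets_cxx14, report_standard) are hypothetical and are not part of Boost or DataSketches.

```cpp
#include <cstdio>
#include <string>

// Map a value of the standard __cplusplus macro to a readable name.
// The thresholds are the values defined by the ISO C++ standards.
inline std::string cxx_standard_name(long value) {
    if (value >= 202002L) return "C++20 or later";
    if (value >= 201703L) return "C++17";
    if (value >= 201402L) return "C++14";
    if (value >= 201103L) return "C++11";
    return "pre-C++11";
}

// Hypothetical helper: true when the standard is at least C++14,
// the minimum that the Boost.Math warning in the log announces.
inline bool meets_cxx14(long value) {
    return value >= 201402L;
}

// Print the standard selected by the flags used to compile this file.
inline void report_standard() {
    const long value = static_cast<long>(__cplusplus);
    std::printf("__cplusplus = %ldL (%s)\n", value,
                cxx_standard_name(value).c_str());
}
```

Calling report_standard() from main and compiling the file with the same flags as the failing build shows the standard that build actually uses.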
From: Alexander Saydakov <[email protected]>
Reply to: "[email protected]" <[email protected]>
Date: Friday, 19 May 2023 at 1:54 AM
To: "[email protected]" <[email protected]>
Subject: Re: [E] Postgres HLL is very slow
Yes, version 1.6.0 does depend on Boost. There is no need to install it. Just
download, unpack and make a link so that the "make" can find it.
I am afraid I don't quite understand your question. I would suggest following
the Readme and asking specific questions about what is not clear or what goes
wrong.
On Wed, May 17, 2023 at 6:32 AM Bhowmick, Rima <[email protected]>
wrote:
Thanks for making DataSketches version 1.6 available; it will help us a lot.
Today we downloaded the package from the PGXN website
(https://pgxn.org/dist/datasketches/). Is it mandatory to install the Boost
package too?
When we installed version 1.3 of the Postgres DataSketches plugin earlier, we
did not use Boost.
Also, are the steps below, as mentioned in the documentation, sufficient to
install it?
Building and installing
* make
* sudo make install
Thanks in advance!
Regards,
Rima Bhowmick.
From: Alexander Saydakov <[email protected]>
Reply to: "[email protected]" <[email protected]>
Date: Thursday, 27 April 2023 at 1:25 AM
To: "[email protected]" <[email protected]>
Subject: Re: [E] Postgres HLL is very slow
The changes in question have been merged to the master branch.
We have just started the release process for datasketches-cpp (version 4.1.0).
Once this is done, we will start the release process for datasketches-postgresql
1.6.0. In the meantime you may want to try the latest code with the latest
datasketches-cpp from the master branch.
On Wed, Apr 19, 2023 at 12:58 AM Jon Malkin <[email protected]> wrote:
As noted in the linked issue, the postgresql 1.5 package is compatible with the
cpp 3.x line, not 4.x. It should work fine with the last datasketches-cpp 3.x
release.
In the meantime, as noted, we are actively trying to work on speed improvements
for HLL as requested at the start of this thread.
Additionally, one thing that can help speed releases is to vote whenever
there's a vote announcement -- even a non-binding vote is valuable!
jon
On Wed, Apr 19, 2023, 12:13 AM Bhowmick, Rima <[email protected]> wrote:
Hello All,
We are trying to install a new version of DataSketches in our PostgreSQL
instance. I have downloaded datasketches-postgresql 1.5.0
(apache-datasketches-postgresql-1.5.0-src.zip) and datasketches-cpp 4.0.1
(apache-datasketches-cpp-4.0.1-src.zip) from the Apache website, and Boost
1.81.0. I followed the steps in the readme file. While executing the make
command, I got this error:
g++ -Wall -Wpointer-arith -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv -O2 -std=c++11 -fPIC -fPIC
-I/usr/local/include -Iboost -Idatasketches-cpp/common/include
-Idatasketches-cpp/kll/include -Idatasketches-cpp/cpc/include
-Idatasketches-cpp/theta/include -Idatasketches-cpp/fi/include
-Idatasketches-cpp/hll/include -Idatasketches-cpp/tuple/include
-Idatasketches-cpp/req/include -I. -I./
-I/pgbin/mbi1d/12.x/include/postgresql/server
-I/pgbin/mbi1d/12.x/include/postgresql/internal -D_GNU_SOURCE
-I/pgbin/mbi1d/12.x//include/libxml2 -c -o src/kll_float_sketch_c_adapter.o
src/kll_float_sketch_c_adapter.cpp
src/kll_float_sketch_c_adapter.cpp:26:109: error: wrong number of template
arguments (4, should be 3)
typedef datasketches::kll_sketch<float, std::less<float>,
datasketches::serde<float>, palloc_allocator<float>> kll_float_sketch;
^
In file included from src/kll_float_sketch_c_adapter.cpp:24:0:
datasketches-cpp/kll/include/kll_sketch.hpp:158:7: error: provided for
‘template<class T, class C, class A> class datasketches::kll_sketch’
class kll_sketch {
It looks like the template arguments in kll_float_sketch_c_adapter.cpp do not
match the declaration in kll_sketch.hpp.
Could you please suggest a solution? Thank you!
https://github.com/apache/datasketches-postgresql/issues/62
The DataSketches distinct-count Postgres extension is used in our applications
and delivers significant business value, so if we cannot upgrade, it would be
a big loss for us.
Could you please advise on the best approach to overcome this?
Thanks,
Rima Bhowmick.
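The error in this message says the adapter passes four template arguments to a class template declared with three, which fits the reply above that the 1.5 package targets the datasketches-cpp 3.x line rather than 4.x. A minimal sketch of that arity mismatch follows. It uses a hypothetical stand-in namespace (mock_sketches) with empty mock types, not the real datasketches-cpp headers, and only mirrors the parameter counts printed in the compiler output.

```cpp
#include <functional>
#include <memory>
#include <type_traits>

// Hypothetical stand-ins for illustration; NOT the real library.
namespace mock_sketches {

// Mock of a serializer type such as the one named in the adapter source.
template <class T>
struct serde {};

// Mirrors the three-parameter shape from the compiler note:
//   template<class T, class C, class A> class kll_sketch
template <class T, class C, class A>
class kll_sketch {
public:
    using value_type = T;
    using comparator = C;
    using allocator_type = A;
};

}  // namespace mock_sketches

// Three arguments match the declaration, so this alias compiles.
using kll_float_sketch_3 =
    mock_sketches::kll_sketch<float, std::less<float>, std::allocator<float>>;

// Four arguments, in the shape used by the 1.5.0 adapter source, do not
// match the declaration. Uncommenting the alias below makes the compiler
// reject it with a template-arity error like the one in the log.
// using kll_float_sketch_4 =
//     mock_sketches::kll_sketch<float, std::less<float>,
//                               mock_sketches::serde<float>,
//                               std::allocator<float>>;

static_assert(
    std::is_same<kll_float_sketch_3::allocator_type,
                 std::allocator<float>>::value,
    "in the three-parameter form the third argument is the allocator");
```

The sketch only illustrates why the two source trees disagree; the practical remedy stated in the thread is to pair each extension release with a compatible datasketches-cpp release.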
From: Alexander Saydakov <[email protected]>
Reply to: "[email protected]" <[email protected]>
Date: Saturday, 15 April 2023 at 12:05 AM
To: "[email protected]" <[email protected]>
Subject: Re: [E] Postgres HLL is very slow
I am not sure about the date. I think the development should take a few days. A
formal Apache release will take substantially more time just to go through the
required steps of voting for the core library release (not really necessary for
the parallel execution, but necessary to bring the latest speed improvements
into the PostgreSQL extension), and then going through the same procedure to
release the extension.
Of course, you don't have to wait for the formal release to start testing.
Could you clarify your issues building the latest version please? I believe
that the datasketches-postgresql code in the master branch is compatible with
the latest datasketches-cpp code.
On Fri, Apr 14, 2023 at 11:22 AM Bhowmick, Rima <[email protected]>
wrote:
Hello Alexander,
Do you have a date in mind for the release that adds parallel execution?
Also, we tried upgrading DataSketches by following the latest documentation,
and we are running into a lot of C++ version issues.
It is very tough to install the new version. Any thoughts?
Thanks,
Rima Bhowmick.
From: Alexander Saydakov <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, 14 April 2023 at 10:58 PM
To: "[email protected]" <[email protected]>
Subject: Re: [E] Postgres HLL is very slow
Hi Rima,
I am working on the datasketches extension to support parallel queries
(distributed aggregation).
I expect to get this done in a matter of days.
Also, we have just made some improvements to HLL merge speed in the core
library. These changes have not been released yet, but they are available in
the master branch.
We have another HLL performance improvement in mind. I will work on it once I
finish the parallel query support.
On Fri, Apr 14, 2023 at 3:33 AM Bhowmick, Rima <[email protected]>
wrote:
Hello Team,
Here is a snapshot of the existing application:
Tech stack: Postgres DB, Hive, Tableau UI
Postgres plugin: DataSketches
Flow in brief:
* A Hadoop data pipeline job pushes pre-aggregated (using the Hive DataSketches
algorithm) active card data, along with other details, to Hive.
* Another job loads that data into the Postgres DB, which ends up holding 3
years of data for 4 regions and multiple countries.
* A Tableau dashboard has a live connection to the Postgres DB.
* A Tableau query calls the Postgres DB to aggregate the binary/pre-aggregated
data into a distinct card count (using the DataSketches algorithm) and fetches
data based on multiple filter conditions.
* A typical query covers a 2-month span in each of the 3 years, i.e. 6 months
of data in total to aggregate for a country under multiple conditions.
This aggregation query is usually quite slow. We have tried many different
ways to resolve this, but the DataSketches part accounts for most of the
execution time.
Thanks & Regards,
Rima Bhowmick
Marketing Brand Analytics