Hello Štefan,

We did a major study and comparison
<https://datasketches.apache.org/docs/HLL/Hll_vs_CS_Hllpp.html> of the
DataSketches HLL sketch to the Clearspring implementation of the HLL++
sketch back in 2017. We found that the Clearspring sketch had serious error
problems, did not implement the Google HLL++ paper correctly, and was slow.

To answer your question as to whether any of your Clearspring (CS) HLL
sketch data can be recovered: I would say no. And even if it could be
recovered, given the serious problems of the CS implementation, I wouldn't
trust it.
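
One general reason two HLL formats cannot simply be merged is that the
register-wise "max" merge is only meaningful when both sketches were built
with the same hash function and the same register layout. The toy below
illustrates just that point. It is a minimal sketch under stated assumptions:
ToyHll, hashA and hashB are hypothetical names invented for this example, and
none of this is the Clearspring or DataSketches code or serialization format.

```java
import java.util.function.LongUnaryOperator;

// Toy HyperLogLog, for illustration only. It is NOT the Clearspring or
// DataSketches implementation; all names here are hypothetical.
final class ToyHll {
    static final int P = 12;       // index bits
    static final int M = 1 << P;   // 4096 registers

    // Stand-in for "the hash used by implementation A" (splitmix64 finalizer).
    static long hashA(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    // Stand-in for "the hash used by implementation B" (a different 64-bit mixer).
    static long hashB(long x) {
        x ^= 0x5555555555555555L;
        x ^= x >>> 33;
        x *= 0xFF51AFD7ED558CCDL;
        x ^= x >>> 33;
        x *= 0xC4CEB9FE1A85EC53L;
        return x ^ (x >>> 33);
    }

    // Build registers for the items from, from+1, ..., to-1 using the given hash.
    static byte[] build(long from, long to, LongUnaryOperator hash) {
        byte[] regs = new byte[M];
        for (long v = from; v < to; v++) {
            long h = hash.applyAsLong(v);
            int idx = (int) (h >>> (64 - P));          // top P bits pick the register
            long rest = (h << P) | (1L << (P - 1));    // sentinel bit bounds the rank
            int rank = Long.numberOfLeadingZeros(rest) + 1;
            if (rank > regs[idx]) {
                regs[idx] = (byte) rank;
            }
        }
        return regs;
    }

    // Standard HLL merge: register-wise maximum.
    static byte[] merge(byte[] a, byte[] b) {
        byte[] out = new byte[M];
        for (int i = 0; i < M; i++) {
            out[i] = (byte) Math.max(a[i], b[i]);
        }
        return out;
    }

    // Raw HLL estimate with the small-range (linear counting) correction.
    static double estimate(byte[] regs) {
        double sum = 0;
        int zeros = 0;
        for (byte r : regs) {
            sum += Math.scalb(1.0, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1 + 1.079 / M);
        double raw = alpha * M * (double) M / sum;
        if (raw <= 2.5 * M && zeros > 0) {
            return M * Math.log((double) M / zeros);
        }
        return raw;
    }

    public static void main(String[] args) {
        // The SAME 100,000 items, sketched once with each hash function.
        byte[] a = build(0, 100_000, ToyHll::hashA);
        byte[] b = build(0, 100_000, ToyHll::hashB);

        // Same hash on both sides: the union of a set with itself should
        // land near 100,000.
        System.out.println("same hash : " + estimate(merge(a, a)));

        // Different hashes: every item looks like two distinct items, so
        // the estimate should land near 200,000 instead.
        System.out.println("cross hash: " + estimate(merge(a, b)));
    }
}
```

In this toy the cross-hash merge roughly doubles the count even though both
sketches saw exactly the same items, and nothing in the merged registers
reveals that anything went wrong.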


On Tue, Mar 11, 2025 at 6:24 AM Štefan Miklošovič <smikloso...@apache.org>
wrote:

> Hello Datasketches community,
>
> I am from Apache Cassandra, where we use Clearspring (1) to estimate the
> cardinality of rows in Cassandra's SSTables. We serialize the whole
> HyperLogLog from (1) (more or less) to disk, then deserialize it and
> merge all of the HyperLogLogs together to get the final result across the
> whole data set.
>
> (1) is, as you probably know, archived and no longer actively maintained.
> Hence, we are looking for a replacement.
>
> DataSketches is quite an obvious choice, but I would like answers to a
> few questions before the transition.
>
> We need to work with old data as well. If there is an SSTable on disk
> with an HLL from Clearspring, then we cannot merge it into a DataSketches
> sketch, right? In other words, this is not possible:
>
>     @Test
>     public void testMerging() throws Throwable
>     {
>         // wrapper around Clearspring
>         LegacyCardinality clearspringCardinality =
>             new LegacyCardinality(new HyperLogLogPlus(13, 25));
>         clearspringCardinality.offerHashed(12345);
>
>         // wrapper around Datasketches HLL
>         DefaultCardinality datasketchesCardinality = new DefaultCardinality();
>         datasketchesCardinality.offerHashed(23456);
>
>         // this fails, as well as similar variations of that
>         clearspringCardinality.merge(
>             new LegacyCardinality(
>                 HyperLogLogPlus.Builder.build(datasketchesCardinality.getBytes()))
>                 .getCardinality());
>     }
>
> It would be great if you could confirm (or deny) that there is no way to
> merge these two together. How would you get around this problem in general?
> If they are not mergeable, then we will need to find another way to deal
> with this, but that is another story.
>
> I see that there is (2), which is a great in-depth description of the
> differences between the two, but to my knowledge it does not say whether
> one is convertible to the other.
>
> Thank you and regards
>
> Stefan Miklosovic
>
> (1) https://github.com/addthis/stream-lib/tree/master
> (2) https://datasketches.apache.org/docs/HLL/Hll_vs_CS_Hllpp.html
>
