I would suggest adding it to the existing package(s) (sdks/java/extensions,
sdks/java/zetasketch, or both, depending on whether you're replacing
existing sketches or adding new ones), since we shouldn't expose the
sketching libraries' API surface. We should make the API take all the
relevant parameters, since this allows us to move between variants and
choose the best sketching library.
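
As a rough illustration of that second point, here is a minimal sketch of
the shape I have in mind. This is not existing Beam code: every name below
(DistinctCountSketch, Backend, EXACT_STAND_IN, precision) is hypothetical,
and the exact HashSet counter is only a stand-in so the example runs without
any sketching library on the classpath.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical facade, not an existing Beam class: callers pass plain
// parameters and never see a sketching-library type.
final class DistinctCountSketch {
  // Hypothetical backend selector; a real one would name actual libraries.
  enum Backend { EXACT_STAND_IN }

  private final int precision;
  private final Backend backend;
  // Stand-in state: an exact set, used only to keep the example runnable.
  private final Set<Long> seen = new HashSet<>();

  private DistinctCountSketch(int precision, Backend backend) {
    this.precision = precision;
    this.backend = backend;
  }

  // All tuning knobs are arguments, so the backend can change later
  // without changing this signature.
  static DistinctCountSketch create(int precision, Backend backend) {
    if (precision <= 0) {
      throw new IllegalArgumentException("precision must be positive");
    }
    if (backend == null) {
      throw new IllegalArgumentException("backend must not be null");
    }
    return new DistinctCountSketch(precision, backend);
  }

  // Records one value; duplicates do not change the count.
  void add(long value) {
    seen.add(value);
  }

  // Returns the number of distinct values (exact for the stand-in backend).
  long estimate() {
    return seen.size();
  }

  int precision() {
    return precision;
  }

  Backend backend() {
    return backend;
  }
}
```

Because nothing library-specific appears in the signatures, swapping the
stand-in for a real sketch would only touch the private state and the bodies
of add() and estimate().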

On Wed, Jan 18, 2023 at 11:24 AM Reuven Lax via dev <dev@beam.apache.org>
wrote:

> I believe that when zetasketch was added, it was also noticeably more
> efficient than other sketch implementations. However this was a number of
> years ago, and I don't know whether it still has an advantage or not.
>
> On Wed, Jan 18, 2023 at 10:41 AM Byron Ellis via dev <dev@beam.apache.org>
> wrote:
>
>> Hi everyone,
>>
>> I was looking at adding at least a couple of the sketches from the Apache
>> Datasketches library to the Beam Java SDK and I was wondering if folks had
>> a preference for adding to the existing "sketching" extension vs splitting
>> it out into its own extension?
>>
>> The reason I ask is that there's some overlap (which already exists in
>> zetasketch) between the sketches available in Datasketches vs Beam today,
>> particularly HyperLogLog which would have 3 implementations if we were to
>> add all of them.
>>
>> I don't really have a strong opinion, though personally I'd probably lean
>> towards a single sketching extension (zetasketch being something of a
>> special case as it exists for format compatibility as far as I can tell).
>> But I could see how that could be confusing if you had both the Apache
>> Datasketches implementation and the existing implementation derived from
>> the clearspring implementations.
>>
>> Any thoughts?
>>
>> Best,
>> B
>>
>
