Re: Cross-platform discrepancies

Alexander Saydakov Tue, 14 Apr 2020 13:47:12 -0700

Gian,
Thanks for your reply. I did not mean to suggest moving away from off-heap
memory. As I see it, the problem mostly is with preallocating fixed blocks
for each sketch of some maximum size, which, on the one hand, is wasteful
because most sketches will be small, and on the other hand, requires
operating in a "contiguous block" mode, which is very limiting.
Perhaps you might consider something like PostgreSQL does: provide an
allocator, so that the memory allocation is controlled and can be done in a
context of a query or transaction, but it does not impose a single
contiguous block for a data structure. So a custom aggregation function
allocates memory for its own state on the first call, and PostgreSQL passes
this state to the next call as a pointer and so on. At any moment the
aggregation can choose to reallocate and return a new pointer. And that
state can be a complex data structure with pointers to other blocks
allocated using the same mechanism, so it does not have to be a single
contiguous block.


This will still require some changes to our library to support memory
allocation like this, but it seems to be less challenging then the current
Direct memory mode we have.

There are some trade offs here, as usually. For instance, currently (if
implemented) wrapping a chunk of bytes as a fast alternative to creating an
on-heap object during deserialization is the same operation as interpreting
this "contiguous block" aggregation state. But if the aggregation state is
fragmented, it is not going to be equivalent to a serialized blob anymore.
This is how it is in the current C++ implementation - there is no
equivalent of wrap() as in Direct mode in Java. This is a potential
performance improvement that can be discussed separately. Some sketches can
choose to operate in this contiguous block mode if it does not go against
the algorithm.

On Mon, Apr 13, 2020 at 10:43 PM Gian Merlino <[email protected]> wrote:

> Hi Alexander,
>
> It sounds like integrating with Druid is an important concern here. It
> might be nice to cross post the discussion to dev@druid.
>
> Personally, I don't think it's likely the Druid community will move away
> from requiring that aggregators be able to work with raw memory. The
> requirement exists so it can use allocate an arena for aggregation space,
> minimizing GC load. But there have been some proposals floating around for
> adding ability to dynamically allocate new memory (perhaps out of the same
> arena or perhaps some other way — the proposals varied). I think it would
> be useful for Druid devs to understand more deeply what effect the
> allocator API has on the DataSketches implementation.
>
> (By the way, the Druid issue you're referring to might be
> https://github.com/apache/druid/issues/8126.)
>
> On Mon, Apr 13, 2020 at 5:11 PM Alexander Saydakov
> <[email protected]> wrote:
>
>> Hi everyone!
>> I would like to discuss some discrepancies we accumulated as results of
>> some decisions around prioritizing our development efforts in Java and C++,
>> and also around integration with data processing systems.
>>
>> Druid is one of the most important systems in which our library is used.
>> It has a strong requirement to have off-heap memory support and also to
>> operate in a single contiguous block. This was the main motivation behind
>> what we call Direct memory support in our Java library.
>>
>> Currently some sketches do not support Direct memory, in particular: KLL
>> quantiles sketch, Frequent Items sketch, CPC distinct counting sketch.
>> Therefore they are not integrated with Druid yet.
>>
>> Quantiles sketch:
>> In Java we have ItemsSketch<T> for any type and a specialized numeric
>> DoublesSketch with Direct memory support - this is the one integrated with
>> Druid. We consider this algorithm replaced by KLL quantiles sketch, but for
>> historical reasons in Java we have implemented KllFloatsSketch only - no
>> generic implementation and no Direct memory support. This contiguous memory
>> block mode is so limiting that we did not think that KLL algorithm can be
>> implemented efficiently in that mode. We might need to reconsider this.
>> Development is C++ happened later, so we did not implement the older
>> algorithm. We may want to do it if a strong use case arises to support
>> reading data prepared using Java library. On the other hand, in C++
>> kll_sketch is a template, and therefore it can be a container of any user
>> type. In Java we felt the need to have both: a generic implementation and a
>> specialized implementation for some numeric type to avoid object overhead.
>>
>> To summarize the situation with quantiles sketches:
>> Java: KllFloatsSketch, older quantiles ItemsSketch<T> and DoublesSketch
>> C++: kll_sketch<T> (kll_sketch<float> is compatible with Java), no older
>> quantiles sketch.
>> Druid: quantiles DoublesSketch
>> PostgreSQL: kll_sketch<float>
>>
>> Regarding distinct counting sketches:
>> Druid: Theta, HLL
>> PostgreSQL: Theta, HLL, CPC
>>
>> Regarding frequent items:
>> Druid: none
>> PostgreSQL: frequent_items_sketch<std::string>
>>
>> It seems to me that we need to do something to improve the situation.
>> Possible tasks:
>> - Discuss alternatives to Direct memory mode in Druid with Druid
>> developers (again). They were not quite happy with the current approach
>> either. It leads to gross overallocation of BufferAggredator memory. There
>> was an open issue in Druid repo about this I believe.
>> - Find some way to have KLL sketch in Druid
>> - Find some way to have Frequent Items (frequent strings?) sketch in Druid
>> - Consider KllItemsSketch<T> in Java
>> - Consider legacy quantiles_sketch<T> in C++ (and then in PostgreSQL -
>> with double values, compatible with Druid)
>>
>> I would appreciate your thoughts about this.
>> Thank you.
>> Alexander Saydakov
>>
>

Re: Cross-platform discrepancies

Reply via email to