This is an interesting question. A full-scan size() may be a tremendously slow
operation on large data sets. On the other hand, reporting the total number of
tuples, including old and aborted versions, makes little to no sense either.
Looks like we need to choose the lesser of two evils. What if we do the
following:
1) Leave the default behavior as is: O(1) complexity, but the result includes
invalid versions.
2) As Sergey suggested, add a new peek mode "MVCC_ALIVE_ONLY", which will
perform a full scan.
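To make the trade-off between the two options concrete, here is a minimal, self-contained sketch. It does not use Ignite's actual API: `SizeMode`, `Version`, `Snapshot` and the visibility rule are hypothetical stand-ins. The default mode returns a raw counter in O(1) that also counts replaced and aborted versions, while the hypothetical `MVCC_ALIVE_ONLY` mode scans every stored version and counts only those visible in the given snapshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class MvccSizeSketch {
    // Hypothetical size modes for illustration; NOT Ignite's real CachePeekMode.
    enum SizeMode { DEFAULT, MVCC_ALIVE_ONLY }

    // One stored row version: key, tx that created it, tx that replaced/removed it (0 = none).
    record Version(String key, long createdBy, long removedBy) {}

    // Hypothetical snapshot: a tx is visible if it committed and its id <= snapshotVersion.
    record Snapshot(long snapshotVersion, Set<Long> committed) {
        boolean sees(long txId) {
            return txId != 0 && txId <= snapshotVersion && committed.contains(txId);
        }
    }

    private final List<Version> versions = new ArrayList<>();
    private long rawCounter; // bumped for every stored version, like a plain entry counter

    void store(Version v) {
        versions.add(v);
        rawCounter++;
    }

    long size(SizeMode mode, Snapshot snap) {
        if (mode == SizeMode.DEFAULT) {
            return rawCounter; // O(1), but includes old and aborted versions
        }
        // Full scan: count versions created by a visible tx and not removed by a visible tx.
        long alive = 0;
        for (Version v : versions) {
            if (snap.sees(v.createdBy()) && !snap.sees(v.removedBy())) {
                alive++;
            }
        }
        return alive;
    }

    public static void main(String[] args) {
        MvccSizeSketch cache = new MvccSizeSketch();
        cache.store(new Version("a", 1, 3)); // old version of "a", replaced by tx 3
        cache.store(new Version("a", 3, 0)); // current version of "a"
        cache.store(new Version("b", 2, 0)); // written by tx 2, which aborted
        cache.store(new Version("c", 4, 0)); // committed by tx 4
        Snapshot snap = new Snapshot(4, Set.of(1L, 3L, 4L));
        System.out.println(cache.size(SizeMode.DEFAULT, snap));         // 4
        System.out.println(cache.size(SizeMode.MVCC_ALIVE_ONLY, snap)); // 2
    }
}
```

With four stored versions but only two live keys, the raw counter and the filtered scan disagree, which is exactly the discrepancy users would see if the default mode kept returning unfiltered counts.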

Alternatively, we may throw an "UnsupportedOperationException" from this
method for MVCC-enabled caches. Why not?
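The fail-fast alternative could look roughly like the sketch below. The class, the `mvccEnabled` flag and the `onInsert()` hook are hypothetical; only `UnsupportedOperationException` and `AtomicLong` are standard Java. Non-MVCC caches keep the O(1) counter, while MVCC-enabled caches refuse the call instead of silently returning an unfiltered count or silently falling back to a slow scan.

```java
import java.util.concurrent.atomic.AtomicLong;

public class FailFastSizeSketch {
    private final boolean mvccEnabled;                      // hypothetical per-cache flag
    private final AtomicLong entryCount = new AtomicLong(); // maintained on insert

    FailFastSizeSketch(boolean mvccEnabled) {
        this.mvccEnabled = mvccEnabled;
    }

    void onInsert() {
        entryCount.incrementAndGet();
    }

    long size() {
        if (mvccEnabled) {
            // Refuse rather than return a count that includes invisible versions.
            throw new UnsupportedOperationException("size() is not supported for MVCC-enabled caches");
        }
        return entryCount.get(); // O(1) for non-MVCC caches
    }

    public static void main(String[] args) {
        FailFastSizeSketch plain = new FailFastSizeSketch(false);
        plain.onInsert();
        System.out.println(plain.size()); // 1
        try {
            new FailFastSizeSketch(true).size();
        } catch (UnsupportedOperationException e) {
            System.out.println("unsupported: " + e.getMessage());
        }
    }
}
```

The cost of this choice is that existing callers of size() on MVCC caches break at runtime, so it trades silent slowness or inaccuracy for an explicit failure.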

Thoughts?

On Tue, Apr 24, 2018 at 4:28 PM, Sergey Kalashnikov <zkilling...@gmail.com>
wrote:

> Hi Igniters,
>
> I need your advice on a task at hand.
>
> Currently, the cache API size() is a constant-time operation, since the
> number of entries is maintained as a separate counter.
> However, for MVCC-enabled cache there can be multiple versions of the
> same entry.
> In order to calculate the size, we need to obtain an MVCC snapshot and
> then iterate over data pages, filtering out invisible versions.
> So, it is impossible to keep the same complexity guarantees.
>
> My current implementation internally switches to a "full-scan" approach
> if the cache in question is an MVCC-enabled cache.
> This happens unbeknownst to users, who may expect a lightning-fast
> response as before.
> Perhaps we might add a new constant to the CachePeekMode enumeration that
> is passed to cache size() to make this explicit?
>
> The second concern is that cache size calculation is also used
> in the Cache Metrics API and Visor functionality.
> Will it be OK for metrics and the like to keep returning the raw,
> unfiltered number of entries?
> Is there any sense in showing a raw, unfiltered number of entries, which
> may vary greatly from invocation to invocation with just simple
> updates running in the background?
>
> Please share your thoughts.
>
> Thanks in advance.
> --
> Sergey
>
