On Oct 8, 2014, at 16:19, Radim Vansa <rva...@redhat.com> wrote: > Users expect that size() will be constant-time (or linear to cluster > size), and generally fast operation. I'd prefer to keep it that way. > Though, even the MR way (used for HotRod size() now) needs to crawl > through all the entries locally.
yes, but first of all they expect size to be correct, then fast. > > 'Heretic, not very well though of and changing too many things' idea: > what about having data container segment-aware? Then you'd just bcast > SizeCommand with given topologyId and sum up sizes of primary-owned > segments... It's not a complete solution, but at least that would enable > to get the number of locally owned entries quite fast. Though, you can't > do that easily with cache stores (without changing SPI). that would help and there were discussions to do this for other reasons as well: e.g. ST would migrate the data without iterating over the state in the DC. Not doable in the scope of ISPN 7.0, though. > > Regarding cache stores, IMO we're damned anyway: when calling > cacheStore.size(), it can report more entries as those haven't been > expired yet, it can report less entries as those can be expired due to > [1]. Or, we'll enumerate all the entries, and that's going to be slow > (btw., [1] reminded me that we should enumerate both datacontainer AND > cachestores even if passivation is not enabled). > > Radim > > [1] https://issues.jboss.org/browse/ISPN-3202 > > On 10/08/2014 04:42 PM, William Burns wrote: >> So it seems we would want to change this for 7.0 if possible since it >> would be a bigger change for something like 7.1 and 8.0 would be even >> further out. I should be able to put this together for CR2. >> >> It seems that we want to implement keySet, values and entrySet methods >> using the entry iterator approach. >> >> It is however unclear for the size method if we want to use MR entry >> counting and not worry about the rehash and passivation issues since >> it is just an estimation anyways. Or if we want to also use the entry >> iterator which should be closer approximation but will require more >> network overhead and memory usage. >> >> Also we didn't really talk about the fact that these methods would >> ignore ongoing transactions and if that is a concern or not. >> >> - Will >> >> On Wed, Oct 8, 2014 at 10:13 AM, Mircea Markus <mmar...@redhat.com> wrote: >>> On Oct 8, 2014, at 15:11, Dan Berindei <dan.berin...@gmail.com> wrote: >>> >>>> On Wed, Oct 8, 2014 at 5:03 PM, Mircea Markus <mmar...@redhat.com> wrote: >>>> On Oct 3, 2014, at 9:30, Radim Vansa <rva...@redhat.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> recently we had a discussion about what size() returns, but I've >>>>> realized there are more things that users would like to know. My >>>>> question is whether you think that they would really appreciate it, or >>>>> whether it's just my QA point of view where I sometimes compute the >>>>> 'checksums' of cache to see if I didn't lost anything. >>>>> >>>>> There are those sizes: >>>>> A) number of owned entries >>>>> B) number of entries stored locally in memory >>>>> C) number of entries stored in each local cache store >>>>> D) number of entries stored in each shared cache store >>>>> E) total number of entries in cache >>>>> >>>>> So far, we can get >>>>> B via withFlags(SKIP_CACHE_LOAD).size() >>>>> (passivation ? B : 0) + firstNonZero(C, D) via size() >>>>> E via distributed iterators / MR >>>>> A via data container iteration + distribution manager query, but only >>>>> without cache store >>>>> C or D through >>>>> getComponentRegistry().getLocalComponent(PersistenceManager.class).getStores() >>>>> >>>>> I think that it would go along with users' expectations if size() >>>>> returned E and for the rest we should have special methods on >>>>> AdvancedCache. That would of course change the meaning of size(), but >>>>> I'd say that finally to something that has firm meaning. >>>>> >>>>> WDYT? >>>> There was a lot of arguments in past whether size() and other methods that >>>> operate over all the elements (keySet, values) are useful because: >>>> - they are approximate (data changes during iteration) >>>> - they are very resource consuming and might be miss-used (this is the >>>> reason we chosen to use size() with its current local semantic) >>>> >>>> These methods (size, keys, values) are useful for people and I think we >>>> were not wise to implement them only on top of the local data: this is >>>> like preferring efficiency over correctness. This also created a lot of >>>> confusion with our users, question like size() doesn't return the correct >>>> value being asked regularly. I totally agree that size() returns E (i.e. >>>> everything that is stored within the grid, including persistence) and it's >>>> performance implications to be documented accordingly. For keySet and >>>> values - we should stop implementing them (throw exception) and point >>>> users to Will's distributed iterator which is a nicer way to achieve the >>>> desired behavior. >>>> >>>> We can also implement keySet() and values() on top of the distributed >>>> entry iterator and document that using the iterator directly is better. >>> Yes, that's what I meant as well. >>> >>> Cheers, >>> -- >>> Mircea Markus >>> Infinispan lead (www.infinispan.org) >>> >>> >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev@lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev@lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa <rva...@redhat.com> > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev@lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) _______________________________________________ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev