[kudu-CR] WIP [util] introduce TTL cache

Alexey Serbin (Code Review) Wed, 27 Mar 2019 00:02:24 -0700

Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12825 )


Change subject: WIP [util] introduce TTL cache
......................................................................


Patch Set 3:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/12825/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12825/3//COMMIT_MSG@13
PS3, Line 13: lazily purged
> You mentioned this offline, that this was fine because we expect the user o
I think that makes sense, but I think it's better to do that in a separate 
changelist.


http://gerrit.cloudera.org:8080/#/c/12825/3//COMMIT_MSG@12
PS3, Line 12: Expired entries are not returned from the cache, but kept around 
until
            : they are lazily purged upon cache reaching its capacity while
            : accommodating more entries.
> Curious if you want to introduce a singleton thread that wakes up periodica
Yep, I'm planning to do that in a separate changelist.


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache-test.cc
File src/kudu/util/ttl_cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache-test.cc@55
PS3, Line 55: using TTLTestCache = TTLCache<K, TestValue>;
> Seems like you could hardcode string as the key too, given how you're testi
Done


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache-test.cc@62
PS3, Line 62:   FLAGS_cache_force_single_shard = true;
            :   FLAGS_cache_memtracker_approximation_ratio = 0;
> Doc why you set these.
Done


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache-test.cc@68
PS3, Line 68:     auto h = cache.Get("key0");
            :     ASSERT_EQ(nullptr, h.get());
> Could combine negative lookups onto one line.
Done


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache-test.cc@118
PS3, Line 118:   ASSERT_NE(nullptr, put_h.get());
             :   ASSERT_NE(nullptr, put_h->ptr());
             :   ASSERT_EQ(100, put_h->ptr()->value);
> This seems to be a common pattern; maybe encapsulate in a helper method?
Done


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache-test.cc@155
PS3, Line 155:     auto h = cache.Get(key);
             :     ASSERT_NE(nullptr, h.get());
             :     ASSERT_NE(nullptr, h->ptr());
             :     ASSERT_EQ(i, h->ptr()->value);
> Any concern here (or on L170) that the test will run sufficiently slow that
Yes, I had that concern, but after running it multiple with 128 threads on 
ve0518 I didn't see any flakes.  If any appears, we can change the cache's TTL.


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache-test.cc@201
PS3, Line 201:     ASSERT_NE(nullptr, h.get());
> Unnecessarily duplicated
Done


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h
File src/kudu/util/ttl_cache.h:

http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@40
PS3, Line 40: // A TTL cache. It's a cache of limited capacity with FIFO 
eviction policy.
            : // All cache's entries have the same TTL. The TTL for cache's 
entries is
            : // specified at the creation time of the cache and does not 
change during its
            : // lifetime.
> Maybe reword as: "Cache with TTL-based time eviction and FIFO-based capacit
Done


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@59
PS3, Line 59:     V* ptr() const {
            :       return ptr_;
            :     }
> Why offer this at all? Is mutating a cached key's value an important use ca
As of now we don't need this, nope.  Removed.


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@91
PS3, Line 91:   // TODO(aserbin): maybe, prohibit inheriting from this class 
instead?
> I don't think we need to be super protective on these matters; I think it's
Done


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@93
PS3, Line 93:
            :   // Retrieve an entry from the cache and return it wrapped into 
a ref-counting
            :   // handle. If a non-expired entry exists for the specified key, 
this method
            :   // returns shared pointer for non-null handle. If there is no 
entry for the
            :   // specified key or the entry has expired at the time of the 
query,
            :   // nullptr wrapped into a ref-counted handle is returned.
> This (and Put()) is a little too much detail. The general "retrieves...if a
Done


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@99
PS3, Line 99:   std::shared_ptr<EntryHandle> Get(const K& key) {
> Why do Get()/Put() wrap EntryHandle in shared_ptr? The TTLCache itself does
I decided to use shared_ptr here because of the requirements: 1) handles should 
support reference counting 2) once handle is returned, it's easy to share it, 
if necessary, between multiple actors.

Yes, multiple Get() calls for the same key return "unrelated" shared pointers.  
I don't think it's an issue.

What would you return from here if not shared_ptr if not building some 
home-grown wrapper?  Returning unique_ptr would make it harder to use: e.g., I 
don't think using Cache itself with its UniqueHandle is handy.  However, if we 
want this TTL cache to follow the suite, maybe that's an option.  What do you 
think?


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@114
PS3, Line 114:     // TODO(aserbin) use make_shared?
> Yes, we should do this to reduce the number of allocations.
Ack


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@115
PS3, Line 115:     return std::shared_ptr<EntryHandle>(
> Something here seems off. If I Get() the same key twice, and let both Entry
I don't think something is off here.  Every handle returned by Get() aggregates 
distinct instance of Cache::UniqueHandle, and Cache itself is declared to be 
safe with regard to concurrent queries.

I added the test scenario you described.


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@115
PS3, Line 115: return std::shared_ptr<EntryHandle>(
             :         new EntryHandle(DCHECK_NOTNULL(entry_ptr->val_ptr), 
std::move(h)));
> Just to make sure I understand the ownership model of this; are the shared_
No, they are not counting "together.  See the response for Adar's comment at 
line 99


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@123
PS3, Line 123: The 'charge' parameter specifies
             :   // the charge to associate with the entry with regard to the 
overall
             :   // cache's capacity.
> What's the use case for this vs. letting the charge get calculated automati
As far as I can see, automatic calculation of memory consumption is pretty much 
useless unless some blob of raw memory stored in the cache.  The use case for 
supplying the charge for an item explicitly comes from the usage of this TTL 
cache in sentry_authz_provider.{cc,h}

I thought that requiring some particular interface for the values stored in the 
cache (e.g., 'ssize_t memory_footprint() const') might be a better choice, but 
I seems to be ugly as well, so I abandoned it.


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@126
PS3, Line 126:   std::shared_ptr<EntryHandle> Put(const K& key,
> Curious why Put() returns an EntryHandle instead of void? What's the use ca
The use case is in sentry_authz_provider.cc.  The idea is to use the same 
pattern of using cache's handles while retrieving and storing entries in the 
cache.


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@129
PS3, Line 129:     // TODO(aserbin): add footprint for structures used by 
Cache's
             :     //                internal housekeeping (RL handles, etc.)
> This is KUDU-1091, right? Does it need to be tracked in TTLCache as well as
Ah, indeed -- thank you for the reference.


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@131
PS3, Line 131:     Cache::PendingHandle* pending = cache_->Allocate(key, 
sizeof(Entry), charge);
> Could you store this in a unique_ptr that uses Cache::Free as a deleter? Th
That would not be move(), that would be release().  I think it's better to 
change the signature of Cache::Allocate() to consume unique_ptr instead of raw 
pointer and update all corresponding call sites.  I think that will be better 
put in a separate changelist. What do you think?


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@132
PS3, Line 132:     if (!pending) {
             :       LOG(WARNING) << strings::Substitute(
             :           "failed to allocate an entry with charge of $0", 
charge);
             :       return nullptr;
             :     }
> Maybe just CHECK on this instead?
For some reason, I thought Cache::Allocate() would return nullptr if it cannot 
allocate space to fit the cache given its capacity, but it seems current 
Cache::Allocate() doesn't handle that at all.  In that case CHECK() here is 
better, of course.


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@138
PS3, Line 138:     memcpy(cache_->MutableValue(pending), &entry, sizeof(Entry));
> Right now, we:
Done


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@139
PS3, Line 139:     Cache::UniqueHandle h(cache_->Insert(pending, 
eviction_cb_.get()),
> This also evicts an existing entry with key 'key', right? Maybe doc that?
That is documented in the method's comment; I also added a comment for the 
Insert() at-the-site.


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@151
PS3, Line 151:     // Pointer to the 'value'.
> Should probably say something about ownership here. Would it even be possib
Since that's all about storing object in some opaque memory blobs in underlying 
Cache, it doesn't make much sense to use unique_ptr instead of raw pointer 
here: regular rules of variables/member scoping do not work.

I added corresponding comment regarding the ownership of the memory for 
'val_ptr'.


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@157
PS3, Line 157:   // The eviction callback that the underlying FIFO cache uses 
to notify
             :   // about reaching zero reference count for its entries.
> Reword as "Callback invoked by the underlying FIFO cache when a cached entr
Done


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@166
PS3, Line 166:       auto* entry_ptr = 
reinterpret_cast<Entry*>(val.mutable_data());
             :       DCHECK(entry_ptr);
             :       delete entry_ptr->val_ptr;
> Nit: rather than the explicit delete, maybe cast and store into a unique_pt
That sounds too fancy to me :)  Do you mind if I leave it as is?  I think this 
way it's a bit more readable.


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache_metrics.cc
File src/kudu/util/ttl_cache_metrics.cc:

PS3:
> Seems inappropriate for these metrics to be defined for the "TTL cache", be
Right -- I had some back-and-forth where to put this, but eventually dumped it 
in ttl_cache_metircs.cc.  I think I'll move and split that once the part with 
TTL-related gauges settles in.


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache_metrics.cc@41
PS3, Line 41: cache_misses
> ttl_cache_misses
Done


http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache_metrics.cc@49
PS3, Line 49: cache_hits
> ttl_cache_hits
Done



--
To view, visit http://gerrit.cloudera.org:8080/12825
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic8aa0ebe6b26bf34ca0e02bebbdb6a94f6b00621
Gerrit-Change-Number: 12825
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 27 Mar 2019 06:53:45 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [util] introduce TTL cache

Reply via email to