Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/12825 )
Change subject: WIP [util] introduce TTL cache ...................................................................... Patch Set 3: (29 comments) http://gerrit.cloudera.org:8080/#/c/12825/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12825/3//COMMIT_MSG@13 PS3, Line 13: lazily purged > You mentioned this offline, that this was fine because we expect the user o I think that makes sense, but I think it's better to do that in a separate changelist. http://gerrit.cloudera.org:8080/#/c/12825/3//COMMIT_MSG@12 PS3, Line 12: Expired entries are not returned from the cache, but kept around until : they are lazily purged upon cache reaching its capacity while : accommodating more entries. > Curious if you want to introduce a singleton thread that wakes up periodica Yep, I'm planning to do that in a separate changelist. http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache-test.cc File src/kudu/util/ttl_cache-test.cc: http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache-test.cc@55 PS3, Line 55: using TTLTestCache = TTLCache<K, TestValue>; > Seems like you could hardcode string as the key too, given how you're testi Done http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache-test.cc@62 PS3, Line 62: FLAGS_cache_force_single_shard = true; : FLAGS_cache_memtracker_approximation_ratio = 0; > Doc why you set these. Done http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache-test.cc@68 PS3, Line 68: auto h = cache.Get("key0"); : ASSERT_EQ(nullptr, h.get()); > Could combine negative lookups onto one line. Done http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache-test.cc@118 PS3, Line 118: ASSERT_NE(nullptr, put_h.get()); : ASSERT_NE(nullptr, put_h->ptr()); : ASSERT_EQ(100, put_h->ptr()->value); > This seems to be a common pattern; maybe encapsulate in a helper method? Done http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache-test.cc@155 PS3, Line 155: auto h = cache.Get(key); : ASSERT_NE(nullptr, h.get()); : ASSERT_NE(nullptr, h->ptr()); : ASSERT_EQ(i, h->ptr()->value); > Any concern here (or on L170) that the test will run sufficiently slow that Yes, I had that concern, but after running it multiple with 128 threads on ve0518 I didn't see any flakes. If any appears, we can change the cache's TTL. http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache-test.cc@201 PS3, Line 201: ASSERT_NE(nullptr, h.get()); > Unnecessarily duplicated Done http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h File src/kudu/util/ttl_cache.h: http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@40 PS3, Line 40: // A TTL cache. It's a cache of limited capacity with FIFO eviction policy. : // All cache's entries have the same TTL. The TTL for cache's entries is : // specified at the creation time of the cache and does not change during its : // lifetime. > Maybe reword as: "Cache with TTL-based time eviction and FIFO-based capacit Done http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@59 PS3, Line 59: V* ptr() const { : return ptr_; : } > Why offer this at all? Is mutating a cached key's value an important use ca As of now we don't need this, nope. Removed. http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@91 PS3, Line 91: // TODO(aserbin): maybe, prohibit inheriting from this class instead? > I don't think we need to be super protective on these matters; I think it's Done http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@93 PS3, Line 93: : // Retrieve an entry from the cache and return it wrapped into a ref-counting : // handle. If a non-expired entry exists for the specified key, this method : // returns shared pointer for non-null handle. If there is no entry for the : // specified key or the entry has expired at the time of the query, : // nullptr wrapped into a ref-counted handle is returned. > This (and Put()) is a little too much detail. The general "retrieves...if a Done http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@99 PS3, Line 99: std::shared_ptr<EntryHandle> Get(const K& key) { > Why do Get()/Put() wrap EntryHandle in shared_ptr? The TTLCache itself does I decided to use shared_ptr here because of the requirements: 1) handles should support reference counting 2) once handle is returned, it's easy to share it, if necessary, between multiple actors. Yes, multiple Get() calls for the same key return "unrelated" shared pointers. I don't think it's an issue. What would you return from here if not shared_ptr if not building some home-grown wrapper? Returning unique_ptr would make it harder to use: e.g., I don't think using Cache itself with its UniqueHandle is handy. However, if we want this TTL cache to follow the suite, maybe that's an option. What do you think? http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@114 PS3, Line 114: // TODO(aserbin) use make_shared? > Yes, we should do this to reduce the number of allocations. Ack http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@115 PS3, Line 115: return std::shared_ptr<EntryHandle>( > Something here seems off. If I Get() the same key twice, and let both Entry I don't think something is off here. Every handle returned by Get() aggregates distinct instance of Cache::UniqueHandle, and Cache itself is declared to be safe with regard to concurrent queries. I added the test scenario you described. http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@115 PS3, Line 115: return std::shared_ptr<EntryHandle>( : new EntryHandle(DCHECK_NOTNULL(entry_ptr->val_ptr), std::move(h))); > Just to make sure I understand the ownership model of this; are the shared_ No, they are not counting "together. See the response for Adar's comment at line 99 http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@123 PS3, Line 123: The 'charge' parameter specifies : // the charge to associate with the entry with regard to the overall : // cache's capacity. > What's the use case for this vs. letting the charge get calculated automati As far as I can see, automatic calculation of memory consumption is pretty much useless unless some blob of raw memory stored in the cache. The use case for supplying the charge for an item explicitly comes from the usage of this TTL cache in sentry_authz_provider.{cc,h} I thought that requiring some particular interface for the values stored in the cache (e.g., 'ssize_t memory_footprint() const') might be a better choice, but I seems to be ugly as well, so I abandoned it. http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@126 PS3, Line 126: std::shared_ptr<EntryHandle> Put(const K& key, > Curious why Put() returns an EntryHandle instead of void? What's the use ca The use case is in sentry_authz_provider.cc. The idea is to use the same pattern of using cache's handles while retrieving and storing entries in the cache. http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@129 PS3, Line 129: // TODO(aserbin): add footprint for structures used by Cache's : // internal housekeeping (RL handles, etc.) > This is KUDU-1091, right? Does it need to be tracked in TTLCache as well as Ah, indeed -- thank you for the reference. http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@131 PS3, Line 131: Cache::PendingHandle* pending = cache_->Allocate(key, sizeof(Entry), charge); > Could you store this in a unique_ptr that uses Cache::Free as a deleter? Th That would not be move(), that would be release(). I think it's better to change the signature of Cache::Allocate() to consume unique_ptr instead of raw pointer and update all corresponding call sites. I think that will be better put in a separate changelist. What do you think? http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@132 PS3, Line 132: if (!pending) { : LOG(WARNING) << strings::Substitute( : "failed to allocate an entry with charge of $0", charge); : return nullptr; : } > Maybe just CHECK on this instead? For some reason, I thought Cache::Allocate() would return nullptr if it cannot allocate space to fit the cache given its capacity, but it seems current Cache::Allocate() doesn't handle that at all. In that case CHECK() here is better, of course. http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@138 PS3, Line 138: memcpy(cache_->MutableValue(pending), &entry, sizeof(Entry)); > Right now, we: Done http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@139 PS3, Line 139: Cache::UniqueHandle h(cache_->Insert(pending, eviction_cb_.get()), > This also evicts an existing entry with key 'key', right? Maybe doc that? That is documented in the method's comment; I also added a comment for the Insert() at-the-site. http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@151 PS3, Line 151: // Pointer to the 'value'. > Should probably say something about ownership here. Would it even be possib Since that's all about storing object in some opaque memory blobs in underlying Cache, it doesn't make much sense to use unique_ptr instead of raw pointer here: regular rules of variables/member scoping do not work. I added corresponding comment regarding the ownership of the memory for 'val_ptr'. http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@157 PS3, Line 157: // The eviction callback that the underlying FIFO cache uses to notify : // about reaching zero reference count for its entries. > Reword as "Callback invoked by the underlying FIFO cache when a cached entr Done http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache.h@166 PS3, Line 166: auto* entry_ptr = reinterpret_cast<Entry*>(val.mutable_data()); : DCHECK(entry_ptr); : delete entry_ptr->val_ptr; > Nit: rather than the explicit delete, maybe cast and store into a unique_pt That sounds too fancy to me :) Do you mind if I leave it as is? I think this way it's a bit more readable. http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache_metrics.cc File src/kudu/util/ttl_cache_metrics.cc: PS3: > Seems inappropriate for these metrics to be defined for the "TTL cache", be Right -- I had some back-and-forth where to put this, but eventually dumped it in ttl_cache_metircs.cc. I think I'll move and split that once the part with TTL-related gauges settles in. http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache_metrics.cc@41 PS3, Line 41: cache_misses > ttl_cache_misses Done http://gerrit.cloudera.org:8080/#/c/12825/3/src/kudu/util/ttl_cache_metrics.cc@49 PS3, Line 49: cache_hits > ttl_cache_hits Done -- To view, visit http://gerrit.cloudera.org:8080/12825 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic8aa0ebe6b26bf34ca0e02bebbdb6a94f6b00621 Gerrit-Change-Number: 12825 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Wed, 27 Mar 2019 06:53:45 +0000 Gerrit-HasComments: Yes
