> On Jan. 22, 2018, 8:49 p.m., Vadim Spector wrote: > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java > > Lines 67 (patched) > > <https://reviews.apache.org/r/64955/diff/4/?file=1943745#file1943745line68> > > > > I wonder about the choice of LRUMap for cache. > > > > The ideal data structure would one only store the caches for the events > > in the ID range that we want, which is [MAX(event_id) - refetch_count, > > MAX(event_id)]. > > > > LRUMap does not explicitly address this requirement. It assumes that > > the oldest (least accessed) caches will tend to be for the events with the > > oldest IDs. We know this is often not true, which is one of the problems > > this JIRA tries to work around. > > > > Given that when we do not find the entry in cache we check SentryStore > > anyway, everything will still work. But we may end up with a few extra DB > > calls to SentryStore and it's hard to predict the overhead. > > > > The cache structure solving this shortcoming could be > > HashSet<String>[notificationFetchCount +1] array. > > > > It's the circular buffer structure where the cache[ID % cache.length] > > element contains the HashSet of all caches for the events with the given > > event ID. We can see just as fast if the event has been processed. > > > > Before each fetch we can calculate which elements in this cache array > > correspond to the IDs outside the current fetching range and nullify them > > (for curcular buffer some care is needed for modulo arithmetic). > > > > This way we know that > > a) cache stores the caches of the events only in the range we care > > about, though now cleanup is explicit. > > b) we do not necessarily need to put the upper limit on the number of > > events with the given ID (the sizes of HashSet's). > > > > It may save some expensive DB calls. > > > > However, I do not feel strongly one way or the other until we test it > > and see the overhead. Perhaps the fudge factor x3 in LRUMap size is enough > > to keep enough entries, old or new; so I'm leaving it up to you.
Vadim, I prefer to use circular buffer as well. Mentioned this to Kalyan before. The memory usage and execution overhead should be much smaller than LRUMap. - Na ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/64955/#review195917 ----------------------------------------------------------- On Jan. 22, 2018, 5:40 p.m., kalyan kumar kalvagadda wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/64955/ > ----------------------------------------------------------- > > (Updated Jan. 22, 2018, 5:40 p.m.) > > > Review request for sentry, Alexander Kolbasov, Arjun Mishra, Na Li, Sergio > Pena, and Vadim Spector. > > > Bugs: SENTRY-2109 > https://issues.apache.org/jira/browse/SENTRY-2109 > > > Repository: sentry > > > Description > ------- > > This patch does couple of things > 1. Avoid triggering full snapshots when gaps are observed while fetching new > notifications. While fetching new notifications HMSFollower would would fetch > notifications with last event-id as well. When it gets the notifications and > if it doesn't get the notifications with event-id, full snpshot is triggered. > 2. Functinalty to address gaps and out-of-sequence notificaitons by > re-fetching addtional notifications that were already fetched. This solution > is not fool proof. It does a best effort to reduse the chance of loosing > notifications by re-fetching the notifications.This approach will introduce > an over head of fetching addtional notifications that were already fetched. > Overhead of DB look-up is addressed by using a cache. This reduces additional > DB lookups needed to check if the notification was already processed. > 2. Added looging to report duplicate events and out of order events for debug > purpose. > 3. Added new e2e tests to verfy this behavior. > > > Diffs > ----- > > > sentry-core/sentry-core-common/src/main/java/org/apache/sentry/core/common/exception/SentryOutOfSyncException.java > PRE-CREATION > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java > 6c4631fa74760d8721b5740dd3dffb2c1d8e72e6 > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java > aa1b6a31c28f35af86952c213d5e20a8c9bb3490 > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java > 097aa62912e92ece7f8da0ac0fccb124579a88f2 > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/SentryService.java > 43535a7b50fea51049f3142837736e6a99a3a80f > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/ServiceConstants.java > 7e02874b4be6a7109108809b1f404fe971b7b8e2 > > sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollowerSentryStoreIntegration.java > 501898bca261db2daf937a8d803d12a59616192b > > sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java > b4100278392986c161625a366212c6fef66ec0a9 > > sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestHMSFollower.java > edde886a7646539499149f2d86758979436567bd > > sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestHiveNotificationFetcher.java > 83a1becd46ac2d69c7d6dd05ed6253d1cdd8800d > > sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestSnapshotCreation.java > PRE-CREATION > > sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestSnapshotCreationWithShorterHMSEventTtl.java > PRE-CREATION > > sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestSnapshotWithLongerHMSFollowerLongerInterval.java > PRE-CREATION > > sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hdfs/TestHDFSIntegrationBase.java > 4cd00e6672730773c74b9840247d1f4d5f7bdfe4 > > > Diff: https://reviews.apache.org/r/64955/diff/4/ > > > Testing > ------- > > Made sure that tests pass. > > > Thanks, > > kalyan kumar kalvagadda > >