----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/64955/#review195917 -----------------------------------------------------------
sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java Line 60 (original), 61 (patched) <https://reviews.apache.org/r/64955/#comment275299> This using probably unintentional use of Bool instead of primitive boolean. There seems to be no reason for using autoboxed types. sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java Lines 70 (patched) <https://reviews.apache.org/r/64955/#comment275308> So, the assumption is that the average number of duplicates per each ID cannot exceed 3, correct? sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java Lines 67 (patched) <https://reviews.apache.org/r/64955/#comment275341> I wonder about the choice of LRUMap for cache. The ideal data structure would one only store the caches for the events in the ID range that we want, which is [MAX(event_id) - refetch_count, MAX(event_id)]. LRUMap does not explicitly address this requirement. It assumes that the oldest (least accessed) caches will tend to be for the events with the oldest IDs. We know this is often not true, which is one of the problems this JIRA tries to work around. Given that when we do not find the entry in cache we check SentryStore anyway, everything will still work. But we may end up with a few extra DB calls to SentryStore and it's hard to predict the overhead. The cache structure solving this shortcoming could be HashSet<String>[notificationFetchCount +1] array. It's the circular buffer structure where the cache[ID % cache.length] element contains the HashSet of all caches for the events with the given event ID. We can see just as fast if the event has been processed. Before each fetch we can calculate which elements in this cache array correspond to the IDs outside the current fetching range and nullify them (for curcular buffer some care is needed for modulo arithmetic). This way we know that a) cache stores the caches of the events only in the range we care about, though now cleanup is explicit. b) we do not necessarily need to put the upper limit on the number of events with the given ID (the sizes of HashSet's). It may save some expensive DB calls. However, I do not feel strongly one way or the other until we test it and see the overhead. Perhaps the fudge factor x3 in LRUMap size is enough to keep enough entries, old or new; so I'm leaving it up to you. sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java Lines 119 (patched) <https://reviews.apache.org/r/64955/#comment275317> any reason for returning Boolean vs bool? sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java Lines 231 (patched) <https://reviews.apache.org/r/64955/#comment275344> optimization - no need for this DB call if the ID > max(storedId) - Vadim Spector On Jan. 22, 2018, 5:40 p.m., kalyan kumar kalvagadda wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/64955/ > ----------------------------------------------------------- > > (Updated Jan. 22, 2018, 5:40 p.m.) > > > Review request for sentry, Alexander Kolbasov, Arjun Mishra, Na Li, Sergio > Pena, and Vadim Spector. > > > Bugs: SENTRY-2109 > https://issues.apache.org/jira/browse/SENTRY-2109 > > > Repository: sentry > > > Description > ------- > > This patch does couple of things > 1. Avoid triggering full snapshots when gaps are observed while fetching new > notifications. While fetching new notifications HMSFollower would would fetch > notifications with last event-id as well. When it gets the notifications and > if it doesn't get the notifications with event-id, full snpshot is triggered. > 2. Functinalty to address gaps and out-of-sequence notificaitons by > re-fetching addtional notifications that were already fetched. This solution > is not fool proof. It does a best effort to reduse the chance of loosing > notifications by re-fetching the notifications.This approach will introduce > an over head of fetching addtional notifications that were already fetched. > Overhead of DB look-up is addressed by using a cache. This reduces additional > DB lookups needed to check if the notification was already processed. > 2. Added looging to report duplicate events and out of order events for debug > purpose. > 3. Added new e2e tests to verfy this behavior. > > > Diffs > ----- > > > sentry-core/sentry-core-common/src/main/java/org/apache/sentry/core/common/exception/SentryOutOfSyncException.java > PRE-CREATION > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java > 6c4631fa74760d8721b5740dd3dffb2c1d8e72e6 > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java > aa1b6a31c28f35af86952c213d5e20a8c9bb3490 > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java > 097aa62912e92ece7f8da0ac0fccb124579a88f2 > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/SentryService.java > 43535a7b50fea51049f3142837736e6a99a3a80f > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/ServiceConstants.java > 7e02874b4be6a7109108809b1f404fe971b7b8e2 > > sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollowerSentryStoreIntegration.java > 501898bca261db2daf937a8d803d12a59616192b > > sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java > b4100278392986c161625a366212c6fef66ec0a9 > > sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestHMSFollower.java > edde886a7646539499149f2d86758979436567bd > > sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestHiveNotificationFetcher.java > 83a1becd46ac2d69c7d6dd05ed6253d1cdd8800d > > sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestSnapshotCreation.java > PRE-CREATION > > sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestSnapshotCreationWithShorterHMSEventTtl.java > PRE-CREATION > > sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestSnapshotWithLongerHMSFollowerLongerInterval.java > PRE-CREATION > > sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hdfs/TestHDFSIntegrationBase.java > 4cd00e6672730773c74b9840247d1f4d5f7bdfe4 > > > Diff: https://reviews.apache.org/r/64955/diff/4/ > > > Testing > ------- > > Made sure that tests pass. > > > Thanks, > > kalyan kumar kalvagadda > >