> On Jan. 22, 2018, 8:49 p.m., Vadim Spector wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java
> > Lines 67 (patched)
> > <https://reviews.apache.org/r/64955/diff/4/?file=1943745#file1943745line68>
> >
> >     I wonder about the choice of LRUMap for cache.
> >     
> >     The ideal data structure would one only store the caches for the events 
> > in the ID range that we want, which is [MAX(event_id) - refetch_count, 
> > MAX(event_id)].
> >     
> >     LRUMap does not explicitly address this requirement. It assumes that 
> > the oldest (least accessed) caches will tend to be for the events with the 
> > oldest IDs. We know this is often not true, which is one of the problems 
> > this JIRA tries to work around.
> >     
> >     Given that when we do not find the entry in cache we check SentryStore 
> > anyway, everything will still work. But we may end up with a few extra DB 
> > calls to SentryStore and it's hard to predict the overhead.
> >     
> >     The cache structure solving this shortcoming could be 
> > HashSet<String>[notificationFetchCount +1] array.
> >     
> >     It's the circular buffer structure where the cache[ID % cache.length] 
> > element contains the HashSet of all caches for the events with the given 
> > event ID. We can see just as fast if the event has been processed.
> >     
> >     Before each fetch we can calculate which elements in this cache array 
> > correspond to the IDs outside the current fetching range and nullify them 
> > (for curcular buffer some care is needed for modulo arithmetic).
> >     
> >     This way we know that
> >     a) cache stores the caches of the events only in the range we care 
> > about, though now cleanup is explicit.
> >     b) we do not necessarily need to put the upper limit on the number of 
> > events with the given ID (the sizes of HashSet's).
> >     
> >     It may save some expensive DB calls.
> >     
> >     However, I do not feel strongly one way or the other until we test it 
> > and see the overhead. Perhaps the fudge factor x3 in LRUMap size is enough 
> > to keep enough entries, old or new; so I'm leaving it up to you.

Vadim, I prefer to use circular buffer as well. Mentioned this to Kalyan 
before. The memory usage and execution overhead should be much smaller than 
LRUMap.


- Na


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64955/#review195917
-----------------------------------------------------------


On Jan. 22, 2018, 5:40 p.m., kalyan kumar kalvagadda wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64955/
> -----------------------------------------------------------
> 
> (Updated Jan. 22, 2018, 5:40 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, Arjun Mishra, Na Li, Sergio 
> Pena, and Vadim Spector.
> 
> 
> Bugs: SENTRY-2109
>     https://issues.apache.org/jira/browse/SENTRY-2109
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> This patch does couple of things
> 1. Avoid triggering full snapshots when gaps are observed while fetching new 
> notifications. While fetching new notifications HMSFollower would would fetch 
> notifications with last event-id as well. When it gets the notifications and 
> if it doesn't get the notifications with event-id, full snpshot is triggered.
> 2. Functinalty to address gaps and out-of-sequence notificaitons by 
> re-fetching addtional notifications that were already fetched. This solution 
> is not fool proof. It does a best effort to reduse the chance of loosing 
> notifications by re-fetching the notifications.This approach will introduce 
> an over head of fetching addtional notifications that were already fetched. 
> Overhead of DB look-up is addressed by using a cache. This reduces additional 
> DB lookups needed to check if the notification was already processed.
> 2. Added looging to report duplicate events and out of order events for debug 
> purpose.
> 3. Added new e2e tests to verfy this behavior.
> 
> 
> Diffs
> -----
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/core/common/exception/SentryOutOfSyncException.java
>  PRE-CREATION 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  6c4631fa74760d8721b5740dd3dffb2c1d8e72e6 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
>  aa1b6a31c28f35af86952c213d5e20a8c9bb3490 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java
>  097aa62912e92ece7f8da0ac0fccb124579a88f2 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/SentryService.java
>  43535a7b50fea51049f3142837736e6a99a3a80f 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/ServiceConstants.java
>  7e02874b4be6a7109108809b1f404fe971b7b8e2 
>   
> sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollowerSentryStoreIntegration.java
>  501898bca261db2daf937a8d803d12a59616192b 
>   
> sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  b4100278392986c161625a366212c6fef66ec0a9 
>   
> sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestHMSFollower.java
>  edde886a7646539499149f2d86758979436567bd 
>   
> sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestHiveNotificationFetcher.java
>  83a1becd46ac2d69c7d6dd05ed6253d1cdd8800d 
>   
> sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestSnapshotCreation.java
>  PRE-CREATION 
>   
> sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestSnapshotCreationWithShorterHMSEventTtl.java
>  PRE-CREATION 
>   
> sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestSnapshotWithLongerHMSFollowerLongerInterval.java
>  PRE-CREATION 
>   
> sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hdfs/TestHDFSIntegrationBase.java
>  4cd00e6672730773c74b9840247d1f4d5f7bdfe4 
> 
> 
> Diff: https://reviews.apache.org/r/64955/diff/4/
> 
> 
> Testing
> -------
> 
> Made sure that tests pass.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>

Reply via email to