-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64955/#review195917
-----------------------------------------------------------




sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java
Line 60 (original), 61 (patched)
<https://reviews.apache.org/r/64955/#comment275299>

    This using probably unintentional use of Bool instead of primitive boolean. 
There seems to be no reason for using autoboxed types.



sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java
Lines 70 (patched)
<https://reviews.apache.org/r/64955/#comment275308>

    So, the assumption is that the average number of duplicates per each ID 
cannot exceed 3, correct?



sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java
Lines 67 (patched)
<https://reviews.apache.org/r/64955/#comment275341>

    I wonder about the choice of LRUMap for cache.
    
    The ideal data structure would one only store the caches for the events in 
the ID range that we want, which is [MAX(event_id) - refetch_count, 
MAX(event_id)].
    
    LRUMap does not explicitly address this requirement. It assumes that the 
oldest (least accessed) caches will tend to be for the events with the oldest 
IDs. We know this is often not true, which is one of the problems this JIRA 
tries to work around.
    
    Given that when we do not find the entry in cache we check SentryStore 
anyway, everything will still work. But we may end up with a few extra DB calls 
to SentryStore and it's hard to predict the overhead.
    
    The cache structure solving this shortcoming could be 
HashSet<String>[notificationFetchCount +1] array.
    
    It's the circular buffer structure where the cache[ID % cache.length] 
element contains the HashSet of all caches for the events with the given event 
ID. We can see just as fast if the event has been processed.
    
    Before each fetch we can calculate which elements in this cache array 
correspond to the IDs outside the current fetching range and nullify them (for 
curcular buffer some care is needed for modulo arithmetic).
    
    This way we know that
    a) cache stores the caches of the events only in the range we care about, 
though now cleanup is explicit.
    b) we do not necessarily need to put the upper limit on the number of 
events with the given ID (the sizes of HashSet's).
    
    It may save some expensive DB calls.
    
    However, I do not feel strongly one way or the other until we test it and 
see the overhead. Perhaps the fudge factor x3 in LRUMap size is enough to keep 
enough entries, old or new; so I'm leaving it up to you.



sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java
Lines 119 (patched)
<https://reviews.apache.org/r/64955/#comment275317>

    any reason for returning Boolean vs bool?



sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java
Lines 231 (patched)
<https://reviews.apache.org/r/64955/#comment275344>

    optimization - no need for this DB call if the ID > max(storedId)


- Vadim Spector


On Jan. 22, 2018, 5:40 p.m., kalyan kumar kalvagadda wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64955/
> -----------------------------------------------------------
> 
> (Updated Jan. 22, 2018, 5:40 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, Arjun Mishra, Na Li, Sergio 
> Pena, and Vadim Spector.
> 
> 
> Bugs: SENTRY-2109
>     https://issues.apache.org/jira/browse/SENTRY-2109
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> This patch does couple of things
> 1. Avoid triggering full snapshots when gaps are observed while fetching new 
> notifications. While fetching new notifications HMSFollower would would fetch 
> notifications with last event-id as well. When it gets the notifications and 
> if it doesn't get the notifications with event-id, full snpshot is triggered.
> 2. Functinalty to address gaps and out-of-sequence notificaitons by 
> re-fetching addtional notifications that were already fetched. This solution 
> is not fool proof. It does a best effort to reduse the chance of loosing 
> notifications by re-fetching the notifications.This approach will introduce 
> an over head of fetching addtional notifications that were already fetched. 
> Overhead of DB look-up is addressed by using a cache. This reduces additional 
> DB lookups needed to check if the notification was already processed.
> 2. Added looging to report duplicate events and out of order events for debug 
> purpose.
> 3. Added new e2e tests to verfy this behavior.
> 
> 
> Diffs
> -----
> 
>   
> sentry-core/sentry-core-common/src/main/java/org/apache/sentry/core/common/exception/SentryOutOfSyncException.java
>  PRE-CREATION 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
>  6c4631fa74760d8721b5740dd3dffb2c1d8e72e6 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
>  aa1b6a31c28f35af86952c213d5e20a8c9bb3490 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java
>  097aa62912e92ece7f8da0ac0fccb124579a88f2 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/SentryService.java
>  43535a7b50fea51049f3142837736e6a99a3a80f 
>   
> sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/ServiceConstants.java
>  7e02874b4be6a7109108809b1f404fe971b7b8e2 
>   
> sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/provider/db/service/persistent/TestHMSFollowerSentryStoreIntegration.java
>  501898bca261db2daf937a8d803d12a59616192b 
>   
> sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/provider/db/service/persistent/TestSentryStore.java
>  b4100278392986c161625a366212c6fef66ec0a9 
>   
> sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestHMSFollower.java
>  edde886a7646539499149f2d86758979436567bd 
>   
> sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestHiveNotificationFetcher.java
>  83a1becd46ac2d69c7d6dd05ed6253d1cdd8800d 
>   
> sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestSnapshotCreation.java
>  PRE-CREATION 
>   
> sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestSnapshotCreationWithShorterHMSEventTtl.java
>  PRE-CREATION 
>   
> sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestSnapshotWithLongerHMSFollowerLongerInterval.java
>  PRE-CREATION 
>   
> sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hdfs/TestHDFSIntegrationBase.java
>  4cd00e6672730773c74b9840247d1f4d5f7bdfe4 
> 
> 
> Diff: https://reviews.apache.org/r/64955/diff/4/
> 
> 
> Testing
> -------
> 
> Made sure that tests pass.
> 
> 
> Thanks,
> 
> kalyan kumar kalvagadda
> 
>

Reply via email to