> On Aug. 31, 2017, 12:25 a.m., Alexander Kolbasov wrote: > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java > > Lines 127 (patched) > > <https://reviews.apache.org/r/61973/diff/1/?file=1807071#file1807071line127> > > > > If I am reading this code correctly, the following will happen: > > > > 1) Every time we request IDs starting from last seen ID, so normally > > the first ID in the sequence is always a duplicate > > 2) This first ID in the sequence isn't in the cache yet, so we go to > > the sentryStore and fetch the hash and compare. > > > > This means that we always go to sentryStore for the frist element of > > the batch which means every time we call fetchNotifications(). > > > > This is not very bad, but if we store the sha1 hash of the last event > > in the cache we don't have to go to sentryStore at all under normal > > circumstances > > Sergio Pena wrote: > Keeping the cache of the last ID is not enough. The > HiveNotificationFetcher does not know if the IDs fetched are processed, so > the next call to fetchNotifications() may be called with an ID that was not > cached. If I want to cache all of them, then I will add more procesisng on > getting the sha1() of all the notifications and adding them to the cache. > Isn't simpler just to keep it the way it is now?
This is possible but requires a different code structure. This fix is good enough. - Alexander ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/61973/#review184215 ----------------------------------------------------------- On Aug. 31, 2017, 6:13 p.m., Sergio Pena wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/61973/ > ----------------------------------------------------------- > > (Updated Aug. 31, 2017, 6:13 p.m.) > > > Review request for sentry, Alexander Kolbasov and Na Li. > > > Bugs: sentry-1888 > https://issues.apache.org/jira/browse/sentry-1888 > > > Repository: sentry > > > Description > ------- > > Approach: > The proposed solution will be to request notifications from the last ID seen > again. This way we could bring current duplicates and apply them on Sentry. > We have the risk to miss duplicates that were committed much time later, but > we cannot trust on those duplicates as they will not know the order of the > time they were committed. > > The patch adds a new helper class called HiveNotificationFetcher that allow > us to bring new notifications and apply a filter to those notifications that > were already processed by Sentry. > > > Diffs > ----- > > > sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/UniquePathsUpdate.java > 7dae2f538602f4c264084791fb45bb6891a34941 > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java > 593b92f96b47f959266280bce3d0809f6a80c362 > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java > 35da6fc7908ec7498d1fd658d75b62028df35751 > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java > PRE-CREATION > > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/SentryHMSClient.java > 4a8fb953ce486e1aeb1042884de56872b5539cd0 > > sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestHMSFollower.java > f56384a99e128b3e9880cbc5692107d61f2f500f > > sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestHiveNotificationFetcher.java > PRE-CREATION > > sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestSentryHMSClient.java > 77a2bbb214e23ca146c2934ee4d3bf15e592906f > > > Diff: https://reviews.apache.org/r/61973/diff/2/ > > > Testing > ------- > > > Thanks, > > Sergio Pena > >
