Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-11-30 Thread Arjun Mishra via Review Board


> On Sept. 28, 2018, 4:56 p.m., Sergio Pena wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
> > Lines 136-137 (patched)
> > 
> >
> > Why is the cache needed to be instantiated and passed as aparameter 
> > instead of instatiating the cache inside the PathImageRetriever and 
> > PermImageRetriever instead?
> > 
> > A retriever returns the paths, but if they're cache, then it returns 
> > the ones from the cache.
> 
> Arjun Mishra wrote:
> Sergio we haev 2 path classes PathImageRetriever, PathDeltaRetriever. 
> Even though we need cache to be in PathImageRetriever, we need the cache to 
> be visible to PathDeltaRetriever so it can be invalidated.
> 
> Sergio Pena wrote:
> Is a cache needed for deltas as well? I think the cache should be handled 
> internally on each retriever.

Yes but that is a different ticket


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209098
---


On Oct. 5, 2018, 7:11 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Oct. 5, 2018, 7:11 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/11/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-10-08 Thread Arjun Mishra via Review Board


> On Oct. 8, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
> > Lines 139 (patched)
> > 
> >
> > If imageCache.get() is not null, then it might be null after the if 
> > condition. The reason is GC could remove it from memory if it is not a 
> > hard-reference.

Will change it


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209315
---


On Oct. 5, 2018, 7:11 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Oct. 5, 2018, 7:11 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/11/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-10-08 Thread Arjun Mishra via Review Board


> On Oct. 8, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
> > Lines 140-145 (patched)
> > 
> >
> > Invalidate or cleanUp? I like to separate the verbs. Invalidate means 
> > the current cache is not valid and it will be replaced (but not necessary 
> > removed from memory). Clean is what is says, clean from memory.
> > 
> > Also, I think we should see the future of HDFS Federation. We can get a 
> > benefit if we avoid cleaning the cache when multiple NN request images from 
> > Sentry. This way Sentry will return the cache image.

I am fine with that. Some of the team members were against using too much heap


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209315
---


On Oct. 5, 2018, 7:11 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Oct. 5, 2018, 7:11 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/11/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-10-08 Thread Arjun Mishra via Review Board


> On Oct. 8, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
> > Lines 248-266 (patched)
> > 
> >
> > Is there a way to mock the cache objects instead of having new methods 
> > to get its internal value? Or mock the imageRetriever and verify it was 
> > called when a cache is not valid.

I would then have to encapsulate these attributes cacheImgNum, cacheSequenceNum 
into their own class. Then they can be mocked. In your previous code review, 
you were against created unnecessary classes so it wasn't done


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209315
---


On Oct. 5, 2018, 7:11 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Oct. 5, 2018, 7:11 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/11/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-10-08 Thread Arjun Mishra via Review Board


> On Oct. 8, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
> > Lines 222 (patched)
> > 
> >
> > Just check if they're different. It is a simple case.  A cache is not 
> > more valid if the seq num is different (higher or lower, it does not 
> > matter). This logic is assumming HDFS will never request a lower value. We 
> > should not assume that.

cache.seqNum needs to be strictly less than for it to request a fresh full 
update. If cache.seqNum is greater than requestedSeqNum, we can simply return 
that cache and the subsequent request NN will ask for deltas  greater than 
cache.seqNum so it will be able to catch up. If that sequence number doesn't 
not exist, we will do a fresh fetch from Sentry db


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209315
---


On Oct. 5, 2018, 7:11 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Oct. 5, 2018, 7:11 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/11/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-10-08 Thread Arjun Mishra via Review Board


> On Oct. 8, 2018, 4:18 p.m., Sergio Pena wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
> > Line 159 (original), 216-220 (patched)
> > 
> >
> > In the if() above, if imageNum != than the cache, then that will return 
> > null. When is this if going to be valid? imageNum will be teh same than 
> > cache.

There is some confusion. I think you are talking about this condition "if 
(updateCache.getImgNum() != UNUSED_PATH_UPDATE_IMG_NUM && 
updateCache.getImgNum() != latestImageNum) {". Here "latestImageNum" is the 
value of MAX(SNAPSHOT_ID) in the sentry database. The two condititions are 
checking if cache.imageNum != latestImageNum, then return null, else if 
cache.imageNum is less than the requestedImgNum then return null.


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209315
---


On Oct. 5, 2018, 7:11 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Oct. 5, 2018, 7:11 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/11/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-10-08 Thread Sergio Pena via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209315
---



Did you look at the Guava cache?

Read this part of using soft values in the Guava cache (btw, it has invalidate 
vs cleanUp methods which means different things):
http://blog.jessitron.com/2011/10/keeping-your-cache-down-to-size.html


I was looking at the Guava cache and it will be valuable resource for using it 
in this cache instead of doing our own thing. It is much simpler and less prone 
to errors. Really useful.


sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
Lines 139 (patched)


If imageCache.get() is not null, then it might be null after the if 
condition. The reason is GC could remove it from memory if it is not a 
hard-reference.



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
Lines 140-145 (patched)


Invalidate or cleanUp? I like to separate the verbs. Invalidate means the 
current cache is not valid and it will be replaced (but not necessary removed 
from memory). Clean is what is says, clean from memory.

Also, I think we should see the future of HDFS Federation. We can get a 
benefit if we avoid cleaning the cache when multiple NN request images from 
Sentry. This way Sentry will return the cache image.



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
Line 159 (original), 216-220 (patched)


In the if() above, if imageNum != than the cache, then that will return 
null. When is this if going to be valid? imageNum will be teh same than cache.



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
Lines 222 (patched)


Just check if they're different. It is a simple case.  A cache is not more 
valid if the seq num is different (higher or lower, it does not matter). This 
logic is assumming HDFS will never request a lower value. We should not assume 
that.



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
Lines 248-266 (patched)


Is there a way to mock the cache objects instead of having new methods to 
get its internal value? Or mock the imageRetriever and verify it was called 
when a cache is not valid.


- Sergio Pena


On Oct. 5, 2018, 7:11 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Oct. 5, 2018, 7:11 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/11/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-10-05 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/
---

(Updated Oct. 5, 2018, 7:11 p.m.)


Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.


Bugs: SENTRY-2370
https://issues.apache.org/jira/browse/SENTRY-2370


Repository: sentry


Description
---

When NN requests path updates from sentry and if it exceeds the time threshold, 
on consecutive attempts sentry will attempt to fetch the full update from 
scratch. Instead it should cache it and update the cache before sending it to NN


Diffs (updated)
-

  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
 08b16a4df 
  
sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
 f86ce6f83 


Diff: https://reviews.apache.org/r/68547/diff/11/

Changes: https://reviews.apache.org/r/68547/diff/10-11/


Testing
---


Thanks,

Arjun Mishra



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-10-05 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/
---

(Updated Oct. 5, 2018, 7:08 p.m.)


Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.


Bugs: SENTRY-2370
https://issues.apache.org/jira/browse/SENTRY-2370


Repository: sentry


Description
---

When NN requests path updates from sentry and if it exceeds the time threshold, 
on consecutive attempts sentry will attempt to fetch the full update from 
scratch. Instead it should cache it and update the cache before sending it to NN


Diffs (updated)
-

  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
 08b16a4df 
  
sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
 f86ce6f83 


Diff: https://reviews.apache.org/r/68547/diff/10/

Changes: https://reviews.apache.org/r/68547/diff/9-10/


Testing
---


Thanks,

Arjun Mishra



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-10-04 Thread Arjun Mishra via Review Board


> On Oct. 3, 2018, 6:55 p.m., Sergio Pena wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
> > Lines 97-107 (patched)
> > 
> >
> > Why do we need the sequence# and image# here? Those are already checked 
> > in the DbUpdateForward class. I think the cache invalidation here should be 
> > done by looking at the current snapshot ID on the SentryStore and comparing 
> > it with the one store in the cache.

We are not invalidating cache here. We are just checking if the cache is 
current. If it isn't, we fetch a fresh full update from Sentry Db, and update 
the cache


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209184
---


On Oct. 2, 2018, 9:57 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Oct. 2, 2018, 9:57 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageRetriever.java
>  11b75411d 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
>  3532ef33d 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermImageRetriever.java
>  443434127 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestImageRetriever.java
>  fb42b279a 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/9/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-10-03 Thread Sergio Pena via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209184
---



The same PathImageRetriever feedback applies to PermissionImageRetriever.

Btw, while looking at the code completely, I think it makes more sense to keep 
the cache in the DBUpdateForwarder class. That class handles the seqNum and 
imgNum that can be used to make your logic of whether fetch a new snapshot or 
just return the cache. Also the code will be re-used for paths and permissions.


sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageRetriever.java
Lines 65 (patched)


Seems this method is used only internally on each class. Is it needed to be 
part of the interface and public?



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
Line 131 (original), 131 (patched)


Does this 'Clear cache if present' help for supportability? How do you know 
the cache was free or not?



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
Lines 132 (patched)


Leave a comment on this call why you're releasing the cache from memory, 
and why at this point of the call.



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
Lines 44 (patched)


Code convention: Need a space between the variable type and name.



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
Lines 62 (patched)


Code convention: if () instead of if()



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
Lines 65 (patched)


Could you call clearCache() before creating another SoftReference? Just to 
make sure the cache is cleared and can be claimed by GC. We want to make sure 
there are not hard references leaked that causes a memory leak.



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
Lines 79 (patched)


Code convention: If () instead of if()



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
Lines 80 (patched)


The SoftReference object has a clear(). I think we should call it too 
before setting the variable to null.



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
Lines 97-107 (patched)


Why do we need the sequence# and image# here? Those are already checked in 
the DbUpdateForward class. I think the cache invalidation here should be done 
by looking at the current snapshot ID on the SentryStore and comparing it with 
the one store in the cache.


- Sergio Pena


On Oct. 2, 2018, 9:57 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Oct. 2, 2018, 9:57 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageRetriever.java
>  11b75411d 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
>  3532ef33d 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermImageRetriever.java
>  443434127 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestImageRetriever.java
>  fb42b279a 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/9/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-10-02 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/
---

(Updated Oct. 2, 2018, 9:57 p.m.)


Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.


Bugs: SENTRY-2370
https://issues.apache.org/jira/browse/SENTRY-2370


Repository: sentry


Description
---

When NN requests path updates from sentry and if it exceeds the time threshold, 
on consecutive attempts sentry will attempt to fetch the full update from 
scratch. Instead it should cache it and update the cache before sending it to NN


Diffs (updated)
-

  
sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageRetriever.java
 11b75411d 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
 08b16a4df 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
 3532ef33d 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermImageRetriever.java
 443434127 
  
sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
 f86ce6f83 
  
sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestImageRetriever.java
 fb42b279a 


Diff: https://reviews.apache.org/r/68547/diff/9/

Changes: https://reviews.apache.org/r/68547/diff/8-9/


Testing
---


Thanks,

Arjun Mishra



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-10-02 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/
---

(Updated Oct. 2, 2018, 9:47 p.m.)


Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.


Changes
---

Post Sergio's feedback


Bugs: SENTRY-2370
https://issues.apache.org/jira/browse/SENTRY-2370


Repository: sentry


Description
---

When NN requests path updates from sentry and if it exceeds the time threshold, 
on consecutive attempts sentry will attempt to fetch the full update from 
scratch. Instead it should cache it and update the cache before sending it to NN


Diffs (updated)
-

  
sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageRetriever.java
 11b75411d 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
 08b16a4df 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
 3532ef33d 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermImageRetriever.java
 443434127 
  
sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
 f86ce6f83 
  
sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestImageRetriever.java
 fb42b279a 


Diff: https://reviews.apache.org/r/68547/diff/8/

Changes: https://reviews.apache.org/r/68547/diff/7-8/


Testing
---


Thanks,

Arjun Mishra



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-10-01 Thread Arjun Mishra via Review Board


> On Sept. 28, 2018, 4:56 p.m., Sergio Pena wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
> > Line 158 (original), 168-172 (patched)
> > 
> >
> > I don't think this logic whether use a cache or not should be in this 
> > clas. The retriever should call the cache instead (or call the SentryStore 
> > if it needs to invalidate the current cache image).
> 
> Arjun Mishra wrote:
> This is because the logic to invalidate cache is dependent on whether 
> delta updates are being sent or not. Since the decisision to send delta 
> updates is done in DBUpdateForwarder this logic sits in this piece of code
> 
> Sergio Pena wrote:
> Do you need the delta number to invalidate metadata for a snapshot? The 
> PathImageRetriever can keep a cache of the current snapshot and invalidate 
> its cache if the next snapshot ID is different, otherwise return the full 
> path image from the cache (read a SoftReference example on how help clean the 
> cache if it is not used).

Yes we need delta numbers. If the image id is the same but the sequence number 
requested is greater than cache sequence number by 1 we need to request a a 
full update for sequence numbers that don't exist in sentry db


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209098
---


On Sept. 28, 2018, 7:56 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Sept. 28, 2018, 7:56 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
>  b8f5ce7db 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/7/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-10-01 Thread Arjun Mishra via Review Board


> On Sept. 28, 2018, 4:56 p.m., Sergio Pena wrote:
> > What if NN does not request full images anymore? How can the cache be 
> > released from memory to avoid unused cache for the rest of the process life?
> 
> Arjun Mishra wrote:
> Sergio, the cache is releases as soon as Deltas are being sent to NN
> 
> Sergio Pena wrote:
> what if there are no Deltas available to send in the next NN requests? 
> The cache will live on memory until then.
> Look for Java soft or weak references (and the use of SoftReferences or 
> WeakReferences). These objects might help on letting the GC to clean your 
> cache if it is not needed anymore.

Deltas don't have to be sent they only have to be requested by NN. As soon as 
NN gets a full update, it will start requesting delta updates and just the 
trigger for requesting delta updates will invalidate cache
I will research soft references


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209098
---


On Sept. 28, 2018, 7:56 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Sept. 28, 2018, 7:56 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
>  b8f5ce7db 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/7/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-09-28 Thread Sergio Pena via Review Board


> On Sept. 28, 2018, 4:56 p.m., Sergio Pena wrote:
> > What if NN does not request full images anymore? How can the cache be 
> > released from memory to avoid unused cache for the rest of the process life?
> 
> Arjun Mishra wrote:
> Sergio, the cache is releases as soon as Deltas are being sent to NN

what if there are no Deltas available to send in the next NN requests? The 
cache will live on memory until then.
Look for Java soft or weak references (and the use of SoftReferences or 
WeakReferences). These objects might help on letting the GC to clean your cache 
if it is not needed anymore.


> On Sept. 28, 2018, 4:56 p.m., Sergio Pena wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
> > Line 158 (original), 168-172 (patched)
> > 
> >
> > I don't think this logic whether use a cache or not should be in this 
> > clas. The retriever should call the cache instead (or call the SentryStore 
> > if it needs to invalidate the current cache image).
> 
> Arjun Mishra wrote:
> This is because the logic to invalidate cache is dependent on whether 
> delta updates are being sent or not. Since the decisision to send delta 
> updates is done in DBUpdateForwarder this logic sits in this piece of code

Do you need the delta number to invalidate metadata for a snapshot? The 
PathImageRetriever can keep a cache of the current snapshot and invalidate its 
cache if the next snapshot ID is different, otherwise return the full path 
image from the cache (read a SoftReference example on how help clean the cache 
if it is not used).


> On Sept. 28, 2018, 4:56 p.m., Sergio Pena wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
> > Lines 136-137 (patched)
> > 
> >
> > Why is the cache needed to be instantiated and passed as aparameter 
> > instead of instatiating the cache inside the PathImageRetriever and 
> > PermImageRetriever instead?
> > 
> > A retriever returns the paths, but if they're cache, then it returns 
> > the ones from the cache.
> 
> Arjun Mishra wrote:
> Sergio we haev 2 path classes PathImageRetriever, PathDeltaRetriever. 
> Even though we need cache to be in PathImageRetriever, we need the cache to 
> be visible to PathDeltaRetriever so it can be invalidated.

Is a cache needed for deltas as well? I think the cache should be handled 
internally on each retriever.


- Sergio


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209098
---


On Sept. 28, 2018, 7:56 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Sept. 28, 2018, 7:56 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
>  b8f5ce7db 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/7/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-09-28 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/
---

(Updated Sept. 28, 2018, 7:56 p.m.)


Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.


Changes
---

Removed configuration and accommodated Kalyan's suggestions


Bugs: SENTRY-2370
https://issues.apache.org/jira/browse/SENTRY-2370


Repository: sentry


Description
---

When NN requests path updates from sentry and if it exceeds the time threshold, 
on consecutive attempts sentry will attempt to fetch the full update from 
scratch. Instead it should cache it and update the cache before sending it to NN


Diffs (updated)
-

  
sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageCache.java
 PRE-CREATION 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
 08b16a4df 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
 PRE-CREATION 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
 PRE-CREATION 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
 b8f5ce7db 
  
sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
 f86ce6f83 


Diff: https://reviews.apache.org/r/68547/diff/7/

Changes: https://reviews.apache.org/r/68547/diff/6-7/


Testing
---


Thanks,

Arjun Mishra



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-09-28 Thread Arjun Mishra via Review Board


> On Sept. 28, 2018, 4:56 p.m., Sergio Pena wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
> > Lines 136-137 (patched)
> > 
> >
> > Why is the cache needed to be instantiated and passed as aparameter 
> > instead of instatiating the cache inside the PathImageRetriever and 
> > PermImageRetriever instead?
> > 
> > A retriever returns the paths, but if they're cache, then it returns 
> > the ones from the cache.

Sergio we haev 2 path classes PathImageRetriever, PathDeltaRetriever. Even 
though we need cache to be in PathImageRetriever, we need the cache to be 
visible to PathDeltaRetriever so it can be invalidated.


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209098
---


On Sept. 27, 2018, 10:29 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Sept. 27, 2018, 10:29 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ServiceConstants.java
>  2d21411e2 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
>  b8f5ce7db 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/6/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-09-28 Thread Arjun Mishra via Review Board


> On Sept. 28, 2018, 4:56 p.m., Sergio Pena wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
> > Line 158 (original), 168-172 (patched)
> > 
> >
> > I don't think this logic whether use a cache or not should be in this 
> > clas. The retriever should call the cache instead (or call the SentryStore 
> > if it needs to invalidate the current cache image).

This is because the logic to invalidate cache is dependent on whether delta 
updates are being sent or not. Since the decisision to send delta updates is 
done in DBUpdateForwarder this logic sits in this piece of code


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209098
---


On Sept. 27, 2018, 10:29 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Sept. 27, 2018, 10:29 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ServiceConstants.java
>  2d21411e2 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
>  b8f5ce7db 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/6/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-09-28 Thread Arjun Mishra via Review Board


> On Sept. 28, 2018, 4:56 p.m., Sergio Pena wrote:
> > sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ServiceConstants.java
> > Lines 53-55 (patched)
> > 
> >
> > Why is a configuration needed to whether invalidate the cache or not?
> > 
> > Cache invalidation should be done by a certain condition that happens 
> > on the server. For instance, if the snapshot ID read from the DB is 
> > different from the cache, then it needs to invalidate the whole snapshot in 
> > the cache, or if the latest notification in the DB is newer than the cache, 
> > then it invalidates.
> > 
> > I prefer not to add a new configuration for this.

Sure I can remove this configuration


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209098
---


On Sept. 27, 2018, 10:29 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Sept. 27, 2018, 10:29 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ServiceConstants.java
>  2d21411e2 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
>  b8f5ce7db 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/6/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-09-28 Thread Arjun Mishra via Review Board


> On Sept. 28, 2018, 4:56 p.m., Sergio Pena wrote:
> > What if NN does not request full images anymore? How can the cache be 
> > released from memory to avoid unused cache for the rest of the process life?

Sergio, the cache is releases as soon as Deltas are being sent to NN


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209098
---


On Sept. 27, 2018, 10:29 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Sept. 27, 2018, 10:29 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ServiceConstants.java
>  2d21411e2 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
>  b8f5ce7db 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/6/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-09-28 Thread Sergio Pena via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209098
---



What if NN does not request full images anymore? How can the cache be released 
from memory to avoid unused cache for the rest of the process life?


sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ServiceConstants.java
Lines 53-55 (patched)


Why is a configuration needed to whether invalidate the cache or not?

Cache invalidation should be done by a certain condition that happens on 
the server. For instance, if the snapshot ID read from the DB is different from 
the cache, then it needs to invalidate the whole snapshot in the cache, or if 
the latest notification in the DB is newer than the cache, then it invalidates.

I prefer not to add a new configuration for this.



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
Line 158 (original), 168-172 (patched)


I don't think this logic whether use a cache or not should be in this clas. 
The retriever should call the cache instead (or call the SentryStore if it 
needs to invalidate the current cache image).



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
Lines 136-137 (patched)


Why is the cache needed to be instantiated and passed as aparameter instead 
of instatiating the cache inside the PathImageRetriever and PermImageRetriever 
instead?

A retriever returns the paths, but if they're cache, then it returns the 
ones from the cache.


- Sergio Pena


On Sept. 27, 2018, 10:29 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Sept. 27, 2018, 10:29 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ServiceConstants.java
>  2d21411e2 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
>  b8f5ce7db 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/6/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-09-28 Thread kalyan kumar kalvagadda via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review209100
---


Fix it, then Ship it!




Fix it


sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
Lines 191-194 (patched)


nit. There is no need for invalidateCache API in DBUpdateForwarder.


- kalyan kumar kalvagadda


On Sept. 27, 2018, 10:29 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Sept. 27, 2018, 10:29 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ServiceConstants.java
>  2d21411e2 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
>  b8f5ce7db 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/6/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-09-27 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/
---

(Updated Sept. 27, 2018, 10:29 p.m.)


Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.


Changes
---

Post feedback


Bugs: SENTRY-2370
https://issues.apache.org/jira/browse/SENTRY-2370


Repository: sentry


Description
---

When NN requests path updates from sentry and if it exceeds the time threshold, 
on consecutive attempts sentry will attempt to fetch the full update from 
scratch. Instead it should cache it and update the cache before sending it to NN


Diffs (updated)
-

  
sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageCache.java
 PRE-CREATION 
  
sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ServiceConstants.java
 2d21411e2 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
 08b16a4df 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
 PRE-CREATION 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
 PRE-CREATION 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
 b8f5ce7db 
  
sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
 f86ce6f83 


Diff: https://reviews.apache.org/r/68547/diff/6/

Changes: https://reviews.apache.org/r/68547/diff/5-6/


Testing
---


Thanks,

Arjun Mishra



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-09-15 Thread kalyan kumar kalvagadda via Review Board


> On Sept. 12, 2018, 8:18 p.m., Na Li wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
> > Lines 137 (patched)
> > 
> >
> > This condition can be changed to
> > if(invalidateCacheOnSendingDeltas && imageCache.hasCache())
> > 
> > as long as imageCache has cache, we should invalidate it. It has 
> > nothing to do with if the cache is current or not.

+1 on lina's comment.

This logic can be moved above. As soon as we know that fullsnapshot need not be 
sent, cache can be invalidated, right.


- kalyan kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review208573
---


On Sept. 12, 2018, 6:41 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Sept. 12, 2018, 6:41 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ServiceConstants.java
>  2d21411e2 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
>  b8f5ce7db 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/5/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-09-12 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review208573
---




sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
Lines 137 (patched)


This condition can be changed to
if(invalidateCacheOnSendingDeltas && imageCache.hasCache())

as long as imageCache has cache, we should invalidate it. It has nothing to 
do with if the cache is current or not.



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
Lines 24 (patched)


this should be

PathsUpdateImageCache implements ImageCache



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
Lines 56 (patched)


this should be
if(pathsUpdateImageCache.getSeqNum() < sequenceNum)

When system starts, NN will ask with sequenceNum = 0, and full snapshot 
will have pathsUpdateImageCache.getSeqNum() >= 0. 

Have you tested that the cache mechanism does work as expected? Will 
isImageCurrent() always return false even when cache is up-to-date?



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
Lines 24 (patched)


same problem



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
Lines 50 (patched)


same issue


- Na Li


On Sept. 12, 2018, 6:41 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Sept. 12, 2018, 6:41 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ServiceConstants.java
>  2d21411e2 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
>  PRE-CREATION 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
>  b8f5ce7db 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/5/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-09-12 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/
---

(Updated Sept. 12, 2018, 6:41 p.m.)


Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.


Bugs: SENTRY-2370
https://issues.apache.org/jira/browse/SENTRY-2370


Repository: sentry


Description
---

When NN requests path updates from sentry and if it exceeds the time threshold, 
on consecutive attempts sentry will attempt to fetch the full update from 
scratch. Instead it should cache it and update the cache before sending it to NN


Diffs (updated)
-

  
sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageCache.java
 PRE-CREATION 
  
sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ServiceConstants.java
 2d21411e2 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
 08b16a4df 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathsUpdateImageCache.java
 PRE-CREATION 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermissionsUpdateImageCache.java
 PRE-CREATION 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryPlugin.java
 b8f5ce7db 
  
sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
 f86ce6f83 


Diff: https://reviews.apache.org/r/68547/diff/5/

Changes: https://reviews.apache.org/r/68547/diff/4-5/


Testing
---


Thanks,

Arjun Mishra



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-08-30 Thread Arjun Mishra via Review Board


> On Aug. 29, 2018, 1:28 p.m., kalyan kumar kalvagadda wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
> > Lines 52-53 (original), 54-71 (patched)
> > 
> >
> > General comment on the solution. From the description you are trying 
> > efficiently serve the full update request while is received immediatly 
> > after the timeout failure. If not please explain other cases you are trying 
> > to solve.
> > 
> > 
> > 1. With this approach sentry would always store a full paths update all 
> > the time. This will increase the memory foot print of sentry. This is not 
> > good.
> > 
> > 2. Why is thsi logic added only for Path updates but not permission 
> > updates.
> > 
> > 3. I would prefer you differ this patch till SENTRY-2287 is fixed. Once 
> > patch for SENTRY-2287 is committed you will have place holder to reset the 
> > cache when the NN acknowledges that it received full update. That wat the 
> > cache is held only short period.
> 
> Arjun Mishra wrote:
> Kalyan, agreed that it would be good to clear the cache once NN 
> acknoledges the receipt of udpates, but that could be worked on as a part of 
> a different ticket. I think this will significantly help with issues where NN 
> request timeout triggers another fetch from Sentry Db. The increase to heap 
> size will be needed but the benefits outway the problem. Let me know your 
> thoughts. 
> 
> I have added Perm Update cache too. I can also add a configuration to 
> enable path or perm caching
> 
> kalyan kumar kalvagadda wrote:
> We need to really carefull before introducing caching. When there we know 
> that higher timeout value avaoid this failure there is no hurry in putting a 
> solution that increases the memory foot print and other potential bugs.

Thanks Kalyan. The higher tmeout fixes the issue, but the timeout threshold is 
arbitrary and may take several iterations to get it right. Also its hard to 
evaluate what the timeout should be based on HMS size.
In any case I will wait for your patch so the cache can be invalidated


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review208090
---


On Aug. 29, 2018, 7:04 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Aug. 29, 2018, 7:04 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageRetriever.java
>  11b75411d 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
>  2b1618134 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermImageRetriever.java
>  f3a2d5028 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestImageRetriever.java
>  fb42b279a 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-08-29 Thread Arjun Mishra via Review Board


> On Aug. 29, 2018, 3:54 p.m., Na Li wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
> > Lines 52-53 (original), 54-71 (patched)
> > 
> >
> > "reset the cache when the NN acknowledges that it received full update."
> > In the future, it is possible that the same sentry server serves 
> > multiple clients. So the server behavior should not be based on client 
> > state.

Thanks Lina. With SENTRY-2287 we can identify if an update was received by NN. 
But that is still being worked on


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review208098
---


On Aug. 29, 2018, 7:04 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Aug. 29, 2018, 7:04 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageRetriever.java
>  11b75411d 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
>  2b1618134 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermImageRetriever.java
>  f3a2d5028 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestImageRetriever.java
>  fb42b279a 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-08-29 Thread Arjun Mishra via Review Board


> On Aug. 29, 2018, 1:28 p.m., kalyan kumar kalvagadda wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
> > Lines 52-53 (original), 54-71 (patched)
> > 
> >
> > General comment on the solution. From the description you are trying 
> > efficiently serve the full update request while is received immediatly 
> > after the timeout failure. If not please explain other cases you are trying 
> > to solve.
> > 
> > 
> > 1. With this approach sentry would always store a full paths update all 
> > the time. This will increase the memory foot print of sentry. This is not 
> > good.
> > 
> > 2. Why is thsi logic added only for Path updates but not permission 
> > updates.
> > 
> > 3. I would prefer you differ this patch till SENTRY-2287 is fixed. Once 
> > patch for SENTRY-2287 is committed you will have place holder to reset the 
> > cache when the NN acknowledges that it received full update. That wat the 
> > cache is held only short period.

Kalyan, agreed that it would be good to clear the cache once NN acknoledges the 
receipt of udpates, but that could be worked on as a part of a different 
ticket. I think this will significantly help with issues where NN request 
timeout triggers another fetch from Sentry Db. The increase to heap size will 
be needed but the benefits outway the problem. Let me know your thoughts. 

I have added Perm Update cache too. I can also add a configuration to enable 
path or perm caching


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review208090
---


On Aug. 29, 2018, 7:04 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Aug. 29, 2018, 7:04 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageRetriever.java
>  11b75411d 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
>  08b16a4df 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
>  2b1618134 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermImageRetriever.java
>  f3a2d5028 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
>  f86ce6f83 
>   
> sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestImageRetriever.java
>  fb42b279a 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-08-29 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/
---

(Updated Aug. 29, 2018, 7:04 p.m.)


Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.


Changes
---

Updated patch with PermChange cache included


Bugs: SENTRY-2370
https://issues.apache.org/jira/browse/SENTRY-2370


Repository: sentry


Description
---

When NN requests path updates from sentry and if it exceeds the time threshold, 
on consecutive attempts sentry will attempt to fetch the full update from 
scratch. Instead it should cache it and update the cache before sending it to NN


Diffs (updated)
-

  
sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/ImageRetriever.java
 11b75411d 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/DBUpdateForwarder.java
 08b16a4df 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
 2b1618134 
  
sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PermImageRetriever.java
 f3a2d5028 
  
sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestDBUpdateForwarder.java
 f86ce6f83 
  
sentry-hdfs/sentry-hdfs-service/src/test/java/org/apache/sentry/hdfs/TestImageRetriever.java
 fb42b279a 


Diff: https://reviews.apache.org/r/68547/diff/2/

Changes: https://reviews.apache.org/r/68547/diff/1-2/


Testing
---


Thanks,

Arjun Mishra



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-08-29 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review208098
---




sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
Lines 52-53 (original), 54-71 (patched)


"reset the cache when the NN acknowledges that it received full update."
In the future, it is possible that the same sentry server serves multiple 
clients. So the server behavior should not be based on client state.


- Na Li


On Aug. 28, 2018, 7:08 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Aug. 28, 2018, 7:08 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
>  2b1618134 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-08-29 Thread Arjun Mishra via Review Board


> On Aug. 29, 2018, 1:28 p.m., kalyan kumar kalvagadda wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
> > Line 52 (original), 54 (patched)
> > 
> >
> > This condition is not correct. When cachePathsUpdate is not null it 
> > will always return cachePathsUpdate. cachePathsUpdate will never be updated.

I caught that. It will be in the new patch


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review208090
---


On Aug. 28, 2018, 7:08 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Aug. 28, 2018, 7:08 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
>  2b1618134 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-08-29 Thread kalyan kumar kalvagadda via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review208090
---




sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
Line 52 (original), 54 (patched)


This condition is not correct. When cachePathsUpdate is not null it will 
always return cachePathsUpdate. cachePathsUpdate will never be updated.



sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
Lines 52-53 (original), 54-71 (patched)


General comment on the solution. From the description you are trying 
efficiently serve the full update request while is received immediatly after 
the timeout failure. If not please explain other cases you are trying to solve.

1. With this approach sentry would always store a full paths update all the 
time. This will increase the memory foot print of sentry. This is not good.

2. Why is thsi logic added only for Path updates but not permission updates.

3. I would prefer you differ this patch till SENTRY-2287 is fixed. Once 
patch for SENTRY-2287 is committed you will have place holder to reset the 
cache when the NN acknowledges that it received full update. That wat the cache 
is held only short period.


- kalyan kumar kalvagadda


On Aug. 28, 2018, 7:08 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Aug. 28, 2018, 7:08 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
>  2b1618134 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-08-28 Thread Arjun Mishra via Review Board


> On Aug. 28, 2018, 8:07 p.m., Na Li wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
> > Lines 67 (patched)
> > 
> >
> > This is wrong. 
> > ImageID is for fullsnapshot, Within the same full snapshot, there is 
> > changID for each path change. You have to check the max changID in the 
> > cached update to make sure there is no new change.

Yep. My bad


- Arjun


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review208061
---


On Aug. 28, 2018, 7:08 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Aug. 28, 2018, 7:08 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
>  2b1618134 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: Review Request 68547: SENTRY-2370: Create a cache of PathUpdates to send to NN

2018-08-28 Thread Na Li via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68547/#review208061
---




sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
Lines 67 (patched)


This is wrong. 
ImageID is for fullsnapshot, Within the same full snapshot, there is 
changID for each path change. You have to check the max changID in the cached 
update to make sure there is no new change.


- Na Li


On Aug. 28, 2018, 7:08 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68547/
> ---
> 
> (Updated Aug. 28, 2018, 7:08 p.m.)
> 
> 
> Review request for sentry, kalyan kumar kalvagadda, Na Li, and Sergio Pena.
> 
> 
> Bugs: SENTRY-2370
> https://issues.apache.org/jira/browse/SENTRY-2370
> 
> 
> Repository: sentry
> 
> 
> Description
> ---
> 
> When NN requests path updates from sentry and if it exceeds the time 
> threshold, on consecutive attempts sentry will attempt to fetch the full 
> update from scratch. Instead it should cache it and update the cache before 
> sending it to NN
> 
> 
> Diffs
> -
> 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/PathImageRetriever.java
>  2b1618134 
> 
> 
> Diff: https://reviews.apache.org/r/68547/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>