Smit,

 

I understand the reasoning for leveraging the existing Ranger tag-sync and 
tag-store implementation instead of going with a custom context enricher. While 
this is feasible, it will require the use of internal APIs, which could change 
in future releases. If you still want to provide an alternate source for tags, 
I suggest extending org.apache.ranger.tagsync.model.AbstractTagSource, similar 
to AtlasTagSource, and registering it with the following configuration in 
ranger-tagsync-site.xml:

ranger.tagsync.source.<name-of-your-source>=true

ranger.tagsync.source.<name-of-your-source>.class=<implementation-class-name>
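To make the naming convention above concrete, here is a minimal sketch of how the two properties pair up. It is not Ranger's actual loading code; the class name TagSourceConfigSketch is hypothetical, and it only assumes that each source has an enable flag and a matching ".class" entry, as in the two lines above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Illustration only: NOT Ranger's loader, just the property naming convention made explicit.
class TagSourceConfigSketch {
    static final String PREFIX = "ranger.tagsync.source.";

    // Returns source-name -> implementation class for every source whose flag is "true".
    static Map<String, String> enabledSources(Properties props) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String key : props.stringPropertyNames()) {
            // Skip unrelated keys and the ".class" entries themselves.
            if (!key.startsWith(PREFIX) || key.endsWith(".class")) {
                continue;
            }
            if (Boolean.parseBoolean(props.getProperty(key))) {
                String className = props.getProperty(key + ".class");
                if (className != null) {
                    result.put(key.substring(PREFIX.length()), className);
                }
            }
        }
        return result;
    }
}
```

For example, with a hypothetical source named customstore, setting ranger.tagsync.source.customstore=true and ranger.tagsync.source.customstore.class=com.example.CustomTagSource would yield a single entry mapping customstore to that class.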

 

Hope this helps.

 

Madhan

 

From: Smit Shah <[email protected]>
Date: Tuesday, September 1, 2020 at 4:06 PM
To: Madhan Neethiraj <[email protected]>, "[email protected]" 
<[email protected]>
Cc: "[email protected]" <[email protected]>, "[email protected]" 
<[email protected]>
Subject: Re: Help: Tag based policy for non-Atlas solution

 

Hi Madhan, 

Thank you for writing back with a suggestion. 

I would like to get some more insight on a few options, plus a general 
question, based on the suggestion provided and further investigation.

 

Option A: The solution you suggested (it’s really helpful)
With this, we would not be leveraging the ranger-tagsync process or the 
tag-related tables (ranger.x_tag*) that Ranger maintains. I can think of two 
challenges for us to tackle:
1. Given our high request volume, the end-point that retrieves tags for a 
resource needs to be highly available, fast, and able to handle concurrent 
requests. 
2. If the end-point or our tag store is down, the lookup will fail, and we 
have to decide whether to deny the resource request or let it pass through. 
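
One way to reason about the two challenges above is a lookup wrapper that remembers the last known tags per resource and applies an explicit deny or pass-through choice when the tag store cannot be reached. This is only a sketch under assumptions: TagLookupWithFallback and its members are hypothetical names, the tag store is modeled as a plain function, and tags are simplified to strings.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper, not part of Ranger: tag lookup with a last-known-good fallback.
class TagLookupWithFallback {
    enum OnFailure { DENY, PASS_THROUGH }

    static final class Result {
        final Set<String> tags;     // tags to evaluate policies against
        final boolean denyRequest;  // true only when lookup failed, nothing was remembered, and mode is DENY
        Result(Set<String> tags, boolean denyRequest) {
            this.tags = tags;
            this.denyRequest = denyRequest;
        }
    }

    // Thread-safe map, so concurrent requests can share the remembered tags.
    private final ConcurrentHashMap<String, Set<String>> lastKnown = new ConcurrentHashMap<>();
    private final Function<String, Set<String>> tagStore; // stand-in for a call to the tag-store end-point
    private final OnFailure onFailure;

    TagLookupWithFallback(Function<String, Set<String>> tagStore, OnFailure onFailure) {
        this.tagStore = tagStore;
        this.onFailure = onFailure;
    }

    Result lookup(String resource) {
        try {
            Set<String> fresh = tagStore.apply(resource);
            if (fresh == null) {
                fresh = Collections.emptySet();
            }
            lastKnown.put(resource, fresh);
            return new Result(fresh, false);
        } catch (RuntimeException e) {
            // Tag store unavailable: prefer the last tags seen for this resource.
            Set<String> stale = lastKnown.get(resource);
            if (stale != null) {
                return new Result(stale, false);
            }
            // Nothing remembered: DENY blocks the request; PASS_THROUGH continues with no tags.
            return new Result(Collections.<String>emptySet(), onFailure == OnFailure.DENY);
        }
    }
}
```

The sketch still calls the tag store on every request, so it addresses availability rather than latency; a time-based cache in front of the call would be a separate design choice.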
 

Option B: Leveraging ranger-tagsync process

Similar to how Ranger listens to Atlas’s Kafka topic, we can create an Apache 
Kafka topic for our tag store’s change notifications and let the ranger-tagsync 
process listen to it. We can then skip Option A.

Many of the property names defined in install.properties are specific to 
Atlas, so I am not sure whether ranger-tagsync is designed specifically for 
Atlas. Can you think of any challenges here? 

Option C: Storing our tags directly in Ranger’s internal tag store
Ranger provides end-points that we can leverage. So, instead of implementing a 
context enricher (Option A), we can store our tags in Ranger’s tag store and 
let Ranger work the normal way. 

Can you think of any challenges here?   




General question:

Do Ranger plugins also keep a cached copy of Ranger’s internal tag store, in 
addition to policies? I am trying to see whether there are benefits to putting 
our tag details in Ranger’s tag store.





Overall, Option B seems like the better option to me, if it is possible to 
implement. 

 

 

SMIT SHAH
SDE, Big Data
Pronouns: he/him/his
 

 

From: Madhan Neethiraj <[email protected]>
Date: Monday, August 31, 2020 at 1:28 AM
To: Smit Shah <[email protected]>, "[email protected]" 
<[email protected]>
Cc: "[email protected]" <[email protected]>, "[email protected]" 
<[email protected]>, "[email protected]" <[email protected]>
Subject: Re: Help: Tag based policy for non-Atlas solution

 

Smit,

 

I suggest implementing a context enricher that retrieves tags from your tag 
store and sets the tags for the resource in the request context, with a call 
to RangerAccessRequestUtil.setRequestTagsInContext(context, tags). The tag 
service-def should be updated to register this context enricher in place of 
the current enricher implementation (RangerAdminTagRetriever).
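
The flow described above can be sketched roughly as follows. This is a self-contained illustration, not Ranger code: SketchRequest, ExternalTagEnricherSketch and the context key are hypothetical stand-ins, tags are simplified to plain strings, and in a real enricher the final step would be the RangerAccessRequestUtil.setRequestTagsInContext(context, tags) call mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Simplified stand-in for an access request: only the resource name and the request context.
class SketchRequest {
    final String resourceName;
    final Map<String, Object> context = new HashMap<>();

    SketchRequest(String resourceName) {
        this.resourceName = resourceName;
    }
}

// Sketch of the enrichment step: look up tags for the resource, then place them in the context.
class ExternalTagEnricherSketch {
    // Hypothetical key for this sketch; Ranger manages its own key inside setRequestTagsInContext.
    static final String TAGS_KEY = "sketch.tags";

    private final Function<String, Set<String>> tagStore; // stand-in for the external tag store client

    ExternalTagEnricherSketch(Function<String, Set<String>> tagStore) {
        this.tagStore = tagStore;
    }

    void enrich(SketchRequest request) {
        Set<String> tags = tagStore.apply(request.resourceName);
        // Leave the context untouched when the resource has no tags.
        if (tags != null && !tags.isEmpty()) {
            request.context.put(TAGS_KEY, tags);
        }
    }
}
```

Keeping the tag store behind a small function makes it easy to swap the lookup for a REST client or a local cache without changing the enrichment step.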

 

Hope this helps.

 

Madhan

 

 

 

From: Smit Shah <[email protected]>
Date: Wednesday, August 26, 2020 at 3:59 PM
To: "[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>, "[email protected]" 
<[email protected]>, "[email protected]" <[email protected]>
Subject: Help: Tag based policy for non-Atlas solution

 

cc: Team Members who created Confluence wiki pages that I have referred

 

Hi Apache Ranger Dev Team, 

I am Smit Shah, working at Zillow as a Data Engineer. My team is working on 
Data Governance around Apache Hive. We came across Apache Ranger, and one of 
the key features we like is Tag Based Policies; we are really interested in 
leveraging it. :)

While going through the documentation for Tag Based Policies, I found that 
Tag Sync has native support for Apache Atlas. Our team already has its own tag 
store, and we are trying to avoid adding another layer. So, I am checking with 
the team whether there are any examples/blogs/documentation you can share that 
would help us: 
1. Store tags
2. Make tag based policies work in Apache Ranger for a non-Apache Atlas 
solution 

Some web pages that I came across during my initial investigation: 
1. Context enrichers – Not sure if this is important for my use case
2. Installing Tag Synchronizer – How to make this work for a non-Atlas solution
3. Ranger API – This might be needed for storing tags; for example, we could 
create a service that takes data from our tag store and calls this end-point 
to store it in Ranger in the required format. 


Your help/details would be really valuable to us. Sending an email seemed like 
the best way to reach out to the team. Thank you very much in advance. :)

 

SMIT SHAH
SDE, Big Data
Pronouns: he/him/his
 
