Re: Scalability - large numbers of users/groups in LDAP
On 22/02/2017 19:28, Sailaja Polavarapu wrote:
> Hi Nigel Jones,
> As part of incremental sync support for Ranger, I was reading through the MS AD documentation for the memberof attribute. According to the documentation, it looks like the memberof attribute value is not stored and is always computed on the fly from the member attribute of the group.
> In the OpenLDAP case, the memberof attribute is not enabled by default as part of the schema. It has to be enabled manually. As far as I know, OpenLDAP doesn’t maintain the back-link between the memberof attribute of the user and the member/memberUid attribute of the group. It is up to the admin to create these values while adding/updating the users and groups. The memberof attribute is stored, and its value is retrieved as is, without any computation from the group member attribute.

Thanks, it does look like I have a workable solution to go with (to be verified of course):

* Get a list of roles that will participate in this environment (in fact these will be sourced from Apache Atlas, as that stores some entity:role associations in our case)
* Query LDAP for the users in those roles (LDAP groups)
* Push them as users/groups into Ranger with a new "usersync" process

This preserves the current approach Ranger takes, with just a tweak to the source of the user & role information ;-)
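The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `collect_memberships` and `fake_fetch` are hypothetical names, the role list is hard-coded rather than sourced from Apache Atlas, and the LDAP group query is replaced by an in-memory dictionary. Nothing here is Ranger's usersync code.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def collect_memberships(
    roles: Iterable[str],
    fetch_members: Callable[[str], Iterable[str]],
) -> Dict[str, List[str]]:
    """Return {user: sorted roles}, visiting only the roles of interest."""
    memberships = defaultdict(set)
    for role in roles:
        # One lookup per role, so the cost scales with the number of
        # roles rather than with the total number of users in LDAP.
        for user in fetch_members(role):
            memberships[user].add(role)
    return {user: sorted(groups) for user, groups in sorted(memberships.items())}


# Hypothetical stand-in for "query LDAP for the members of this group".
directory = {
    "analysts": ["amy", "bob"],
    "stewards": ["amy"],
    "everyone": ["amy", "bob", "carol"],  # never queried: not a listed role
}


def fake_fetch(role: str) -> List[str]:
    return directory.get(role, [])


result = collect_memberships(["analysts", "stewards"], fake_fetch)
print(result)  # {'amy': ['analysts', 'stewards'], 'bob': ['analysts']}
```

The resulting mapping is what a modified sync process would then push into Ranger; users who appear only in groups outside the role list ("carol" here) are never replicated.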
Re: Scalability - large numbers of users/groups in LDAP
On 22/02/2017 16:43, Nigel Jones wrote:
> Will raise a JIRA

I just came across RANGER-1211. This talks about optimizing user sync through an incremental approach.

Can anyone help with an MS AD question? The document implies that the memberOf attribute on a user is *computed*, which would suggest it's ALWAYS possible to EFFICIENTLY retrieve the list of users that are members of a known role (via the member attribute on the group).

Is this indeed the case? Only AD? How about OpenLDAP?

If so my problem probably goes away...

Thanks
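For illustration, where a directory does expose memberOf on user entries, the "users in a known role" query is an ordinary search filter. The sketch below only builds the filter string; the function names, the `person` object class, and the example DN are assumptions, and whether such a search is efficient depends on how the server is set up.

```python
from typing import Iterable


def escape_filter_value(value: str) -> str:
    """Escape the characters that RFC 4515 reserves inside filter values."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def members_of_filter(
    group_dns: Iterable[str],
    user_class: str = "person",      # assumption: varies by directory
    memberof_attr: str = "memberOf",  # assumption: configurable in practice
) -> str:
    """Build a filter matching users that belong to any of the given groups."""
    dns = list(group_dns)
    if not dns:
        raise ValueError("at least one group DN is required")
    clauses = "".join(
        f"({memberof_attr}={escape_filter_value(dn)})" for dn in dns
    )
    return f"(&(objectClass={user_class})(|{clauses}))"


# Hypothetical group DN, used only to show the shape of the filter.
print(members_of_filter(["cn=analysts,ou=groups,dc=example,dc=com"]))
```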
Re: Scalability - large numbers of users/groups in LDAP
On 12/02/2017 10:40, Zsombor wrote:
> If the performance of the LDAP server ever becomes a bottleneck, I would rather see a dedicated/embedded LDAP server which is synchronized automatically from the main LDAP server. I guess this could be more easily implemented than a complex partial synchronization/cache scenario.

Having explored the requirements a little more: though I can identify a small set of roles (LDAP groups) that constrain the number of users I'd want to replicate into Ranger, there's no easy way to do this via a query against the LDAP server (which contains a huge number of users and can't easily be changed in this environment). For example, I can't query the list of users belonging to a particular role. In any case, over time the number of users will increase.

If I were to ONLY sync the groups, and not the users, what breaks?

1. In the Ranger UI I can only define policies based on roles (groups), which seems fine (if combined with a few local/admin roles).
2. Each plugin would need to issue an LDAP query on first request to pull in user attributes from LDAP, specifically in order to determine the role/userid association. Issues here include:
   a) Extra configuration for the plugin.
   b) A realtime query to a remote system, something we do not do in Ranger today. In mitigation, if LDAP is down other infrastructure breaks in any case, and an LDAP query is quick.
   c) This needs to go into every plugin as configurable.
   d) This could occur during any kind of connection phase or on first request by a particular user, but how is somewhat engine dependent.

Will raise a JIRA
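The "query on first request, then reuse" idea in point 2 could look something like the sketch below. It is a hypothetical illustration, not an existing Ranger plugin API: `GroupCache`, its `lookup` callable, and the five-minute expiry are all assumptions, and the real LDAP call is replaced by a stub. The class is not thread-safe as written.

```python
import time
from typing import Callable, Dict, FrozenSet, Iterable, Tuple


class GroupCache:
    """Cache user -> groups lookups so LDAP is hit once per user per TTL."""

    def __init__(
        self,
        lookup: Callable[[str], Iterable[str]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        # user -> (expiry time, groups)
        self._entries: Dict[str, Tuple[float, FrozenSet[str]]] = {}

    def groups_for(self, user: str) -> FrozenSet[str]:
        now = self._clock()
        cached = self._entries.get(user)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh: no remote query
        groups = frozenset(self._lookup(user))  # first request, or expired
        self._entries[user] = (now + self._ttl, groups)
        return groups


# Stub standing in for the real-time LDAP query made by a plugin.
lookups = []


def fake_ldap_lookup(user: str) -> Iterable[str]:
    lookups.append(user)
    return {"amy": ["analysts", "stewards"]}.get(user, [])


cache = GroupCache(fake_ldap_lookup, ttl_seconds=300)
cache.groups_for("amy")  # triggers the lookup
cache.groups_for("amy")  # served from the cache
print(len(lookups))  # 1
```

The expiry is the trade-off behind issue (b): a longer TTL means fewer remote queries but a longer delay before a changed group membership takes effect.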
Re: Scalability - large numbers of users/groups in LDAP
Just want to add a few more points inline...

>> - what additional attributes are pulled

Currently we pull the following attributes as part of the LDAP search:
For Users: username (like uid, samaccountname, etc.) and user group member attribute (memberof, ismemberof, etc.)
For Groups: group member attribute (member, memberuid, etc.) and group name attribute (cn, samaccountname, etc.)
All these are configurable properties in usersync.

Thanks,
Sailaja.

On 2/10/17, 9:26 AM, "Nigel Jones" wrote:

>On 10/02/2017 17:07, Don Bosco Durai wrote:
>> 1. Ranger should have an option just to sync groups (without users). We should be already supporting it or there was an intention to support it. If we are not doing it for any reason, I am a strong +1 for doing it.
>I'll experiment with this - only working off the docs so far, trying it out is next :-)

[Sailaja]: Currently we support syncing groups that don’t contain any users. But if the group contains users (as part of the member attribute), we still sync those users. Of course, you can tweak the user search configuration in order to not sync users by providing an invalid/non-matching user search filter. This is kind of a dirty workaround. The same is the case with syncing just users and not groups. I agree that it would be better if we could support syncing just users or just groups, for flexibility.

>> 2. Direct LDAP would have been ideal, but we were worried about the load we might put on LDAP for real-time queries. Just FYI, Ranger uses LDAP/AD for authentication and easy selection of users/groups during policy creation. For authentication, it is already real-time (even though I would have preferred to get the roles also in real-time).
>A fair concern, though at least it's only at connect time. The enterprise I spoke to didn't seem to think it was a concern. I'll start with option #1 though

[Sailaja]: The other main reason that we sync users/groups from LDAP upfront is to make them available for configuring policies in Ranger.

>> If you have a very high number of users/groups, then the short-term recommendation is to apply LDAP filters and limit syncing users only to those using Hadoop.
>This will be extending outside Hadoop - I'm trying to determine how to constrain the LDAP query to the users using the relevant systems. I can potentially obtain a list of groups from elsewhere via a new usersync process, and then go back into LDAP to query membership, which would look the same to Ranger; just modify that sync.
>
>Thanks for the info!
>
>Nigel.
Re: Scalability - large numbers of users/groups in LDAP
Seems you are suggesting two scenarios.

1. Ranger should have an option just to sync groups (without users). We should be already supporting it or there was an intention to support it. If we are not doing it for any reason, I am a strong +1 for doing it.
2. Direct LDAP would have been ideal, but we were worried about the load we might put on LDAP for real-time queries. Just FYI, Ranger uses LDAP/AD for authentication and easy selection of users/groups during policy creation. For authentication, it is already real-time (even though I would have preferred to get the roles also in real-time).

If you have a very high number of users/groups, then the short-term recommendation is to apply LDAP filters and limit syncing users only to those using Hadoop.

Thanks

Bosco

On 2/10/17, 6:20 AM, "Nigel Jones" wrote:

    On 10/02/2017 09:58, Velmurugan Periasamy wrote:
    > Hi Nigel:
    >
    > Thanks for starting an interesting thread.
    > I believe this is already addressed by https://issues.apache.org/jira/browse/RANGER-869. Please take a look.

    I took a look - indeed I had noticed this option to go via groups and look up "member", which does mitigate the issue somewhat, depending on the number of groups.

    In the environment I'm thinking of I can probably find an "interesting" list of groups. So I could modify usersync to not just use the group->member lookup, but also to ONLY do that for certain groups (I'll probably need "groupsync" for that... !)

    Whether this works depends on how the LDAP server is set up... I need to take a look.. if so this is probably good enough for now.

    But I'm still wondering if we really need to sync users at all, since at some point any kind of connector/engine may well be doing an LDAP lookup anyway - certainly that's true in an engine (Apache Derby based) that I'm looking at (and developing a plugin for). This may become more important for large numbers of groups and users, especially if we consider applying Ranger plugins to technologies used by a broad set of users.

    Out of interest, I just noticed in the NiFi mailing lists that there was a recent thread on "LDAP Group Authorization". There is some discussion of native NiFi+Ranger, but in either case the question of why not get the info direct from LDAP at connect time is being raised. Intriguing ...

    Thanks for the link ... mulling over some more :-)

    nigel.
Re: Scalability - large numbers of users/groups in LDAP
Hi Nigel:

Thanks for starting an interesting thread.

> In some environments selecting a subset of groups (which may be used as
> roles), and just pulling users there MAY help if the applications being
> secured have a more limited audience

I believe this is already addressed by https://issues.apache.org/jira/browse/RANGER-869. Please take a look.

Thank you,
Vel

From: Nigel Jones <jon...@uk.ibm.com>
Reply-To: "dev@ranger.apache.org" <dev@ranger.apache.org>
Date: Friday, February 10, 2017 at 2:41 AM
To: "d...@ranger.incubator.apache.org" <d...@ranger.incubator.apache.org>
Subject: Scalability - large numbers of users/groups in LDAP

I've been mulling over an issue recently and am interested in any thoughts... I'm pretty new to Ranger so very ready to hear why this could never work ;-)

Today in an LDAP-managed enterprise environment, user & group information is replicated from the LDAP server (such as MS Active Directory) by the usersync process. I have some control over

- the base DN
- whether to pull a list of groups from each user, or users from groups
- what additional attributes are pulled

This is then persisted in Ranger & gets pulled by the plugins.

However, in some environments

- the number of users in LDAP could be very high (100,000+)
- it may be difficult to scope the query where Ranger is securing access to an enterprise service

If we assume any kind of service that involves a "connect" as well as read/write operations, there could be an opportunity to retrieve user/group information for that user at that point. It could then be saved within the plugin to be used at data access time.

As a variation, we could potentially still populate group (or role) information in the Ranger server, making policy definitions easier.

Has anyone considered this as an option?

In some environments selecting a subset of groups (which may be used as roles), and just pulling users there, MAY help if the applications being secured have a more limited audience.

If it sounds interesting I'm inclined to work through the flows in more detail.

Thanks
Nigel.