Re: Scalability - large numbers of users/groups in LDAP

2017-03-08 Thread Nigel Jones

On 22/02/2017 19:28, Sailaja Polavarapu wrote:
> Hi Nigel Jones,
> As part of incremental sync support for Ranger, I was reading through
> the MS AD documentation for the memberOf attribute. According to the
> documentation, it looks like the memberOf attribute value is not stored
> and is always computed on the fly from the member attribute of the group.
> In the OpenLDAP case, the memberOf attribute is not enabled by default
> as part of the schema. It has to be enabled manually. As far as I know,
> OpenLDAP doesn’t maintain the back-link between the memberOf attribute
> of the user and the member/memberUid attribute of the group. It is up
> to the admin to create these values while adding/updating the users and
> groups. The memberOf attribute is then stored, and its value is
> retrieved as-is, without any computation from the group member attribute.


Thanks, it does look like I have a workable solution to go with (to be 
verified, of course):


* Get a list of roles that will participate in this environment (in fact 
these will be sourced from Apache Atlas, as that stores some entity:role 
associations in our case)
* Query ldap for the users in those roles (ldap groups)
* Push these as users/groups into ranger with a new "usersync" process

This preserves the current approach ranger takes, with just a tweak to 
the source of the user & role information ;-)
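
A minimal sketch of the three steps above, in Python. Nothing here is Ranger or Atlas API: escape_filter_value, build_group_filter and to_user_group_pairs are hypothetical helpers, the "cn" naming attribute and objectClass=group are assumptions about the directory, and the search result is hard-coded where a real LDAP search would run.

```python
# RFC 4515 requires these characters to be escaped inside filter values.
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def escape_filter_value(value):
    """Escape a string for use as an assertion value in an LDAP filter."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def build_group_filter(roles, name_attr="cn", object_class="group"):
    """Build one filter that matches only the groups named in `roles`."""
    names = "".join(
        f"({name_attr}={escape_filter_value(role)})" for role in sorted(roles)
    )
    return f"(&(objectClass={object_class})(|{names}))"


def to_user_group_pairs(group_members):
    """Flatten {group: [user, ...]} into sorted (user, group) pairs."""
    return sorted(
        (user, group) for group, users in group_members.items() for user in users
    )


# Step 1: roles of interest (hypothetical names; would come from Atlas).
roles = {"data-steward", "analyst (emea)"}

# Step 2: the filter a search for just those groups could use.
group_filter = build_group_filter(roles)

# Stand-in for the search result: group name -> member user names.
group_members = {"data-steward": ["alice"], "analyst (emea)": ["bob", "alice"]}

# Step 3: the user/group associations a custom usersync would push to ranger.
pairs = to_user_group_pairs(group_members)
```

The point of building a single OR filter is that the directory is only asked about the groups that matter, however many users it holds overall.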





Re: Scalability - large numbers of users/groups in LDAP

2017-02-22 Thread Nigel Jones

On 22/02/2017 16:43, Nigel Jones wrote:


> Will raise a JIRA


I just came across RANGER-1211, which talks about optimizing user 
sync through an incremental approach.


Can anyone help with an MS AD question?

The document implies that the memberOf attribute on a user is 
*computed*, which would suggest it's ALWAYS possible to EFFICIENTLY 
retrieve the list of users that are members of a known role (member 
attribute against the group). Is this indeed the case? Only AD? How 
about OpenLDAP?


If so my problem probably goes away...

Thanks
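
For illustration, the lookup being asked about is a user search filtered on memberOf. The Python sketch below only builds the filter strings; the group DN, objectClass=user and the helper names are assumptions. The optional OID 1.2.840.113556.1.4.1941 is Active Directory's LDAP_MATCHING_RULE_IN_CHAIN, which also follows nested groups; other servers are not expected to understand it. Whether either search is efficient depends on the server.

```python
AD_IN_CHAIN = "1.2.840.113556.1.4.1941"  # AD-only: LDAP_MATCHING_RULE_IN_CHAIN


def escape_filter_value(value):
    """Escape RFC 4515 special characters in a filter assertion value."""
    out = []
    for ch in value:
        # Special characters become a backslash plus two hex digits.
        out.append(f"\\{ord(ch):02x}" if ch in "\\*()\0" else ch)
    return "".join(out)


def users_in_group_filter(group_dn, nested=False, user_class="user"):
    """Filter for user entries whose memberOf contains `group_dn`."""
    attr = f"memberOf:{AD_IN_CHAIN}:" if nested else "memberOf"
    return f"(&(objectClass={user_class})({attr}={escape_filter_value(group_dn)}))"


# Hypothetical group DN standing in for a "role".
group_dn = "CN=Data Stewards (EMEA),OU=Roles,DC=example,DC=com"

direct_filter = users_in_group_filter(group_dn)
nested_filter = users_in_group_filter(group_dn, nested=True)
```

The opposite direction needs no filter at all: read the group entry itself and request its member attribute.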




Re: Scalability - large numbers of users/groups in LDAP

2017-02-22 Thread Nigel Jones

On 12/02/2017 10:40, Zsombor wrote:

> If the performance of the LDAP server ever becomes a bottleneck, I would
> rather see a dedicated/embedded LDAP server which is synchronized
> automatically from the main LDAP server. I guess this could be more easily
> implemented than a complex partial synchronization/cache scenario.


Having explored the requirements a little more: though I can identify a 
small set of roles (ldap groups) that constrain the number of users I'd 
want to replicate into ranger, there's no easy way to do this via a 
query against the LDAP server (which contains a huge number of users and 
can't easily be changed in this environment). For example, I can't query 
the list of users belonging to a particular role. In any case, the number 
of users will increase over time.


If I were to ONLY sync the groups, and not the users, what breaks?

1. In the ranger UI I can only define policies based on roles (groups) - 
which seems fine (if combined with a few local/admin roles)
2. Each plugin would need to issue an LDAP query on first request to 
pull in user attributes from LDAP, specifically in order to determine 
the role/userid association. Issues here include:

 a) Extra configuration for the plugin
 b) A realtime query to a remote system - something we do not do in 
ranger today. In mitigation, if LDAP is down other infrastructure breaks 
in any case, and an LDAP query is quick
 c) This needs to go into every plugin as a configurable option
 d) This could occur during any kind of connection phase or on first 
request by a particular user - but how is somewhat engine dependent


Will raise a JIRA
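
The per-plugin lookup in point 2 above could be sketched roughly as follows, in Python. This is not how any ranger plugin works today: GroupCache, its TTL and fake_ldap_lookup are all hypothetical, and the directory is replaced by a small in-memory dictionary.

```python
import time


class GroupCache:
    """Per-plugin cache of user -> groups, filled on a user's first request."""

    def __init__(self, lookup, ttl_seconds=300.0, clock=time.monotonic):
        self._lookup = lookup      # callable: user name -> iterable of group names
        self._ttl = ttl_seconds    # how long an answer is trusted
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # user -> (expiry time, frozenset of groups)

    def groups_for(self, user):
        now = self._clock()
        entry = self._entries.get(user)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: no remote call
        # The realtime LDAP query would happen here.
        groups = frozenset(self._lookup(user))
        self._entries[user] = (now + self._ttl, groups)
        return groups


# Stand-in for the directory; a real plugin would query LDAP instead.
calls = []


def fake_ldap_lookup(user):
    calls.append(user)
    return {"alice": ["analyst", "data-steward"]}.get(user, [])


cache = GroupCache(fake_ldap_lookup, ttl_seconds=60)
first = cache.groups_for("alice")    # triggers one lookup
second = cache.groups_for("alice")   # served from the cache
```

The TTL is the trade-off behind issue b): a longer value means fewer remote queries, but group changes in LDAP take longer to be seen by the plugin.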






Re: Scalability - large numbers of users/groups in LDAP

2017-02-10 Thread Sailaja Polavarapu
Just want to add a few more points inline...

>> - what additional attributes are pulled
Currently we pull the following attributes as part of the ldap search:
For Users: the username attribute (uid, samaccountname, etc…) and the user 
group member attribute (memberof, ismemberof, etc…)
For Groups: the group member attribute (member, memberuid, etc…) and the 
group name attribute (cn, samaccountname, etc…)

All these are configurable properties in usersync.

Thanks,
Sailaja.





On 2/10/17, 9:26 AM, "Nigel Jones"  wrote:

>On 10/02/2017 17:07, Don Bosco Durai wrote:
>
> > 1. Ranger should have an option to sync just groups (without users).
> > We should already be supporting it, or there was an intention to
> > support it. If we are not doing it for any reason, I am a strong +1
> > for doing it.
>I'll experiment with this - only working off the docs so far, trying it 
>out is next :-)
[Sailaja]: Currently we support syncing groups that don’t contain any users. 
But if a group contains users (as part of the member attribute), we still sync 
those users. Of course, you can tweak the user search configuration to avoid 
syncing users by providing an invalid/non-matching user search filter, but this 
is a rather dirty workaround. The same applies to syncing just users and not 
groups.
I agree that it would be better if we could support syncing just users or just 
groups, for flexibility.

>
> > 2. Direct LDAP would have been ideal, but we were worried about the
> > load we might put on LDAP for real-time queries. Just FYI, Ranger
> > uses LDAP/AD for authentication and easy selection of users/groups
> > during policy creation. For authentication, it is already real-time
> > (even though I would have preferred to get the roles in real-time as
> > well).
>A fair concern, though at least it's only at connect time. The 
>enterprise I spoke to didn't seem to think it was a concern. I'll start 
>with option #1 though.
[Sailaja]: The other main reason we sync users/groups from LDAP upfront 
is to make them available for configuring policies in ranger.
>
> > If you have a very high number of users/groups, then the short-term
> > recommendation is to apply LDAP filters and limit syncing to only
> > those users who use Hadoop.
>This will be extending outside hadoop - I'm trying to determine how to 
>constrain the ldap query to the users using the relevant systems. I can 
>potentially obtain a list of groups from elsewhere via a new usersync 
>process, and then go back into ldap to query membership. That would look 
>the same to ranger; I'd just modify that sync.
>
>Thanks for the info !
>
>Nigel.
>
>


Re: Scalability - large numbers of users/groups in LDAP

2017-02-10 Thread Don Bosco Durai
Seems you are suggesting two scenarios.

1. Ranger should have an option to sync just groups (without users). We 
should already be supporting it, or there was an intention to support it. If 
we are not doing it for any reason, I am a strong +1 for doing it.
2. Direct LDAP would have been ideal, but we were worried about the load we 
might put on LDAP for real-time queries. Just FYI, Ranger uses LDAP/AD for 
authentication and easy selection of users/groups during policy creation. For 
authentication, it is already real-time (even though I would have preferred to 
get the roles in real-time as well).

If you have a very high number of users/groups, then the short-term 
recommendation is to apply LDAP filters and limit syncing to only those users 
who use Hadoop.

Thanks

Bosco









Re: Scalability - large numbers of users/groups in LDAP

2017-02-10 Thread Nigel Jones

On 10/02/2017 09:58, Velmurugan Periasamy wrote:
> Hi Nigel:
>
> Thanks for starting an interesting thread.

> I believe this is already addressed by
> https://issues.apache.org/jira/browse/RANGER-869. Please take a look.


I took a look - indeed I had noticed this option to go via groups and 
look up "member", which does mitigate the issue somewhat, depending on the 
number of groups.


In the environment I'm thinking of I can probably find an "interesting" 
list of groups. So I could modify usersync to not just use the 
group->member lookup, but also to ONLY do that for certain groups (I'll 
probably need "groupsync" for that... !)


Whether this works depends on how the ldap server is set up... I need to 
take a look. If so, this is probably good enough for now.


But I'm still wondering if we really need to sync users at all, since at 
some point any kind of connector/engine may well be doing an ldap lookup 
anyway - certainly that's true in an engine (Apache Derby based) that 
I'm looking at (and developing a plugin for). This may become more 
important for large numbers of groups and users, especially if we 
consider applying ranger plugins to technologies used by a broad set of 
users.


Out of interest, I just noticed on the nifi mailing lists that there was 
a recent thread on "LDAP Group Authorization". There is some discussion 
of native nifi+ranger, but in either case the question of why not get 
the info directly from ldap at connect time is being raised. Intriguing...


Thanks for the link ... mulling over some more :-)

nigel.




Re: Scalability - large numbers of users/groups in LDAP

2017-02-10 Thread Velmurugan Periasamy
Hi Nigel:

Thanks for starting an interesting thread.

> In some environments selecting a subset of groups (which may be used as
> roles), and just pulling users there MAY help if the applications being
> secured have a more limited audience

I believe this is already addressed by 
https://issues.apache.org/jira/browse/RANGER-869. Please take a look.

Thank you,
Vel

From: Nigel Jones <jon...@uk.ibm.com>
Reply-To: "dev@ranger.apache.org" <dev@ranger.apache.org>
Date: Friday, February 10, 2017 at 2:41 AM
To: "d...@ranger.incubator.apache.org" <d...@ranger.incubator.apache.org>
Subject: Scalability - large numbers of users/groups in LDAP






Scalability - large numbers of users/groups in LDAP

2017-02-09 Thread Nigel Jones
I've been mulling over an issue recently and am interested in any 
thoughts... I'm pretty new to ranger, so very ready to hear why this 
could never work ;-)


Today in an LDAP-managed enterprise environment, user & group information 
is replicated from the LDAP server (such as MS Active Directory) by the 
usersync process. I have some control over

 - the base DN
 - whether to pull a list of groups from each user, or users from groups
 - what additional attributes are pulled

This is then persisted in ranger & gets pulled by the plugins.

However in some environments
 - the numbers of users in LDAP could be very high (100,000+)
 - it may be difficult to scope the query where ranger is securing 
access to an enterprise service


If we assume any kind of service that involves a "connect" as well as 
read/write operations, there could be an opportunity to retrieve 
user/group information for that user at that point. It could then be 
saved within the plugin to be used at data access time.


As a variation, we could potentially still populate group (or role) 
information in the ranger server, making policy definition easier.


Has anyone considered this as an option?

In some environments, selecting a subset of groups (which may be used as 
roles) and just pulling the users there MAY help if the applications being 
secured have a more limited audience.


If this sounds interesting, I'm inclined to work through the flows in more 
detail.


Thanks
Nigel.