[SSSD] Re: NSS responder should negatively cache local users for a longer time

2016-03-23 Thread Petr Cech

On 03/22/2016 08:58 PM, Jakub Hrozek wrote:
>> On 22 Mar 2016, at 20:35, Simo Sorce wrote:
>>
>> On Sun, 2016-03-20 at 21:28 +0100, Jakub Hrozek wrote:
>>>> On 16 Mar 2016, at 13:45, Petr Cech wrote:
>>>>
>>>> Hi,
>>>>
>>>> I will work on $subject [1] and I have discussed this topic with
>>>> Jakub a week ago. There are some open questions, so I will be glad
>>>> if you say your opinion.
>>>>
>>>> There could be heavy traffic between SSSD client and server caused
>>>> by local users. We need longer timeout in negative cache for local
>>>> users.
>>>>
>>>> Questions are:
>>>>
>>>> a) Is better hack negative_cache or responder?
>>>
>>> I would say that this solution should be reusable by other responders
>>> like ifp as well. Therefore I would say either negcache (but there I
>>> would say a new function, not extend the generic one) or a reusable
>>> function in responder/common.
>>>
>>>> b) Is better set timeout = 0 (it means permanently in negative
>>>> cache) or set something really big like 12 hours?
>>>> * We couldn't remove local users from permanent negative cache
>>>> (only by restart).
>>>> * Is timeout = 12 hours means some kind of network peak?
>>>
>>> I guess some long timeout is slightly more flexible for cases where
>>> the admin would add the local user to LDAP groups. A couple of hours
>>> should be enough, as long as the negative entries are cached across
>>> all clients, then if a single client queries the server once a couple
>>> of hours, that should not bring the server down..
>>
>> 12 hours is a lot, if you made a mistake and want to correct it (eg some
>> software install created a local user by mistake that the admin removes
>> because they want it in LDAP) you do not want to wait for hours.
>>
>> The main issue with the initgroups calls is not the load on the server,
>> but the slowdown of the call which ends up contacting a network (slower)
>> store for a local user.
>>
>> I would use a smaller timeout here, like 10 minutes, but potentially add
>> a midway cache check, like we do for positive results. so after 5 min,
>
> The admins actually complained about a load on the servers from users
> like postfix IIRC. That's the reason I suggested a long timeout.
>
> But I guess even a short timeout would work, but I would vote for
> defaulting to the entry_cache_timeout then (with a midway check
> perhaps..). I guess the biggest gain for admins would be to be able to
> specify a timeout for all passwd users at once and avoid putting them
> one-by-one into filter_users/filter_groups.
>
>> if a request comes in we do an asynchronous positive check to see if we
>> still need to extend the negative cache for another 10 minutes. If the
>> local user disappeared we drop the negative cache.
>
> This is a good idea.
>
>>> btw do you think this feature should be enabled or disabled by
>>> default?
>>
>> Good q. how often does this problem happen ?
>>
>>>> c) Is it enough to do it only for initgroups?
>>>
>>> Hmm, not sure, by convention initgroups is the most frequent example
>>> (maybe there will be some users of the new libc merge feature), but at
>>> the same time special-casing initgroups doesn't gain much..
>>>
>>> I guess I would personally do this for all lookups that the NSS
>>> interface can do (by name, by id) but I'm not 100% for or against
>>> either..
>>
>> I agree, keep it generic.
>>
>> Simo.
>>
>> --
>> Simo Sorce * Red Hat, Inc * New York
>
> Thank you for taking the time to review the design!


Hi,

I would like to recap the conclusions:


a) Is it better to hack negative_cache or the responder?

  Jakub suggested either adding new functions to negcache
  or writing new reusable functions in responder/common.


b) Is it better to set timeout = 0 (which means permanently
in the negative cache) or to set something really big,
like 12 hours?

  The discussion converged on smaller values.
  For example, set
  entry_negative_timeout = entry_cache_timeout

  and maybe add a midway check.
  Note: I am not sure exactly how the midway check works.


c) Is it enough to do it only for initgroups?

  No, we should negatively cache local users for all
  lookups that the NSS interface can do.


And there is one new question:

Do you think this feature should be enabled
or disabled by default?

  My suggestion is to add an option for enabling/disabling it
  and maybe another option for setting
  local_entry_negative_timeout (default: entry_cache_timeout).

  If this is better behaviour for all our users, it should
  be enabled by default. If it is a feature for only some
  of them, it should be disabled by default.
  I have no firm opinion yet.
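
On the midway check: going by Simo's description in this thread, once a
negative entry is past half of its timeout, the next request triggers a
re-check of whether the user is still local. If so, the entry is extended
for another full timeout; if not, it is dropped. Below is a minimal Python
sketch of that idea, not SSSD code. The class NegativeCache, the
is_local_user callback and the injectable clock are hypothetical names,
and the re-check runs synchronously here for brevity, whereas Simo
proposed doing it asynchronously.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class NegEntry:
    confirmed: float  # time the entry was created or last re-confirmed
    timeout: float    # seconds the entry stays valid after `confirmed`


class NegativeCache:
    """Toy negative cache for local users with a midway check (hypothetical)."""

    def __init__(self, is_local_user: Callable[[str], bool],
                 timeout: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._is_local_user = is_local_user  # assumed lookup in local files
        self._timeout = timeout              # 10 minutes, as Simo suggested
        self._clock = clock
        self._entries: Dict[str, NegEntry] = {}

    def add(self, name: str) -> None:
        self._entries[name] = NegEntry(self._clock(), self._timeout)

    def check(self, name: str) -> bool:
        """Return True if `name` is negatively cached (skip the backend)."""
        entry = self._entries.get(name)
        if entry is None:
            return False
        age = self._clock() - entry.confirmed
        if age >= entry.timeout:
            # Fully expired: forget the entry.
            del self._entries[name]
            return False
        if age >= entry.timeout / 2:
            # Midway point passed: verify the user is still local.
            if self._is_local_user(name):
                entry.confirmed = self._clock()  # extend for a full timeout
            else:
                del self._entries[name]          # local user disappeared
                return False
        return True
```

With this shape, a frequently queried local user such as postfix never
reaches the remote server while it exists locally, and a removed local
user stops being blocked at the first request after the midway point.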


Thank you both for the discussion.

Regards

--
Petr^4 Čech
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


[SSSD] Re: NSS responder should negatively cache local users for a longer time

2016-03-22 Thread Jakub Hrozek

> On 22 Mar 2016, at 20:35, Simo Sorce  wrote:
> 
> On Sun, 2016-03-20 at 21:28 +0100, Jakub Hrozek wrote:
>>> On 16 Mar 2016, at 13:45, Petr Cech  wrote:
>>> 
>>> Hi,
>>> 
>>> I will work on $subject [1] and I have discussed this topic with
>> Jakub a week ago. There are some open questions, so I will be glad if
>> you say your opinion.
>>> 
>>> There could be heavy traffic between SSSD client and server caused
>> by local users. We need longer timeout in negative cache for local
>> users.
>>> 
>>> Questions are:
>>> 
>>> a) Is better hack negative_cache or responder?
>>> 
>> 
>> I would say that this solution should be reusable by other responders
>> like ifp as well. Therefore I would say either negcache (but there I
>> would say a new function, not extend the generic one) or a reusable
>> function in responder/common.
>> 
>>> b) Is better set timeout = 0 (it means permanently in negative
>> cache) or set something really big like 12 hours?
>>> * We couldn't remove local users from permanent negative cache (only
>> by restart).
>>> * Is timeout = 12 hours means some kind of network peak?
>>> 
>> 
>> I guess some long timeout is slightly more flexible for cases where
>> the admin would add the local user to LDAP groups. A couple of hours
>> should be enough, as long as the negative entries are cached across
>> all clients, then if a single client queries the server once a couple
>> of hours, that should not bring the server down..
> 
> 12 hours is a lot, if you made a mistake and want to correct it (eg some
> software install created a local user by mistake that the admin removes
> because they want it in LDAP) you do not want to wait for hours.
> 
> The main issue with the initgroups calls is not the load on the server,
> but the slowdown of the call which ends up contacting a network (slower)
> store for a local user.
> 
> I would use a smaller timeout here, like 10 minutes, but potentially add
> a midway cache check, like we do for positive results. so after 5 min,

The admins actually complained about load on the servers from users like
postfix, IIRC. That's the reason I suggested a long timeout.

I guess even a short timeout would work, but then I would vote for defaulting
to the entry_cache_timeout (perhaps with a midway check). I guess the biggest
gain for admins would be being able to specify a timeout for all passwd users
at once, instead of putting them one by one into filter_users/filter_groups.

> if a request comes in we do an asynchronous positive check to see if we
> still need to extend the negative cache for another 10 minutes. If the
> local user disappeared we drop the negative cache.
> 

This is a good idea.

>> btw do you think this feature should be enabled or disabled by
>> default?
> 
> Good q. how often does this problem happen ?
> 
>>> c) Is it enough to do it only for initgroups?
>> 
>> Hmm, not sure, by convention initgroups is the most frequent example
>> (maybe there will be some users of the new libc merge feature), but at
>> the same time special-casing initgroups doesn't gain much..
>> 
>> I guess I would personally do this for all lookups that the NSS
>> interface can do (by name, by id) but I'm not 100% for or against
>> either..
> 
> I agree, keep it generic.
> 
> Simo.
> 
> -- 
> Simo Sorce * Red Hat, Inc * New York

Thank you for taking the time to review the design!


[SSSD] Re: NSS responder should negatively cache local users for a longer time

2016-03-22 Thread Simo Sorce
On Sun, 2016-03-20 at 21:28 +0100, Jakub Hrozek wrote:
> > On 16 Mar 2016, at 13:45, Petr Cech  wrote:
> > 
> > Hi,
> > 
> > I will work on $subject [1] and I have discussed this topic with
> Jakub a week ago. There are some open questions, so I will be glad if
> you say your opinion.
> > 
> > There could be heavy traffic between SSSD client and server caused
> by local users. We need longer timeout in negative cache for local
> users.
> > 
> > Questions are:
> > 
> > a) Is better hack negative_cache or responder?
> > 
> 
> I would say that this solution should be reusable by other responders
> like ifp as well. Therefore I would say either negcache (but there I
> would say a new function, not extend the generic one) or a reusable
> function in responder/common.
> 
> > b) Is better set timeout = 0 (it means permanently in negative
> cache) or set something really big like 12 hours?
> > * We couldn't remove local users from permanent negative cache (only
> by restart).
> > * Is timeout = 12 hours means some kind of network peak?
> > 
> 
> I guess some long timeout is slightly more flexible for cases where
> the admin would add the local user to LDAP groups. A couple of hours
> should be enough, as long as the negative entries are cached across
> all clients, then if a single client queries the server once a couple
> of hours, that should not bring the server down..

12 hours is a lot, if you made a mistake and want to correct it (eg some
software install created a local user by mistake that the admin removes
because they want it in LDAP) you do not want to wait for hours.

The main issue with the initgroups calls is not the load on the server,
but the slowdown of the call which ends up contacting a network (slower)
store for a local user.

I would use a smaller timeout here, like 10 minutes, but potentially add
a midway cache check, like we do for positive results. so after 5 min,
if a request comes in we do an asynchronous positive check to see if we
still need to extend the negative cache for another 10 minutes. If the
local user disappeared we drop the negative cache.

> btw do you think this feature should be enabled or disabled by
> default?

Good q. how often does this problem happen ?

> > c) Is it enough to do it only for initgroups?
> 
> Hmm, not sure, by convention initgroups is the most frequent example
> (maybe there will be some users of the new libc merge feature), but at
> the same time special-casing initgroups doesn't gain much..
> 
> I guess I would personally do this for all lookups that the NSS
> interface can do (by name, by id) but I'm not 100% for or against
> either..

I agree, keep it generic.

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York


[SSSD] Re: NSS responder should negatively cache local users for a longer time

2016-03-20 Thread Jakub Hrozek

> On 16 Mar 2016, at 13:45, Petr Cech  wrote:
> 
> Hi,
> 
> I will work on $subject [1] and I have discussed this topic with Jakub a week 
> ago. There are some open questions, so I will be glad if you say your opinion.
> 
> There could be heavy traffic between SSSD client and server caused by local 
> users. We need longer timeout in negative cache for local users.
> 
> Questions are:
> 
> a) Is better hack negative_cache or responder?
> 

I would say that this solution should be reusable by other responders like ifp 
as well. Therefore I would say either negcache (but there I would say a new 
function, not extend the generic one) or a reusable function in 
responder/common.
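
To illustrate the "new function, not extend the generic one" option, here
is a rough Python sketch rather than actual SSSD code. The generic setter
keeps its behaviour for existing callers, and a separate helper applies a
longer timeout to local users. NegCache, set_local_user and the timeout
values are hypothetical; the only detail taken from this thread is that
timeout = 0 means the entry is permanent.

```python
import time
from typing import Callable, Dict, Hashable, Optional

PERMANENT = 0  # per this thread, timeout = 0 means "never expires"


class NegCache:
    """Generic negative cache (hypothetical stand-in, not SSSD's negcache)."""

    def __init__(self, default_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._default_timeout = default_timeout
        self._clock = clock
        # Maps key -> absolute expiry time, or None for permanent entries.
        self._expiry: Dict[Hashable, Optional[float]] = {}

    def set(self, key: Hashable, timeout: Optional[float] = None) -> None:
        """Generic setter; existing callers keep the default timeout."""
        if timeout is None:
            timeout = self._default_timeout
        if timeout == PERMANENT:
            self._expiry[key] = None
        else:
            self._expiry[key] = self._clock() + timeout

    def check(self, key: Hashable) -> bool:
        """Return True if `key` is currently negatively cached."""
        if key not in self._expiry:
            return False
        expires = self._expiry[key]
        if expires is not None and self._clock() >= expires:
            del self._expiry[key]
            return False
        return True


def set_local_user(cache: NegCache, name: str, local_timeout: float) -> None:
    """New helper any responder could reuse; set() itself is not changed."""
    cache.set(("user", name), local_timeout)
```

Keeping the local-user policy in its own helper means other responders
such as ifp could call it without the generic cache knowing anything
about local users.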

> b) Is better set timeout = 0 (it means permanently in negative cache) or set 
> something really big like 12 hours?
> * We couldn't remove local users from permanent negative cache (only by 
> restart).
> * Is timeout = 12 hours means some kind of network peak?
> 

I guess some long timeout is slightly more flexible for cases where the admin 
would add the local user to LDAP groups. A couple of hours should be enough, as 
long as the negative entries are cached across all clients, then if a single 
client queries the server once a couple of hours, that should not bring the 
server down..

btw do you think this feature should be enabled or disabled by default?

> c) Is it enough to do it only for initgroups?

Hmm, not sure, by convention initgroups is the most frequent example (maybe 
there will be some users of the new libc merge feature), but at the same time 
special-casing initgroups doesn't gain much..

I guess I would personally do this for all lookups that the NSS interface can 
do (by name, by id) but I'm not 100% for or against either..
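
A small Python sketch of what "all lookups (by name, by id)" could mean
for deciding whether an entry is local. It builds both indexes from
passwd-format text in one pass, so either kind of lookup can be answered.
SAMPLE_PASSWD, parse_passwd and is_local_user are made-up names for
illustration, and the sample entries are invented; how SSSD would actually
read local users is not specified in this thread.

```python
from typing import Dict, Optional, Tuple

# Invented sample in passwd(5) format: name:password:UID:GID:GECOS:home:shell
SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
"""


def parse_passwd(text: str) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Index passwd-format text by user name and by UID."""
    by_name: Dict[str, int] = {}
    by_uid: Dict[int, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Skip malformed lines instead of failing the whole parse.
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        name, uid = fields[0], int(fields[2])
        by_name[name] = uid
        by_uid[uid] = name
    return by_name, by_uid


def is_local_user(by_name: Dict[str, int], by_uid: Dict[int, str], *,
                  name: Optional[str] = None,
                  uid: Optional[int] = None) -> bool:
    """Answer a by-name or a by-id lookup from the same indexes."""
    if name is not None:
        return name in by_name
    if uid is not None:
        return uid in by_uid
    return False
```

Usage would be along the lines of is_local_user(by_name, by_uid,
name="postfix") for getpwnam-style requests and is_local_user(by_name,
by_uid, uid=89) for getpwuid-style requests.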