[SSSD] Re: NSS responder should negatively cache local users for a longer time
On 03/22/2016 08:58 PM, Jakub Hrozek wrote:
>> On 22 Mar 2016, at 20:35, Simo Sorce wrote:
>>
>> On Sun, 2016-03-20 at 21:28 +0100, Jakub Hrozek wrote:
>>>> On 16 Mar 2016, at 13:45, Petr Cech wrote:
>>>>
>>>> Hi,
>>>>
>>>> I will work on $subject [1] and I have discussed this topic with
>>>> Jakub a week ago. There are some open questions, so I will be glad
>>>> if you say your opinion.
>>>>
>>>> There could be heavy traffic between SSSD client and server caused
>>>> by local users. We need longer timeout in negative cache for local
>>>> users.
>>>>
>>>> Questions are:
>>>>
>>>> a) Is better hack negative_cache or responder?
>>>>
>>>
>>> I would say that this solution should be reusable by other responders
>>> like ifp as well. Therefore I would say either negcache (but there I
>>> would say a new function, not extend the generic one) or a reusable
>>> function in responder/common.
>>>
>>>> b) Is better set timeout = 0 (it means permanently in negative
>>>> cache) or set something really big like 12 hours?
>>>> * We couldn't remove local users from permanent negative cache (only
>>>> by restart).
>>>> * Is timeout = 12 hours means some kind of network peak?
>>>>
>>>
>>> I guess some long timeout is slightly more flexible for cases where
>>> the admin would add the local user to LDAP groups. A couple of hours
>>> should be enough, as long as the negative entries are cached across
>>> all clients, then if a single client queries the server once a couple
>>> of hours, that should not bring the server down..
>>
>> 12 hours is a lot, if you made a mistake and want to correct it (eg some
>> software install created a local user by mistake that the admin removes
>> because they want it in LDAP) you do not want to wait for hours.
>>
>> The main issue with the initgroups calls is not the load on the server,
>> but the slowdown of the call which ends up contacting a network (slower)
>> store for a local user.
>>
>> I would use a smaller timeout here, like 10 minutes, but potentially add
>> a midway cache check, like we do for positive results.
>> so after 5 min,
>
> The admins actually complained about a load on the servers from users
> like postfix IIRC. That's the reason I suggested a long timeout. But I
> guess even a short timeout would work, but I would vote for defaulting
> to the entry_cache_timeout then (with a midway check perhaps..). I guess
> the biggest gain for admins would be to be able to specify a timeout for
> all passwd users at once and avoid putting them one-by-one into
> filter_users/filter_groups.
>
>> if a request comes in we do an asynchronous positive check to see if we
>> still need to extend the negative cache for another 10 minutes. If the
>> local user disappeared we drop the negative cache.
>>
>
> This is a good idea.
>
>>> btw do you think this feature should be enabled or disabled by
>>> default?
>>
>> Good q. how often does this problem happen ?
>>
>>>> c) Is it enough to do it only for initgroups?
>>>
>>> Hmm, not sure, by convention initgroups is the most frequent example
>>> (maybe there will be some users of the new libc merge feature), but at
>>> the same time special-casing initgroups doesn't gain much..
>>>
>>> I guess I would personally do this for all lookups that the NSS
>>> interface can do (by name, by id) but I'm not 100% for or against
>>> either..
>>
>> I agree, keep it generic.
>>
>> Simo.
>>
>> --
>> Simo Sorce * Red Hat, Inc * New York
>
> Thank you for taking the time to review the design!

Hi,

I would like to recap the conclusions:

a) Is it better to hack negative_cache or the responder?

Jakub suggested adding new functions to negcache or new reusable
functions in responder/common.

b) Is it better to set timeout = 0 (meaning permanently in the negative
cache) or to set something really big like 12 hours?

The discussion converged on smaller values, for example setting
entry_negative_timeout = entry_cache_timeout, and maybe adding a midway
check.
Note: I am not sure how exactly the midway check works.

c) Is it enough to do it only for initgroups?

We should negatively cache local users for all lookups that the NSS
interface can do.
And there is one new question:
Do you think this feature should be enabled or disabled by default?

My suggestion is to add an option for enabling/disabling it and maybe
another option for setting local_entry_negative_timeout (default
entry_cache_timeout).

If this is better behaviour for all our users, it should be enabled by
default. If it is a feature for only some of them, it should be disabled.
I have no concrete opinion yet.

Thank you both for the discussion.

Regards

--
Petr^4 Čech
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
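The option proposal in the recap above can be sketched in a few lines of Python. This is only an illustration of the suggested defaulting rule, not SSSD code: local_entry_negative_timeout is the name proposed in this thread, the local_negative_cache on/off switch is a hypothetical name invented here, and "disabled unless set" is an arbitrary choice because the thread leaves the default open.

```python
from typing import Mapping, Optional


def local_negative_timeout(options: Mapping[str, str]) -> Optional[int]:
    """Return the negative-cache timeout (seconds) for local users.

    Returns None when the feature is switched off.
    """
    # Hypothetical switch name; the thread only says "an option for
    # enabling/disabling" and does not settle the default.
    enabled = options.get("local_negative_cache", "false").lower() == "true"
    if not enabled:
        return None
    # Option name as proposed in the thread.
    if "local_entry_negative_timeout" in options:
        return int(options["local_entry_negative_timeout"])
    # Proposed default: fall back to entry_cache_timeout, which the
    # caller must supply in this sketch.
    return int(options["entry_cache_timeout"])
```

With only entry_cache_timeout set and the switch on, the function returns that value; an explicit local_entry_negative_timeout overrides it.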
[SSSD] Re: NSS responder should negatively cache local users for a longer time
> On 22 Mar 2016, at 20:35, Simo Sorce wrote:
>
> On Sun, 2016-03-20 at 21:28 +0100, Jakub Hrozek wrote:
>>> On 16 Mar 2016, at 13:45, Petr Cech wrote:
>>>
>>> Hi,
>>>
>>> I will work on $subject [1] and I have discussed this topic with
>>> Jakub a week ago. There are some open questions, so I will be glad
>>> if you say your opinion.
>>>
>>> There could be heavy traffic between SSSD client and server caused
>>> by local users. We need longer timeout in negative cache for local
>>> users.
>>>
>>> Questions are:
>>>
>>> a) Is better hack negative_cache or responder?
>>>
>>
>> I would say that this solution should be reusable by other responders
>> like ifp as well. Therefore I would say either negcache (but there I
>> would say a new function, not extend the generic one) or a reusable
>> function in responder/common.
>>
>>> b) Is better set timeout = 0 (it means permanently in negative
>>> cache) or set something really big like 12 hours?
>>> * We couldn't remove local users from permanent negative cache (only
>>> by restart).
>>> * Is timeout = 12 hours means some kind of network peak?
>>>
>>
>> I guess some long timeout is slightly more flexible for cases where
>> the admin would add the local user to LDAP groups. A couple of hours
>> should be enough, as long as the negative entries are cached across
>> all clients, then if a single client queries the server once a couple
>> of hours, that should not bring the server down..
>
> 12 hours is a lot, if you made a mistake and want to correct it (eg some
> software install created a local user by mistake that the admin removes
> because they want it in LDAP) you do not want to wait for hours.
>
> The main issue with the initgroups calls is not the load on the server,
> but the slowdown of the call which ends up contacting a network (slower)
> store for a local user.
>
> I would use a smaller timeout here, like 10 minutes, but potentially add
> a midway cache check, like we do for positive results.
> so after 5 min,

The admins actually complained about a load on the servers from users
like postfix IIRC. That's the reason I suggested a long timeout. But I
guess even a short timeout would work, but I would vote for defaulting
to the entry_cache_timeout then (with a midway check perhaps..). I guess
the biggest gain for admins would be to be able to specify a timeout for
all passwd users at once and avoid putting them one-by-one into
filter_users/filter_groups.

> if a request comes in we do an asynchronous positive check to see if we
> still need to extend the negative cache for another 10 minutes. If the
> local user disappeared we drop the negative cache.
>

This is a good idea.

>> btw do you think this feature should be enabled or disabled by
>> default?
>
> Good q. how often does this problem happen ?
>
>>> c) Is it enough to do it only for initgroups?
>>
>> Hmm, not sure, by convention initgroups is the most frequent example
>> (maybe there will be some users of the new libc merge feature), but at
>> the same time special-casing initgroups doesn't gain much..
>>
>> I guess I would personally do this for all lookups that the NSS
>> interface can do (by name, by id) but I'm not 100% for or against
>> either..
>
> I agree, keep it generic.
>
> Simo.
>
> --
> Simo Sorce * Red Hat, Inc * New York

Thank you for taking the time to review the design!
[SSSD] Re: NSS responder should negatively cache local users for a longer time
On Sun, 2016-03-20 at 21:28 +0100, Jakub Hrozek wrote:
> > On 16 Mar 2016, at 13:45, Petr Cech wrote:
> >
> > Hi,
> >
> > I will work on $subject [1] and I have discussed this topic with
> > Jakub a week ago. There are some open questions, so I will be glad
> > if you say your opinion.
> >
> > There could be heavy traffic between SSSD client and server caused
> > by local users. We need longer timeout in negative cache for local
> > users.
> >
> > Questions are:
> >
> > a) Is better hack negative_cache or responder?
> >
>
> I would say that this solution should be reusable by other responders
> like ifp as well. Therefore I would say either negcache (but there I
> would say a new function, not extend the generic one) or a reusable
> function in responder/common.
>
> > b) Is better set timeout = 0 (it means permanently in negative
> > cache) or set something really big like 12 hours?
> > * We couldn't remove local users from permanent negative cache (only
> > by restart).
> > * Is timeout = 12 hours means some kind of network peak?
> >
>
> I guess some long timeout is slightly more flexible for cases where
> the admin would add the local user to LDAP groups. A couple of hours
> should be enough, as long as the negative entries are cached across
> all clients, then if a single client queries the server once a couple
> of hours, that should not bring the server down..

12 hours is a lot, if you made a mistake and want to correct it (eg some
software install created a local user by mistake that the admin removes
because they want it in LDAP) you do not want to wait for hours.

The main issue with the initgroups calls is not the load on the server,
but the slowdown of the call which ends up contacting a network (slower)
store for a local user.

I would use a smaller timeout here, like 10 minutes, but potentially add
a midway cache check, like we do for positive results.
so after 5 min, if a request comes in we do an asynchronous positive
check to see if we still need to extend the negative cache for another
10 minutes. If the local user disappeared we drop the negative cache.

> btw do you think this feature should be enabled or disabled by
> default?

Good q. how often does this problem happen ?

> > c) Is it enough to do it only for initgroups?
>
> Hmm, not sure, by convention initgroups is the most frequent example
> (maybe there will be some users of the new libc merge feature), but at
> the same time special-casing initgroups doesn't gain much..
>
> I guess I would personally do this for all lookups that the NSS
> interface can do (by name, by id) but I'm not 100% for or against
> either..

I agree, keep it generic.

Simo.

--
Simo Sorce * Red Hat, Inc * New York
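The midway check described in the message above (10 minute timeout, re-check once half of it has elapsed, then extend or drop) might look roughly like the Python sketch below. It is a toy model, not the SSSD negcache API: the class and method names are made up for illustration, the caller supplies the "is this user still local?" predicate, and the re-check runs inline here whereas the message proposes doing it asynchronously.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class NegEntry:
    created: float  # time the entry was created or last extended
    timeout: float  # lifetime in seconds


class LocalNegCache:
    """Toy negative cache for local users with a midway refresh."""

    def __init__(self, is_local_user: Callable[[str], bool],
                 timeout: float = 600.0) -> None:
        self._is_local = is_local_user  # hypothetical local-lookup hook
        self._timeout = timeout
        self._entries: Dict[str, NegEntry] = {}

    def add(self, name: str, now: float) -> None:
        self._entries[name] = NegEntry(created=now, timeout=self._timeout)

    def check(self, name: str, now: float) -> bool:
        """Return True if the name is negatively cached (skip the backend)."""
        entry = self._entries.get(name)
        if entry is None:
            return False
        age = now - entry.created
        if age >= entry.timeout:
            # Fully expired: forget the entry.
            del self._entries[name]
            return False
        if age >= entry.timeout / 2:
            # Past the midpoint: verify the user is still local.
            if self._is_local(name):
                entry.created = now  # extend for another full timeout
            else:
                # Local user disappeared: drop the negative entry.
                del self._entries[name]
                return False
        return True
```

Requests that keep arriving after the midpoint keep the entry alive without ever contacting the server, while a removed local user falls out of the cache at the next request past the midpoint instead of waiting for a long fixed timeout.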
[SSSD] Re: NSS responder should negatively cache local users for a longer time
> On 16 Mar 2016, at 13:45, Petr Cech wrote:
>
> Hi,
>
> I will work on $subject [1] and I have discussed this topic with Jakub a week
> ago. There are some open questions, so I will be glad if you say your opinion.
>
> There could be heavy traffic between SSSD client and server caused by local
> users. We need longer timeout in negative cache for local users.
>
> Questions are:
>
> a) Is better hack negative_cache or responder?
>

I would say that this solution should be reusable by other responders
like ifp as well. Therefore I would say either negcache (but there I
would say a new function, not extend the generic one) or a reusable
function in responder/common.

> b) Is better set timeout = 0 (it means permanently in negative cache) or set
> something really big like 12 hours?
> * We couldn't remove local users from permanent negative cache (only by
> restart).
> * Is timeout = 12 hours means some kind of network peak?
>

I guess some long timeout is slightly more flexible for cases where the
admin would add the local user to LDAP groups. A couple of hours should
be enough, as long as the negative entries are cached across all
clients, then if a single client queries the server once a couple of
hours, that should not bring the server down..

btw do you think this feature should be enabled or disabled by default?

> c) Is it enough to do it only for initgroups?

Hmm, not sure, by convention initgroups is the most frequent example
(maybe there will be some users of the new libc merge feature), but at
the same time special-casing initgroups doesn't gain much..

I guess I would personally do this for all lookups that the NSS
interface can do (by name, by id) but I'm not 100% for or against
either..
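The idea of one reusable helper that serves lookups both by name and by id can be illustrated with the Python sketch below. It assumes, purely for illustration, that "local" means "listed in a passwd-format file"; the thread does not say how SSSD would decide this, and the function names are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple, Union


class LocalUser(NamedTuple):
    name: str
    uid: int


def parse_passwd(text: str) -> Tuple[Dict[str, LocalUser], Dict[int, LocalUser]]:
    """Index passwd-format text by name and by uid, skipping malformed lines."""
    by_name: Dict[str, LocalUser] = {}
    by_uid: Dict[int, LocalUser] = {}
    for line in text.splitlines():
        fields = line.strip().split(":")
        # A passwd entry is name:password:uid:...; ignore anything else.
        if len(fields) < 3 or not fields[0] or not fields[2].isdigit():
            continue
        user = LocalUser(fields[0], int(fields[2]))
        by_name[user.name] = user
        by_uid.setdefault(user.uid, user)  # keep the first entry per uid
    return by_name, by_uid


def is_local(key: Union[str, int],
             by_name: Dict[str, LocalUser],
             by_uid: Dict[int, LocalUser]) -> bool:
    """One entry point for both lookup kinds: str is a name, int is a uid."""
    if isinstance(key, int):
        return key in by_uid
    return key in by_name
```

Because the check is keyed on either a name or a uid, the same helper could back every lookup type rather than special-casing initgroups.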