[SSSD] Re: [PATCH] NSS: Fix memory leak netgroup
On 12/04/2015 03:37 PM, Jakub Hrozek wrote: On Thu, Dec 03, 2015 at 03:29:22PM +0100, Pavel Reichl wrote: On 12/03/2015 02:06 PM, Jakub Hrozek wrote: On Mon, Nov 30, 2015 at 06:45:00PM +0100, Pavel Reichl wrote: On 11/30/2015 06:02 PM, Jakub Hrozek wrote: Wouldn't it then be better to see if another same object is already in the hashtable and free it before replacing? I agree it would be best. I tried that before and failed because I could not decipher out the relation of talloc contexts. I tried that again. Seems that leaks are gone. Segfaults were not happening during my testing. Code got even messier :-( I think the code would look nice if it was placed in the else branch of create_negcache_netgr() :-) Done, thanks for hint. I also don't think we need the tmp_ctx change.. I dare to disagree here. With prev. versions of patch I was experiencing segfaults. IIRC step_context is being allocated on context of the netgroup that is freed. I also think that it's not a good idea to allocate local data on context that does not hold any reference to that data. But it might be matter of personal taste. What kind of segfaults, what was the backtrace? I think it might be good to add talloc_zfree() instead of talloc_free() or explicitly NULL the pointer, that should solve the issue without adding a new context... Since it's per-domain, I wonder if step_ctx->dctx might be better candidate than step_ctx either way. Hello Jakub, I amended the patch as you proposed. You can now experience the segfault yourself - just query a non-existing netgroup. I attached the backtrace as well. I can investiage and use talloc reporting if you wish. But still using tmp_ctx seems as way of the least effort... I think you are right, the step_ctx hierarchy is tricky, so the temporary context looks like the easiest solution. Please do one more change in the patch: OK, please see attached patch. From 0160ee92c4d9a9542abfd3e918686526a52e367d Mon Sep 17 00:00:00 2001 From: Pavel Reichl Date: Fri, 27 Nov 2015 07:53:00 -0500 Subject: [PATCH] NSS: Fix memory leak netgroup [...] @@ -461,6 +469,11 @@ static errno_t create_negcache_netgr(struct setent_step_ctx *step_ctx) } done: +/* Free netgroup after step_ctx is not needed. */ +if (netgr_to_be_freed) { +talloc_zfree(netgr_to_be_freed); +} Since talloc_free(NULL) is a noop, we should drop the if completely. OK, done. ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org >From 47e73fdd381240b67eba1d7e1de44a589388c503 Mon Sep 17 00:00:00 2001 From: Pavel Reichl Date: Fri, 27 Nov 2015 07:53:00 -0500 Subject: [PATCH] NSS: Fix memory leak netgroup If netgroup cannot be found in setnetgrent_retry() function set_netgroup_entry() is called which steals getent_ctx directly to nss_ctx->netgroups. Subsequently function lookup_netgr_step() is called that (in case of nenexisting group) will call create_negcache_netgr() which creates a new dummy object to serve as negative cache. While doing so it calls again set_netgroup_entry() for the same netgroup and it calls hash_enter. hash_enter will remove previously hashed entry for netgroup (created in setnetgrent_retry()) from hash table but it won't be freed and thus it leaks. This patch marks such netgroup and freed it after the call of create_negcache_netgr(). 
Resolves: https://fedorahosted.org/sssd/ticket/2865 --- src/responder/nss/nsssrv_netgroup.c | 22 +++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/src/responder/nss/nsssrv_netgroup.c b/src/responder/nss/nsssrv_netgroup.c index 9a78c1119c2f4e06e43ebec29ace775adc997e08..4647a20af6b0f8e00c120839bc7e880e89156de6 100644 --- a/src/responder/nss/nsssrv_netgroup.c +++ b/src/responder/nss/nsssrv_netgroup.c @@ -434,6 +434,7 @@ static errno_t create_negcache_netgr(struct setent_step_ctx *step_ctx) { errno_t ret; struct getent_ctx *netgr; +struct getent_ctx *netgr_to_be_freed = NULL; netgr = talloc_zero(step_ctx->nctx, struct getent_ctx); if (netgr == NULL) { @@ -441,6 +442,13 @@ static errno_t create_negcache_netgr(struct setent_step_ctx *step_ctx) ret = ENOMEM; goto done; } else { +/* Is there already netgroup with such name? */ +ret = get_netgroup_entry(step_ctx->nctx, step_ctx->name, + &netgr_to_be_freed); +if (ret != EOK) { +netgr_to_be_freed = NULL; +} + netgr->ready = true; netgr->found = false; netgr->entries = NULL; @@ -461,6 +469,8 @@ static errno_t create_negcache_netgr(struct setent_step_ctx *step_ctx) } done: +talloc_zfree(netgr_to_be_freed); + if (ret != EOK) { talloc_free(netgr); } @@ -474,6 +484,12 @@ static errno_t lookup_netgr_step(struct setent_step_ctx *step_ctx)
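To make the talloc discussion above easier to follow outside the SSSD tree, here is a minimal standalone sketch of the pattern the final patch uses: remember any object that is about to be evicted from the table, insert the replacement, and only then free the evicted object. The struct, the one-slot "table" and the lookup/store helpers are stand-ins for SSSD's getent_ctx, get_netgroup_entry() and set_netgroup_entry(); talloc_zfree() is re-created here as a local macro mirroring SSSD's util helper, and only libtalloc is assumed.

#include <stdio.h>
#include <talloc.h>

/* Local stand-in for SSSD's talloc_zfree(): free and NULL the variable so a
 * second free or a stray dereference of the same pointer stays harmless. */
#define talloc_zfree(ptr) do { talloc_free(ptr); (ptr) = NULL; } while (0)

struct getent_ctx {                 /* trimmed stand-in for the real struct */
    const char *name;
    int ready;
};

/* One-slot "hash table": storing a new entry silently drops the old pointer,
 * which is exactly how the original leak came about. */
static struct getent_ctx *table_slot;

static struct getent_ctx *lookup_entry(void)        /* ~ get_netgroup_entry() */
{
    return table_slot;
}

static void store_entry(struct getent_ctx *entry)   /* ~ set_netgroup_entry() */
{
    table_slot = entry;             /* previous entry is not freed here */
}

int main(void)
{
    TALLOC_CTX *mem_ctx = talloc_new(NULL);
    struct getent_ctx *netgr, *negcache, *netgr_to_be_freed;

    /* A lookup stored an entry earlier... */
    netgr = talloc_zero(mem_ctx, struct getent_ctx);
    netgr->name = "netgr1";
    store_entry(netgr);

    /* ...now a negative-cache dummy is about to replace it.  Remember the
     * entry that is going to be evicted first. */
    netgr_to_be_freed = lookup_entry();

    negcache = talloc_zero(mem_ctx, struct getent_ctx);
    negcache->name = "netgr1";
    negcache->ready = 1;
    store_entry(negcache);

    /* Free the evicted object only after the replacement is in place.
     * talloc_free(NULL) is a no-op, so no extra if () is needed. */
    talloc_zfree(netgr_to_be_freed);

    printf("stored entry: %s\n", lookup_entry()->name);
    talloc_free(mem_ctx);
    return 0;
}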
[SSSD] Re: [PATCH] ldap_test.py: Modify enum cache timeouts
On (04/12/15 16:02), Michal Židek wrote: >On 12/04/2015 03:07 PM, Lukas Slebodnik wrote: >>On (04/12/15 14:35), Michal Židek wrote: >>>On 12/04/2015 02:32 PM, Jakub Hrozek wrote: On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote: >On 12/04/2015 12:29 PM, Lukas Slebodnik wrote: >>On (04/12/15 12:11), Jakub Hrozek wrote: >>>On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote: On (03/12/15 20:22), Jakub Hrozek wrote: >On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: >>On (02/12/15 17:10), Michal Židek wrote: >>>Hi! >>> >>>I saw some integration tests failures recently, >>>and I think there is a race condition between the >>>enumeration refresh timeout and the sleeps >>>after some operations that wait for this timeout. >>>SSSD fails to populate changes from LDAP in time >>>and some asserts can fail because of this. >>> >>>So far I saw 4 tests to fail like this, which >>>is already quite a lot. >>> >>>The attached patch modifies the timeout values >>>and hopefully removes the issue. >>> >>>Michal >> >>>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 >>>2001 >>>From: =?UTF-8?q?Michal=20=C5=BDidek?= >>>Date: Wed, 2 Dec 2015 16:44:48 +0100 >>>Subject: [PATCH] ldap_test.py: Modify enum cache timeouts >>> >>>There is a race condation between ldap >>>enumeration refresh timeout and the sleeps >>>that wait for the ldap changes to populate >>>to SSSD if the timeout and the sleeps have >>>the same value. >>>--- >>>src/tests/intg/ldap_test.py | 30 +- >>>1 file changed, 17 insertions(+), 13 deletions(-) >>> >>>diff --git a/src/tests/intg/ldap_test.py >>>b/src/tests/intg/ldap_test.py >>>index 757ee20..8ec8dbe 100644 >>>--- a/src/tests/intg/ldap_test.py >>>+++ b/src/tests/intg/ldap_test.py >>>@@ -33,7 +33,11 @@ import ldap_ent >>>from util import * >>> >>>LDAP_BASE_DN = "dc=example,dc=com" >>>-INTERACTIVE_TIMEOUT = 4 >>>+INTERACTIVE_TIMEOUT = 2 >>>+ >>>+ >>>+def wait_for_ldap_enum_refresh(): >>>+time.sleep(INTERACTIVE_TIMEOUT + 4) >>Why does it need to be INTERACTIVE_TIMEOUT + 4 >> >>Could it be INTERACTIVE_TIMEOUT + 3 or + 5 >> > >Regardless of the value we choose, can we move this patch forward? I >see >the related failure quite often in SSSD. Adding timeout without real explanation is not a solution. The main problem is with empiric values. If they are very high then test are slow. And there still can be slow/fast machine where lower values caused troubles. The ideal solution would be to get rid of enumeration in ldap tests. >>> >>>Enumeration is a codepath that is different from non-enumeration, so it >>>should be tested. Not as priority, not as the only ldap tests, but it's >>>a valid case, so it should be there. >>> If we want to test enumeration than it should be in separate test. >>> >>>Maybe, but we do have enumeration tests and we should fix them. >>> >>Adding sleep is not a fix. It's just a workaround >>because all sleep timeout are just an empiric values. >>and we should fix test and not adding workaround/hacks. >> >>If we cannot fix the test and don't want te rewrite test without >>enumeration >>then we should remove/revert problematic tests. >> >>LS > >I will send a new patch with an explanation (sort of), >but it still will be a guess. I am not sure what the >real safe value should be, only that the sleep's >after operation should be longer than the ldap >refresh and enum cache timeouts (and that the >current values do not cope well wit the CI load). 
Would it be more acceptable then to define the ldap refresh and enum cache timeouts as variables in the test and sleep for (enum_timeout + cache_timeout + 1)? At least that would be more readable than a magic constant.. >>> >>>Will do. All will be derived from INTERACTIVE_TIMEOUT >>>so that we need to change just one value if needed in the >>>future. >>> >>Will it be reliable? >> >>Will it work on slow(arm) machines? >> >>I plan to run the integration tests in the "%check" phase >>of an rpm build. And koji/fedora has arm machines. >We can mark tests that may fail on slow machines >due to timeouts as "unsafe" and skip them in the >rpm build. It's not about rpmbuild, it's a general problem. And marking tests as unsafe does not solve anything. We need a reliable way to find out that the enumeration task
[SSSD] Re: [PATCH] Keep external members of IPA groups
On Fri, Dec 04, 2015 at 04:05:52PM +0100, Sumit Bose wrote: > On Fri, Dec 04, 2015 at 02:47:17PM +0100, Jakub Hrozek wrote: > > Hi, > > > > it seems like https://fedorahosted.org/sssd/ticket/2492 was not fixed > > completely. Attached are two patches that do the trick for me -- they > > are not polished (at the very least, the first one needs a test..) but I > > would like to get another opinion if they at least aim in the right > > direction. > > > > So my groups on IPA server are set like this: > > > > [jhrozek@unidirect] sssd $ [(ext_mem)] ipa group-show topgr --all --raw > > dn: cn=topgr,cn=groups,cn=accounts,dc=ipa,dc=test > > cn: topgr > > gidnumber: 24024 > > member: cn=ext_ipa_gr,cn=groups,cn=accounts,dc=ipa,dc=test > > ipaUniqueID: 193b7c04-91e9-11e5-bc59-5254005f7b59 > > objectClass: ipaobject > > objectClass: top > > objectClass: ipausergroup > > objectClass: posixgroup > > objectClass: groupofnames > > objectClass: nestedgroup > > [jhrozek@unidirect] sssd $ [(ext_mem)] ipa group-show ext_ipa_gr --all --raw > > dn: cn=ext_ipa_gr,cn=groups,cn=accounts,dc=ipa,dc=test > > cn: ext_ipa_gr > > ipaexternalmember: S-1-5-21-1897531548-1940899517-361317264-500 > > ipaUniqueID: ad2bd978-91e8-11e5-9d52-5254005f7b59 > > memberof: cn=topgr,cn=groups,cn=accounts,dc=ipa,dc=test > > objectClass: ipaobject > > objectClass: top > > objectClass: nestedgroup > > objectClass: ipausergroup > > objectClass: groupofnames > > objectClass: ipaexternalgroup > > > > The SID S-1-5-21-1897531548-1940899517-361317264-500 is an AD user > > (administrator) > > > > Now when I do: > > sudo sss_cache -E > > $ id -G administra...@win.trust.test > > $ sudo sss_cache -G > > $ getent group 24024 > > > > Then the "topgr" group gets resolved and the code gets into > > sdap_save_grpmem: > > 908 /* This is a temporal solution until the IPA provider is able to > > 909 * resolve external group membership. > > 910 * https://fedorahosted.org/sssd/ticket/2522 > > 911 */ > > 912 if (opts->schema_type == SDAP_SCHEMA_IPA_V1) { > > 913 ret = sysdb_attrs_get_string(attrs, SYSDB_SID_STR, &group_sid); > > 914 if (ret != EOK) { > > 915 DEBUG(SSSDBG_TRACE_FUNC, "Failed to get group sid\n"); > > 916 group_sid = NULL; > > > > Here we set the group_sid variable to NULL, because topgr, being an IPA > > group, doesn't have a SID... > > why does it not have a SID. I don't know, even re-running: # ipa-adtrust-install --add-sids Tells me that "Sidgen task plugin already configured, nothing to do". I guess my test IPA install is not in the best shape, maybe I should just refresh it.. > I guess it has a GID since we got this far. > IPA with trust enabled should automatically add SIDs to new POSIX groups > and groups created before trust setup should get them from running the > sidgen task. It is important that the groups have a SID because > otherwise they cannot be added to the PAC causing different views of the > groups membership see by the PAC responder and the ID provider. Makes sense. > > bye, > Sumit > ___ > sssd-devel mailing list > sssd-devel@lists.fedorahosted.org > https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
[SSSD] Re: [PATCH] Reduce the code duplication in Data Provider
On Thu, Dec 03, 2015 at 01:24:54PM +0100, Pavel Březina wrote: > On 11/20/2015 12:04 PM, Jakub Hrozek wrote: > >On Thu, Nov 19, 2015 at 01:51:54PM +0100, Pavel Březina wrote: > >>>Thanks, new patches are attached. > >> > >>I had a phone call with Jakub and we decided that it will be better to use > >>be_req directly instead of lower level sbus_req. This will allow us to > >>simplify it more. > > > >OK, see the attached patches.. > > Hi, thanks for the changes. > > First patch: > > >@@ -199,6 +205,7 @@ struct be_req *be_req_create(TALLOC_CTX *mem_ctx, > > be_req->domain = be_ctx->domain; > > be_req->fn = fn; > > be_req->pvt = pvt_fn_data; > >+be_req->req_name = name; > > I know every call now passes static data but IMHO we should be safe and use > talloc_strdup here. > > >+static errno_t be_sbus_reply(struct sbus_request *sbus_req, > >+ dbus_uint16_t err_maj, > >+ dbus_uint32_t err_min, > >+ const char *err_msg) > >+{ > >+errno_t ret; > >+const char *safe_err_msg; > >+ > >+/* Only return a reply if one was requested > >+ * There may not be one if this request began > >+ * while we were offline > >+ */ > >+if (sbus_req == NULL) { > >+return EOK; > >+} > >+ > >+safe_err_msg = safe_be_req_err_msg(err_msg, err_maj); > >+ > >+if (err_maj == DP_ERR_FATAL && err_min == ENODEV) { > >+DEBUG(SSSDBG_TRACE_LIBS, "Handler not configured\n"); > >+} else { > >+DEBUG(SSSDBG_TRACE_LIBS, > >+ "Request processed. Returned %d,%d,%s\n", > >+ err_maj, err_min, err_msg); > >+} > > Can we translate err_maj into string here? To get message similar to: > "Request processed: Success [%d]: %s, err_min, err_msg > "Request processed: Failure [%d]: %s, err_min, err_msg > "Request processed: Offline [%d]: %s, err_min, err_msg > > Second patch: > > >+#define BE_SBUS_REPLY_DATA_INIT { .err_maj = DP_ERR_FATAL, \ > >+ .err_min = EFAULT, \ > >+ .err_msg = "Fatal error" \ > >+}; > > ERR_INTERNAL might be a better choice over EFAULT since it basically means > we forgot to set it -> code error. OK, see the attached two. >From 0bff8916f35a39ff8fdd4789a31289f7d1b05877 Mon Sep 17 00:00:00 2001 From: Jakub Hrozek Date: Wed, 11 Nov 2015 13:39:43 +0100 Subject: [PATCH 1/2] DP: Reduce code duplication in the callback handlers Instead of calling sbus_request_return_and_finish() directly with the same checks copied over, add a be_sbus_reply() helper instead. 
--- src/providers/ad/ad_subdomains.c | 2 +- src/providers/data_provider_be.c | 355 + src/providers/dp_backend.h | 11 +- src/providers/ipa/ipa_subdomains.c | 2 +- 4 files changed, 138 insertions(+), 232 deletions(-) diff --git a/src/providers/ad/ad_subdomains.c b/src/providers/ad/ad_subdomains.c index 2e5d9120e473e32a84610d607ccf329249b4ac9e..4799b518a1354e5b4ef8392b860effc9121ee121 100644 --- a/src/providers/ad/ad_subdomains.c +++ b/src/providers/ad/ad_subdomains.c @@ -1118,7 +1118,7 @@ static void ad_subdom_online_cb(void *pvt) refresh_interval = ctx->be_ctx->domain->subdomain_refresh_interval; -be_req = be_req_create(ctx, NULL, ctx->be_ctx, +be_req = be_req_create(ctx, NULL, ctx->be_ctx, "AD subdomains", ad_subdom_be_req_callback, NULL); if (be_req == NULL) { DEBUG(SSSDBG_CRIT_FAILURE, "be_req_create() failed.\n"); diff --git a/src/providers/data_provider_be.c b/src/providers/data_provider_be.c index 9bc54b6d59dafd22b9f70da7cca3b520ca941efc..b67efa87008d8e15fb9fb27c3d02996608b0aa62 100644 --- a/src/providers/data_provider_be.c +++ b/src/providers/data_provider_be.c @@ -174,6 +174,9 @@ struct be_req { */ int phase; +/* Just for nicer debugging */ +const char *req_name; + struct be_req *prev; struct be_req *next; }; @@ -186,8 +189,11 @@ static int be_req_destructor(struct be_req *be_req) } struct be_req *be_req_create(TALLOC_CTX *mem_ctx, - struct be_client *becli, struct be_ctx *be_ctx, - be_async_callback_t fn, void *pvt_fn_data) + struct be_client *becli, + struct be_ctx *be_ctx, + const char *name, + be_async_callback_t fn, + void *pvt_fn_data) { struct be_req *be_req; @@ -199,6 +205,11 @@ struct be_req *be_req_create(TALLOC_CTX *mem_ctx, be_req->domain = be_ctx->domain; be_req->fn = fn; be_req->pvt = pvt_fn_data; +be_req->req_name = talloc_strdup(be_req, name); +if (be_req->req_name == NULL) { +talloc_free(be_req); +return NULL; +} /* Add this request to active request list and make sure it
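As a side note, the err_maj-to-string translation Pavel asks for could be a small switch over the data-provider result levels. The sketch below is only an illustration of that idea: the helper name, the exact labels and the standalone main() are made up, and the DP_ERR_* values are merely modelled on SSSD's data-provider codes.

#include <stdio.h>
#include <stdint.h>

/* Major data-provider result codes, modelled on SSSD's dp_backend.h. */
enum dp_err_major {
    DP_ERR_OK = 0,
    DP_ERR_OFFLINE,
    DP_ERR_TIMEOUT,
    DP_ERR_FATAL
};

/* Hypothetical helper turning err_maj into a short label, so the debug line
 * can read "Request processed: Offline [err_min]: msg" instead of raw
 * numbers, as suggested in the review above. */
static const char *dp_err_to_string(uint16_t err_maj)
{
    switch (err_maj) {
    case DP_ERR_OK:      return "Success";
    case DP_ERR_OFFLINE: return "Offline";
    case DP_ERR_TIMEOUT: return "Timeout";
    case DP_ERR_FATAL:   return "Failure";
    default:             return "Unknown";
    }
}

int main(void)
{
    uint16_t err_maj = DP_ERR_OFFLINE;
    uint32_t err_min = 5;                       /* sample errno-style value */
    const char *err_msg = "Backend is offline"; /* sample message */

    printf("Request processed: %s [%u]: %s\n",
           dp_err_to_string(err_maj), err_min, err_msg);
    return 0;
}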
[SSSD] Re: [PATCH] IPA_PROVIDER: Explicit no handle of services
On 12/03/2015 05:26 PM, Jakub Hrozek wrote: On Tue, Dec 01, 2015 at 05:02:39PM +0100, Petr Cech wrote: On 11/27/2015 10:35 AM, Jakub Hrozek wrote: On Thu, Nov 19, 2015 at 05:57:56PM +0100, Petr Cech wrote: On 11/12/2015 06:04 PM, Jakub Hrozek wrote: On Thu, Nov 12, 2015 at 04:54:21PM +0100, Petr Cech wrote: On 11/11/2015 02:42 PM, Jakub Hrozek wrote: Hi, I think it's a good idea to only say we don't handle services for IPA subdomains. But I also think it would be better to shortcut the request sooner, in ipa_subdomain_account_send() to avoid even sending an LDAP query. Hi Jakub, new patch is attached. During the testing... I found out, that I use wrong set up. Subdomains are connected to FreeIPA with trusted AD. What's wrong about it? I think we/should/ test this setup.. Hi Jakub, I was confused about it. I haven't got such environment with IP trust. But now I have one. So... patch is here, but I would like set up my environment properly and then I will inform you:-) Regards Petr But I still do something wrong. I set up environment: AD server: dom-78.abc... FreeIPA server: mirfak.persei.dev FreeIPA client: algol.persei.dev I set up the trust between IPA and AD. And then I tried on ipa clinet: # getent services ldap/mirfak.persei@persei.dev This is not a different kind of service (a Kerberos service principal). The ticket is about services resolvable via libc's NSS interface: https://www.gnu.org/software/libc/manual/html_node/Name-Service-Switch.html#Name-Service-Switch Hi Jakub, thank you for documentation. See man 5 nsswitch conf for some more info. The API used for resolving services is described in man 3 getservent and similar. Normally services are stored on the filesystem, similar to local users, see cat /etc/services. But custom services can also be stored in LDAP or other sources. And while we support looking up services, the IPA back end does not and currently if you do: getent services -s sss dns This command is usefull. I have used: # getent services -s sss d...@persei.dev The logs would show: (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [be_get_account_info] (0x0200): Got request for [0x1005][FAST BE_REQ_SERVICES][1][name=dns] (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [be_req_set_domain] (0x0400): Changing request domain from [ipa.test] to [ipa.test] (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [services_get_send] (0x1000): Preparing to search for services with filter [(&(cn=dns)(objectclass=ipService))] (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_id_op_connect_step] (0x4000): reusing cached connection (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_services_next_base] (0x0400): Searching for services with base [cn=accounts,dc=ipa,dc=test] (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_print_server] (0x2000): Searching 192.168.122.100 (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(cn=dns)(objectclass=ipService))][cn=accounts,dc=ipa,dc=test]. 
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [objectClass] (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [cn] (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipServicePort] (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [ipServiceProtocol] (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [entryUSN] (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 16 (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_op_add] (0x2000): New operation 16 timeout 2 (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_process_result] (0x2000): Trace: sh[0xeedc00], connected[1], ops[0xf1aca0], ldap[0xef0bb0] (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_RESULT] (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_op_finished] (0x0400): Search result: Success(0), no errmsg set (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_op_destructor] (0x2000): Operation 16 finished (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_services_process] (0x0400): Search for services, returned 0 results. (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_id_op_done] (0x4000): releasing operation connection (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [ldb] (0x4000): start ldb transaction (nesting: 0) (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sysdb_search_services] (0x2000): Search services with filter: (&(objectclass=service)(&(serviceProtocol=*)(|(nameAlias=dns)(name=dns (Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [ldb] (0x4000): Added timed event "
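For context, the "shortcut the request sooner" suggestion from earlier in the thread boils down to refusing service lookups for IPA subdomains before any LDAP search is issued. The fragment below is a heavily simplified, standalone illustration of that check; the constants and the struct are stand-ins (the real values live in SSSD's data-provider headers), and the real patch would finish the tevent request with ENOTSUP rather than return an int from a helper.

#include <stdio.h>
#include <errno.h>

/* Illustrative request-type constants; stand-ins for SSSD's BE_REQ_* values. */
#define BE_REQ_TYPE_MASK 0x00FF
#define BE_REQ_USER      0x0001
#define BE_REQ_SERVICES  0x0004

struct be_acct_req {                /* trimmed stand-in for the real struct */
    int entry_type;
    const char *filter_value;
};

/* Sketch of the early shortcut: bail out on service requests for an IPA
 * subdomain before any LDAP traffic is generated. */
static int subdom_account_check(const struct be_acct_req *ar)
{
    if ((ar->entry_type & BE_REQ_TYPE_MASK) == BE_REQ_SERVICES) {
        fprintf(stderr, "Services are not handled by IPA subdomains\n");
        return ENOTSUP;
    }
    return 0;                       /* EOK: continue with the normal lookup */
}

int main(void)
{
    struct be_acct_req svc = { BE_REQ_SERVICES, "dns" };
    struct be_acct_req usr = { BE_REQ_USER, "administrator" };

    printf("services request -> %d\n", subdom_account_check(&svc));
    printf("user request     -> %d\n", subdom_account_check(&usr));
    return 0;
}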
[SSSD] Re: [PATCH] Keep external members of IPA groups
On Fri, Dec 04, 2015 at 02:47:17PM +0100, Jakub Hrozek wrote: > Hi, > > it seems like https://fedorahosted.org/sssd/ticket/2492 was not fixed > completely. Attached are two patches that do the trick for me -- they > are not polished (at the very least, the first one needs a test..) but I > would like to get another opinion if they at least aim in the right > direction. > > So my groups on IPA server are set like this: > > [jhrozek@unidirect] sssd $ [(ext_mem)] ipa group-show topgr --all --raw > dn: cn=topgr,cn=groups,cn=accounts,dc=ipa,dc=test > cn: topgr > gidnumber: 24024 > member: cn=ext_ipa_gr,cn=groups,cn=accounts,dc=ipa,dc=test > ipaUniqueID: 193b7c04-91e9-11e5-bc59-5254005f7b59 > objectClass: ipaobject > objectClass: top > objectClass: ipausergroup > objectClass: posixgroup > objectClass: groupofnames > objectClass: nestedgroup > [jhrozek@unidirect] sssd $ [(ext_mem)] ipa group-show ext_ipa_gr --all --raw > dn: cn=ext_ipa_gr,cn=groups,cn=accounts,dc=ipa,dc=test > cn: ext_ipa_gr > ipaexternalmember: S-1-5-21-1897531548-1940899517-361317264-500 > ipaUniqueID: ad2bd978-91e8-11e5-9d52-5254005f7b59 > memberof: cn=topgr,cn=groups,cn=accounts,dc=ipa,dc=test > objectClass: ipaobject > objectClass: top > objectClass: nestedgroup > objectClass: ipausergroup > objectClass: groupofnames > objectClass: ipaexternalgroup > > The SID S-1-5-21-1897531548-1940899517-361317264-500 is an AD user > (administrator) > > Now when I do: > sudo sss_cache -E > $ id -G administra...@win.trust.test > $ sudo sss_cache -G > $ getent group 24024 > > Then the "topgr" group gets resolved and the code gets into sdap_save_grpmem: > 908 /* This is a temporal solution until the IPA provider is able to > 909 * resolve external group membership. > 910 * https://fedorahosted.org/sssd/ticket/2522 > 911 */ > 912 if (opts->schema_type == SDAP_SCHEMA_IPA_V1) { > 913 ret = sysdb_attrs_get_string(attrs, SYSDB_SID_STR, &group_sid); > 914 if (ret != EOK) { > 915 DEBUG(SSSDBG_TRACE_FUNC, "Failed to get group sid\n"); > 916 group_sid = NULL; > > Here we set the group_sid variable to NULL, because topgr, being an IPA > group, doesn't have a SID... why does it not have a SID. I guess it has a GID since we got this far. IPA with trust enabled should automatically add SIDs to new POSIX groups and groups created before trust setup should get them from running the sidgen task. It is important that the groups have a SID because otherwise they cannot be added to the PAC causing different views of the groups membership see by the PAC responder and the ID provider. bye, Sumit ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
[SSSD] Re: [PATCH] ldap_test.py: Modify enum cache timeouts
On 12/04/2015 03:07 PM, Lukas Slebodnik wrote: On (04/12/15 14:35), Michal Židek wrote: On 12/04/2015 02:32 PM, Jakub Hrozek wrote: On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote: On 12/04/2015 12:29 PM, Lukas Slebodnik wrote: On (04/12/15 12:11), Jakub Hrozek wrote: On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote: On (03/12/15 20:22), Jakub Hrozek wrote: On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: On (02/12/15 17:10), Michal Židek wrote: Hi! I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this. So far I saw 4 tests to fail like this, which is already quite a lot. The attached patch modifies the timeout values and hopefully removes the issue. Michal >From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20=C5=BDidek?= Date: Wed, 2 Dec 2015 16:44:48 +0100 Subject: [PATCH] ldap_test.py: Modify enum cache timeouts There is a race condation between ldap enumeration refresh timeout and the sleeps that wait for the ldap changes to populate to SSSD if the timeout and the sleeps have the same value. --- src/tests/intg/ldap_test.py | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py index 757ee20..8ec8dbe 100644 --- a/src/tests/intg/ldap_test.py +++ b/src/tests/intg/ldap_test.py @@ -33,7 +33,11 @@ import ldap_ent >from util import * LDAP_BASE_DN = "dc=example,dc=com" -INTERACTIVE_TIMEOUT = 4 +INTERACTIVE_TIMEOUT = 2 + + +def wait_for_ldap_enum_refresh(): +time.sleep(INTERACTIVE_TIMEOUT + 4) Why does it need to be INTERACTIVE_TIMEOUT + 4 Could it be INTERACTIVE_TIMEOUT + 3 or + 5 Regardless of the value we choose, can we move this patch forward? I see the related failure quite often in SSSD. Adding timeout without real explanation is not a solution. The main problem is with empiric values. If they are very high then test are slow. And there still can be slow/fast machine where lower values caused troubles. The ideal solution would be to get rid of enumeration in ldap tests. Enumeration is a codepath that is different from non-enumeration, so it should be tested. Not as priority, not as the only ldap tests, but it's a valid case, so it should be there. If we want to test enumeration than it should be in separate test. Maybe, but we do have enumeration tests and we should fix them. Adding sleep is not a fix. It's just a workaround because all sleep timeout are just an empiric values. and we should fix test and not adding workaround/hacks. If we cannot fix the test and don't want te rewrite test without enumeration then we should remove/revert problematic tests. LS I will send a new patch with an explanation (sort of), but it still will be a guess. I am not sure what the real safe value should be, only that the sleep's after operation should be longer than the ldap refresh and enum cache timeouts (and that the current values do not cope well wit the CI load). Would it be more acceptable then to define the ldap refresh and enum cache timeouts as variables in the test and sleep for (enum_timeout + cache_timeout + 1) ? At least that would be more readable than a magic constant.. Will do. All will be derived from INTERACTIVE_TIMEOUT so that we need to change just one value if needed in the future. 
Will it be reliable? Will it work on slow(arm) machines? I plan to run the integration tests in the "%check" phase of an rpm build. And koji/fedora has arm machines. We can mark tests that may fail on slow machines due to timeouts as "unsafe" and skip them in the rpm build. The simplest way to do it would be to use the -k "not test_unsafe_" option in INTGCHECK_PYTEST_ARGS, and the unsafe tests would have a test_unsafe_ prefix. Would that work for you? It is something that we can start using immediately. I already have a POC patch. LS ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
[SSSD] Re: [PATCHES] sudo provider improvements
On (02/12/15 14:40), Pavel Březina wrote: >On 12/02/2015 02:06 PM, Lukas Slebodnik wrote: >>On (02/12/15 11:05), Pavel Březina wrote: >>>On 12/01/2015 02:02 PM, Lukas Slebodnik wrote: On (24/11/15 13:23), Pavel Březina wrote: >Hi, >I'm sending some sudo provider patches. I wanted to fix/improve things in >the >ldap sudo provider prior my work on ipa provider so I get familiar with it >again and avoid making the same mistakes. > >It fixes tevent style, shuffles the code around a little bit, convert >periodic task to use be_ptask module, renew hostinfo when needed, fix >sdap_id_op logic, recude code duplication, remove dead code, simplify error >handling, etc. > >Ticket fixed: >https://fedorahosted.org/sssd/ticket/1943 >https://fedorahosted.org/sssd/ticket/2672 > >I let Dan run downstream tests on those patches. We had to fix one test >that >was prone to a race condition which my patches revealed, but everything is >green now. > I got following valgrind errors with patches ==17279== 1 errors in context 3 of 7: ==17279== Conditional jump or move depends on uninitialised value(s) ==17279==at 0x8BC76FE: _talloc_steal_loc (talloc.c:1162) ==17279==by 0x13DA3775: sdap_sudo_set_usn (sdap_async_sudo.c:318) ==17279==by 0x13DA3775: sdap_sudo_refresh_done (sdap_async_sudo.c:750) ==17279==by 0x89B4201: _tevent_req_error (tevent_req.c:167) ==17279==by 0x13DA3321: sdap_sudo_load_sudoers_done (sdap_async_sudo.c:170) ==17279==by 0x89B4201: _tevent_req_error (tevent_req.c:167) ==17279==by 0x89B4201: _tevent_req_error (tevent_req.c:167) ==17279==by 0x13D71ADB: generic_ext_search_handler.isra.3 (sdap_async.c:1651) ==17279==by 0x89B3923: tevent_common_loop_immediate (tevent_immediate.c:135) ==17279==by 0x89B822D: epoll_event_loop_once (tevent_epoll.c:907) ==17279==by 0x89B6936: std_event_loop_once (tevent_standard.c:114) ==17279==by 0x89B30FC: _tevent_loop_once (tevent.c:533) ==17279==by 0x89B329A: tevent_common_loop_wait (tevent.c:637) ==17279== ==17279== ==17279== 1 errors in context 4 of 7: ==17279== Conditional jump or move depends on uninitialised value(s) ==17279==at 0x13DA3738: sdap_sudo_set_usn (sdap_async_sudo.c:307) ==17279==by 0x13DA3738: sdap_sudo_refresh_done (sdap_async_sudo.c:750) ==17279==by 0x89B4201: _tevent_req_error (tevent_req.c:167) ==17279==by 0x13DA3321: sdap_sudo_load_sudoers_done (sdap_async_sudo.c:170) ==17279==by 0x89B4201: _tevent_req_error (tevent_req.c:167) ==17279==by 0x89B4201: _tevent_req_error (tevent_req.c:167) ==17279==by 0x13D71ADB: generic_ext_search_handler.isra.3 (sdap_async.c:1651) ==17279==by 0x89B3923: tevent_common_loop_immediate (tevent_immediate.c:135) ==17279==by 0x89B822D: epoll_event_loop_once (tevent_epoll.c:907) ==17279==by 0x89B6936: std_event_loop_once (tevent_standard.c:114) ==17279==by 0x89B30FC: _tevent_loop_once (tevent.c:533) ==17279==by 0x89B329A: tevent_common_loop_wait (tevent.c:637) ==17279==by 0x89B68D6: std_event_loop_wait (tevent_standard.c:140) >>> >>>I can't see a codepath where usn could be uninitialized, do you? >>I didn't try but static analysers helped me, >>at least I hope it will help you :-) >>Maybe, once you will learn how to use them :-) :-) :-) >> >> >>Error: UNINIT (CWE-457): [#def1] >>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:690: var_decl: Declaring >>variable "usn" without initializer. >>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:750: uninit_use_in_call: >>Using uninitialized value "usn" when calling "sdap_sudo_set_usn". 
>># 748| >># 749| /* remember new usn */ >># 750|-> sdap_sudo_set_usn(state->srv_opts, usn); >># 751| >># 752| ret = EOK; >> >>Error: CLANG_WARNING: [#def2] >>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:750:5: warning: Function >>call argument is an uninitialized value >>#sdap_sudo_set_usn(state->srv_opts, usn); >>#^ ~~~ >>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:690:5: note: 'usn' declared >>without an initial value >>#char *usn; >>#^ >>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:704:9: note: Assuming >>'dp_error' is not equal to 0 >>#if (dp_error == DP_ERR_OK && ret != EOK) { >>#^ >>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:704:31: note: Left side of >>'&&' is false >>#if (dp_error == DP_ERR_OK && ret != EOK) { >># ^ >>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:717:9: note: Assuming 'ret' >>is equal to 0 >>#if (ret != EOK) { >>#^~ >>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:717:5: note: Taking false >>branch >>#if (ret != EOK) { >>#^ >>sssd
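For readers skimming the valgrind/Coverity output above, the issue is simply a local pointer that is only assigned on the success path. Below is a minimal standalone illustration of that bug class and the obvious fix: initialize the pointer and make the setter NULL-safe. The struct and function names only loosely mirror srv_opts / sdap_sudo_set_usn() and are not the real SSSD code; only libtalloc is assumed.

#include <stdio.h>
#include <talloc.h>

struct server_opts {                 /* loose stand-in for sdap_server_opts */
    char *max_sudo_value;
};

/* Stand-in for sdap_sudo_set_usn(), made NULL-safe so a caller that bailed
 * out early cannot hand it an indeterminate pointer. */
static void set_usn(struct server_opts *srv_opts, const char *usn)
{
    if (srv_opts == NULL || usn == NULL) {
        return;                      /* nothing to remember */
    }
    talloc_free(srv_opts->max_sudo_value);
    srv_opts->max_sudo_value = talloc_strdup(srv_opts, usn);
}

/* The fix for the report above: initialize usn so the error path cannot
 * pass an uninitialized value down to set_usn(). */
static void refresh_done(struct server_opts *srv_opts, int lookup_failed)
{
    char *usn = NULL;

    if (!lookup_failed) {
        usn = talloc_strdup(srv_opts, "12345");  /* sample USN string */
    }

    set_usn(srv_opts, usn);          /* safe on both paths now */
    talloc_free(usn);
}

int main(void)
{
    TALLOC_CTX *mem_ctx = talloc_new(NULL);
    struct server_opts *opts = talloc_zero(mem_ctx, struct server_opts);

    refresh_done(opts, 1);           /* error path: usn stays NULL */
    refresh_done(opts, 0);           /* success path: the USN is stored */
    printf("stored USN: %s\n", opts->max_sudo_value);

    talloc_free(mem_ctx);
    return 0;
}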
[SSSD] Re: [PATCH] make globals in *_opts.h extern
On Thu, Dec 03, 2015 at 12:45:55PM +0100, Pavel Březina wrote: > On 12/03/2015 11:48 AM, Jakub Hrozek wrote: > >On Wed, Dec 02, 2015 at 11:58:55AM +0100, Pavel Březina wrote: > >>This solves the situation where you want to use those globals in a place > >>other than *_common.c. > >> > >>I also created https://fedorahosted.org/sssd/ticket/2890 so we can avoid > >>order-dependency on header files such as sysdb_services.h which I had to fix > >>for the AD patch. > > > >Looks like tests failed: > > > > http://sssd-ci.duckdns.org/logs/job/34/17/fedora20/ci-build-debug/ci-make-tests.log > > I forgot to run the tests... new patches are attached. > Thanks, the patches now build. I have two questions: 1) Since we're moving code around anyway, would it make sense to also move the enums with option indexes to opts.h? I find it a bit odd that they are in common.h 2) Does it make sense to remove the remaining headers (data_provider and common) from opts.h and leave only the options in? ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
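To illustrate what "make globals in *_opts.h extern" means in practice, here is a generic C sketch with made-up names (not the actual SSSD option tables): the header carries only an extern declaration so any translation unit can reference the table, while exactly one .c file keeps the single definition.

/* example_opts.h — declaration only; safe to include from many .c files. */
#ifndef EXAMPLE_OPTS_H_
#define EXAMPLE_OPTS_H_

struct opt_desc {                   /* simplified stand-in for an option map */
    const char *name;
    const char *def_value;
};

extern struct opt_desc example_basic_opts[];

#endif /* EXAMPLE_OPTS_H_ */

/* example_common.c — the one and only definition of the table. */
#include "example_opts.h"

struct opt_desc example_basic_opts[] = {
    { "ldap_uri",         NULL },
    { "ldap_search_base", "dc=example,dc=com" },
    { NULL,               NULL }    /* sentinel */
};

/* another_file.c — previously this needed the array body in the header;
 * with the extern declaration the table can be used from anywhere. */
#include <stdio.h>
#include "example_opts.h"

void dump_defaults(void)
{
    for (struct opt_desc *o = example_basic_opts; o->name != NULL; o++) {
        printf("%s = %s\n", o->name, o->def_value ? o->def_value : "(unset)");
    }
}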
[SSSD] Re: [PATCH] MAN: Clarify that subdomains always use service discovery
On Mon, Nov 30, 2015 at 05:41:35PM +0100, Jakub Hrozek wrote: > On Mon, Nov 30, 2015 at 05:01:53AM -0500, Dan Lavu wrote: > > Lukas, > > > > Fixed some grammatical errors and wording. Also noticed the line about > > supported AD versions, please correct me if I'm wrong, but we only > > officially support 2008R2 and 2012R2 so I modified that line stating that. > > > > Dan > > > > - Original Message - > > > > From: "Pavel Březina" > > To: sssd-devel@lists.fedorahosted.org > > Sent: Monday, November 30, 2015 4:36:10 AM > > Subject: [SSSD]Re: [PATCH] MAN: Clarify that subdomains always use service > > discovery > > > > On 11/26/2015 05:10 PM, Jakub Hrozek wrote: > > > Hi. > > > > > > attached is a simple manpage patch for #2881. > > > > Ack. > > ___ > > sssd-devel mailing list > > sssd-devel@lists.fedorahosted.org > > https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org > > > > > > Hi, > > > From 3cb1cd429c59e380a264d13d4299f76e10238799 Mon Sep 17 00:00:00 2001 > > From: Dan Lavu > > Date: Mon, 30 Nov 2015 04:51:00 -0500 > > Subject: [PATCH] Clarify that subdomains always use service discovery > > > > --- > > src/man/sssd-ad.5.xml | 16 > > 1 file changed, 12 insertions(+), 4 deletions(-) > > > > diff --git a/src/man/sssd-ad.5.xml b/src/man/sssd-ad.5.xml > > index > > 047cf046d425ed32a4983691c18dc9f8b94a0160..d3f360e7db5358d159954cdddeaf3ff38a79ce3e > > 100644 > > --- a/src/man/sssd-ad.5.xml > > +++ b/src/man/sssd-ad.5.xml > > @@ -39,12 +39,13 @@ > > > > > > The AD provider supports connecting to Active Directory 2008 R2 > > -or later. Earlier versions may work, but are unsupported. > > +and 2012 R2. Other versions may work, but are unsupported. > > I don't think we want to do this change. I know you're coming from your > downstream perspective where I would agree, but in upstream, we should > support also 2016 when it comes out and 2012.0 (as opposed to R2). > > > > > > > -The AD provider is able to provide identity information and > > -authentication for entities from trusted domains as well. > > Currently > > -only trusted domains in the same forest are recognized. > > +The AD provider can be used to get user information > > +and authenticate users from trusted domains. Currently > > +only trusted domains in the same forest are recognized. In > > +addition servers from trusted domains are always > > auto-discovered. > > > > > > The AD provider accepts the same options used by the > > @@ -121,10 +122,17 @@ ldap_id_mapping = False > > connect in order of preference. For more > > information on failover and server redundancy, > > see > > the FAILOVER section. > > + > > + > > This is optional if autodiscovery is enabled. > > For more information on service discovery, > > refer > > to the SERVICE DISCOVERY > > section. > > > > + > > +Note: Trusted domains will always auto-discover > > +servers even if the primary server is > > explicitly > > +defined in the ad_server option. > > + > > > > > > > > I agree with the rest of the changes and if you agree with dropping the > first hunk, I'll push the patch. Hi Dan, are you OK with droppig the first paragraph about Windows server supported versions? ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
[SSSD] Re: [PATCH] NSS: Fix memory leak netgroup
On Thu, Dec 03, 2015 at 03:29:22PM +0100, Pavel Reichl wrote: > > On 12/03/2015 02:06 PM, Jakub Hrozek wrote: > >On Mon, Nov 30, 2015 at 06:45:00PM +0100, Pavel Reichl wrote: > >> > >> > >>On 11/30/2015 06:02 PM, Jakub Hrozek wrote: > > > >Wouldn't it then be better to see if another same object is already in > >the hashtable and free it before replacing? > > I agree it would be best. I tried that before and failed because I could > not decipher out the relation of talloc contexts. > > I tried that again. Seems that leaks are gone. Segfaults were not > happening during my testing. > >>> > Code got even messier :-( > >>> > >>>I think the code would look nice if it was placed in the else branch of > >>>create_negcache_netgr() :-) > >> > >>Done, thanks for hint. > >> > >>> > >>>I also don't think we need the tmp_ctx change.. > >> > >>I dare to disagree here. With prev. versions of patch I was experiencing > >>segfaults. IIRC step_context is being allocated on context of the netgroup > >>that is freed. I also think that it's not a good idea to allocate local data > >>on context that does not hold any reference to that data. But it might be > >>matter of personal taste. > > > >What kind of segfaults, what was the backtrace? I think it might be good to > >add talloc_zfree() instead of talloc_free() or explicitly NULL the pointer, > >that should solve the issue without adding a new context... > > > >Since it's per-domain, I wonder if step_ctx->dctx might be better > >candidate than step_ctx either way. > > > > Hello Jakub, I amended the patch as you proposed. You can now experience the > segfault yourself - just query a non-existing netgroup. > I attached the backtrace as well. > > I can investiage and use talloc reporting if you wish. But still using > tmp_ctx seems as way of the least effort... I think you are right, the step_ctx hierarchy is tricky, so the temporary context looks like the easiest solution. Please do one more change in the patch: > From 0160ee92c4d9a9542abfd3e918686526a52e367d Mon Sep 17 00:00:00 2001 > From: Pavel Reichl > Date: Fri, 27 Nov 2015 07:53:00 -0500 > Subject: [PATCH] NSS: Fix memory leak netgroup [...] > @@ -461,6 +469,11 @@ static errno_t create_negcache_netgr(struct > setent_step_ctx *step_ctx) > } > > done: > +/* Free netgroup after step_ctx is not needed. */ > +if (netgr_to_be_freed) { > +talloc_zfree(netgr_to_be_freed); > +} Since talloc_free(NULL) is a noop, we should drop the if completely. ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
[SSSD] Re: [PATCH] ldap_test.py: Modify enum cache timeouts
On (04/12/15 14:35), Michal Židek wrote: >On 12/04/2015 02:32 PM, Jakub Hrozek wrote: >>On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote: >>>On 12/04/2015 12:29 PM, Lukas Slebodnik wrote: On (04/12/15 12:11), Jakub Hrozek wrote: >On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote: >>On (03/12/15 20:22), Jakub Hrozek wrote: >>>On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: On (02/12/15 17:10), Michal Židek wrote: >Hi! > >I saw some integration tests failures recently, >and I think there is a race condition between the >enumeration refresh timeout and the sleeps >after some operations that wait for this timeout. >SSSD fails to populate changes from LDAP in time >and some asserts can fail because of this. > >So far I saw 4 tests to fail like this, which >is already quite a lot. > >The attached patch modifies the timeout values >and hopefully removes the issue. > >Michal >From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 >From: =?UTF-8?q?Michal=20=C5=BDidek?= >Date: Wed, 2 Dec 2015 16:44:48 +0100 >Subject: [PATCH] ldap_test.py: Modify enum cache timeouts > >There is a race condation between ldap >enumeration refresh timeout and the sleeps >that wait for the ldap changes to populate >to SSSD if the timeout and the sleeps have >the same value. >--- >src/tests/intg/ldap_test.py | 30 +- >1 file changed, 17 insertions(+), 13 deletions(-) > >diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py >index 757ee20..8ec8dbe 100644 >--- a/src/tests/intg/ldap_test.py >+++ b/src/tests/intg/ldap_test.py >@@ -33,7 +33,11 @@ import ldap_ent >from util import * > >LDAP_BASE_DN = "dc=example,dc=com" >-INTERACTIVE_TIMEOUT = 4 >+INTERACTIVE_TIMEOUT = 2 >+ >+ >+def wait_for_ldap_enum_refresh(): >+time.sleep(INTERACTIVE_TIMEOUT + 4) Why does it need to be INTERACTIVE_TIMEOUT + 4 Could it be INTERACTIVE_TIMEOUT + 3 or + 5 >>> >>>Regardless of the value we choose, can we move this patch forward? I see >>>the related failure quite often in SSSD. >>Adding timeout without real explanation is not a solution. >> >>The main problem is with empiric values. >>If they are very high then test are slow. >>And there still can be slow/fast machine where lower values caused >>troubles. >> >>The ideal solution would be to get rid of enumeration >>in ldap tests. > >Enumeration is a codepath that is different from non-enumeration, so it >should be tested. Not as priority, not as the only ldap tests, but it's >a valid case, so it should be there. > >>If we want to test enumeration than it should be in separate >>test. > >Maybe, but we do have enumeration tests and we should fix them. > Adding sleep is not a fix. It's just a workaround because all sleep timeout are just an empiric values. and we should fix test and not adding workaround/hacks. If we cannot fix the test and don't want te rewrite test without enumeration then we should remove/revert problematic tests. LS >>> >>>I will send a new patch with an explanation (sort of), >>>but it still will be a guess. I am not sure what the >>>real safe value should be, only that the sleep's >>>after operation should be longer than the ldap >>>refresh and enum cache timeouts (and that the >>>current values do not cope well wit the CI load). >> >>Would it be more acceptable then to define the ldap refresh and enum >>cache timeouts as variables in the test and sleep for (enum_timeout + >>cache_timeout + 1) ? >> >>At least that would be more readable than a magic constant.. > >Will do. 
All will be derived from INTERACTIVE_TIMEOUT >so that we need to change just one value if needed in the >future. > Will it be reliable? Will it work on slow(arm) machines? I plan to run the integration tests in the "%check" phase of an rpm build. And koji/fedora has arm machines. I already have a POC patch. LS ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
[SSSD] Re: [PATCH] ldap_test.py: Modify enum cache timeouts
On 12/04/2015 12:42 PM, Lukas Slebodnik wrote: On (03/12/15 20:22), Jakub Hrozek wrote: On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: On (02/12/15 17:10), Michal Židek wrote: Hi! I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this. So far I saw 4 tests to fail like this, which is already quite a lot. The attached patch modifies the timeout values and hopefully removes the issue. Michal >From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20=C5=BDidek?= Date: Wed, 2 Dec 2015 16:44:48 +0100 Subject: [PATCH] ldap_test.py: Modify enum cache timeouts There is a race condation between ldap enumeration refresh timeout and the sleeps that wait for the ldap changes to populate to SSSD if the timeout and the sleeps have the same value. --- src/tests/intg/ldap_test.py | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py index 757ee20..8ec8dbe 100644 --- a/src/tests/intg/ldap_test.py +++ b/src/tests/intg/ldap_test.py @@ -33,7 +33,11 @@ import ldap_ent from util import * LDAP_BASE_DN = "dc=example,dc=com" -INTERACTIVE_TIMEOUT = 4 +INTERACTIVE_TIMEOUT = 2 + + +def wait_for_ldap_enum_refresh(): +time.sleep(INTERACTIVE_TIMEOUT + 4) Why does it need to be INTERACTIVE_TIMEOUT + 4 Could it be INTERACTIVE_TIMEOUT + 3 or + 5 Regardless of the value we choose, can we move this patch forward? I see the related failure quite often in SSSD. Adding timeout without real explanation is not a solution. The main problem is with empiric values. If they are very high then test are slow. And there still can be slow/fast machine where lower values caused troubles. The ideal solution would be to get rid of enumeration in ldap tests. If we want to test enumeration than it should be in separate test. I'm not sure we would be able to get rid of "sleep()" in enumeration test but all such values should pre properly documented why it is big enough ... Yeah, these timeouts are messy. The problem is not the empiric timeout values themselves but rather the guesswork of when certain cache state changes are supposed to happen in relation to them. If we can reason about event deadlines then we can use them. If not, and the code is too complicated, can we perhaps introduce some explicit synchronization with sssd caching mechanism, where sssd behavior will drive the tests? E.g. the tests would actually wait for sssd to do something with the cache and after sssd reports it is done, the tests will verify the time and the result? We would still get to test that sssd does something we need in the expected timeframe, but we can make the tests faster - i.e. as fast as sssd can perform and be configured to. Perhaps add something to the sss_cache tool? Nick ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
[SSSD] [PATCH] Keep external members of IPA groups
Hi, it seems like https://fedorahosted.org/sssd/ticket/2492 was not fixed completely. Attached are two patches that do the trick for me -- they are not polished (at the very least, the first one needs a test..) but I would like to get another opinion if they at least aim in the right direction. So my groups on IPA server are set like this: [jhrozek@unidirect] sssd $ [(ext_mem)] ipa group-show topgr --all --raw dn: cn=topgr,cn=groups,cn=accounts,dc=ipa,dc=test cn: topgr gidnumber: 24024 member: cn=ext_ipa_gr,cn=groups,cn=accounts,dc=ipa,dc=test ipaUniqueID: 193b7c04-91e9-11e5-bc59-5254005f7b59 objectClass: ipaobject objectClass: top objectClass: ipausergroup objectClass: posixgroup objectClass: groupofnames objectClass: nestedgroup [jhrozek@unidirect] sssd $ [(ext_mem)] ipa group-show ext_ipa_gr --all --raw dn: cn=ext_ipa_gr,cn=groups,cn=accounts,dc=ipa,dc=test cn: ext_ipa_gr ipaexternalmember: S-1-5-21-1897531548-1940899517-361317264-500 ipaUniqueID: ad2bd978-91e8-11e5-9d52-5254005f7b59 memberof: cn=topgr,cn=groups,cn=accounts,dc=ipa,dc=test objectClass: ipaobject objectClass: top objectClass: nestedgroup objectClass: ipausergroup objectClass: groupofnames objectClass: ipaexternalgroup The SID S-1-5-21-1897531548-1940899517-361317264-500 is an AD user (administrator) Now when I do: sudo sss_cache -E $ id -G administra...@win.trust.test $ sudo sss_cache -G $ getent group 24024 Then the "topgr" group gets resolved and the code gets into sdap_save_grpmem: 908 /* This is a temporal solution until the IPA provider is able to 909 * resolve external group membership. 910 * https://fedorahosted.org/sssd/ticket/2522 911 */ 912 if (opts->schema_type == SDAP_SCHEMA_IPA_V1) { 913 ret = sysdb_attrs_get_string(attrs, SYSDB_SID_STR, &group_sid); 914 if (ret != EOK) { 915 DEBUG(SSSDBG_TRACE_FUNC, "Failed to get group sid\n"); 916 group_sid = NULL; Here we set the group_sid variable to NULL, because topgr, being an IPA group, doesn't have a SID... 917 } 918 919 if (group_sid != NULL) { 920 ret = retain_extern_members(memctx, dom, group_name, group_sid, 921 &userdns, &nuserdns); Which means we don't call retain_extern_members at all. 922 if (ret != EOK) { 923 DEBUG(SSSDBG_TRACE_INTERNAL, 924 "retain_extern_members failed: %d:[%s].\n", 925 ret, sss_strerror(ret)); 926 } 927 } 928 } But since the whole block of code was added in the same commit, I guess it must have some purpose..so is going on without a SID the right thing to do? What was the use-case of the original code? 
>From 582b81d08f0ba70612aa173dae5f31263295826f Mon Sep 17 00:00:00 2001 From: Jakub Hrozek Date: Fri, 4 Dec 2015 13:51:28 +0100 Subject: [PATCH 1/2] SYSDB: Treat empty elements as not found in sysdb_attrs_get_string() --- src/db/sysdb.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/db/sysdb.c b/src/db/sysdb.c index a71364d7c4b600eafd10fafa6641eac7b2292764..97a7c52bb49dc072d371ffb657540d2ed2f3d0b7 100644 --- a/src/db/sysdb.c +++ b/src/db/sysdb.c @@ -371,7 +371,13 @@ int sysdb_attrs_get_string(struct sysdb_attrs *attrs, const char *name, return ret; } -if (el->num_values != 1) { +/* There is no guarantee that the element wasn't added before + * with sysdb_attrs_get_el() as empty, we should treat empty + * elements as not found + */ +if (el->num_values == 0) { +return ENOENT; +} else if (el->num_values != 1) { return ERANGE; } -- 2.4.3 >From c1e19528a071d4a354af1d8667c2b00edf9a5b7b Mon Sep 17 00:00:00 2001 From: Jakub Hrozek Date: Fri, 4 Dec 2015 14:11:38 +0100 Subject: [PATCH 2/2] LDAP: Link external group members of IPA groups, too --- src/providers/ldap/sdap_async_groups.c | 26 -- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/src/providers/ldap/sdap_async_groups.c b/src/providers/ldap/sdap_async_groups.c index c15a6a2bd724498aaf59b7ed3172ec4ed09abb9d..b3f7ae51edbc2b1bbd94a49722459c3938fa6d98 100644 --- a/src/providers/ldap/sdap_async_groups.c +++ b/src/providers/ldap/sdap_async_groups.c @@ -781,6 +781,12 @@ are_sids_from_same_dom(const char *sid1, const char *sid2, bool *_result) char *rid1, *rid2; bool result; +if ((sid1 == NULL || sid2 == NULL) && sid1 != sid2) { +/* IPA and AD group probably */ +*_result = false; +return EOK; +} + rid1 = strrchr(sid1, '-'); if (rid1 == NULL) { return EINVAL; @@ -911,19 +917,19 @@ static int sdap_save_grpmem(TALLOC_CTX *memctx, */ if (opts->schema_type == SDAP_SCHEMA_IPA_V1) { ret = sysdb_attrs_get_string(attrs, SYSDB_SID_STR, &group_sid); -if (ret != EOK) { -DEBUG(SSSDBG_TRACE_
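A compact standalone rendering of the guard the second patch adds (simplified; the helper below is modelled on are_sids_from_same_dom() but is not the SSSD function itself): two SIDs belong to the same domain when everything before the trailing RID matches, and a missing SID on either side, such as an IPA group without a SID next to an AD member's SID, now yields "different domain" instead of a NULL dereference in strrchr().

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified same-domain check, modelled on the patched helper above. */
static int sids_from_same_dom(const char *sid1, const char *sid2, bool *result)
{
    const char *rid1, *rid2;
    size_t dom_len;

    /* New guard: one side has no SID at all (e.g. an IPA group) -> simply
     * "not the same domain" instead of crashing in strrchr(). */
    if ((sid1 == NULL || sid2 == NULL) && sid1 != sid2) {
        *result = false;
        return 0;
    }
    if (sid1 == NULL && sid2 == NULL) {
        return EINVAL;               /* nothing to compare */
    }

    rid1 = strrchr(sid1, '-');
    rid2 = strrchr(sid2, '-');
    if (rid1 == NULL || rid2 == NULL) {
        return EINVAL;               /* not a well-formed SID */
    }

    /* Same domain only if the prefixes before the RID are identical. */
    dom_len = (size_t)(rid1 - sid1);
    *result = (size_t)(rid2 - sid2) == dom_len
              && strncmp(sid1, sid2, dom_len) == 0;
    return 0;
}

int main(void)
{
    bool same = false;

    sids_from_same_dom("S-1-5-21-111-222-333-500",
                       "S-1-5-21-111-222-333-1001", &same);
    printf("two AD SIDs:              %s\n", same ? "same domain" : "different");

    sids_from_same_dom(NULL, "S-1-5-21-111-222-333-500", &same);
    printf("IPA group (no SID) vs AD: %s\n", same ? "same domain" : "different");

    return 0;
}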
[SSSD] Re: [PATCH] ldap_test.py: Modify enum cache timeouts
On 12/04/2015 02:32 PM, Jakub Hrozek wrote: On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote: On 12/04/2015 12:29 PM, Lukas Slebodnik wrote: On (04/12/15 12:11), Jakub Hrozek wrote: On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote: On (03/12/15 20:22), Jakub Hrozek wrote: On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: On (02/12/15 17:10), Michal Židek wrote: Hi! I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this. So far I saw 4 tests to fail like this, which is already quite a lot. The attached patch modifies the timeout values and hopefully removes the issue. Michal >From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20=C5=BDidek?= Date: Wed, 2 Dec 2015 16:44:48 +0100 Subject: [PATCH] ldap_test.py: Modify enum cache timeouts There is a race condation between ldap enumeration refresh timeout and the sleeps that wait for the ldap changes to populate to SSSD if the timeout and the sleeps have the same value. --- src/tests/intg/ldap_test.py | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py index 757ee20..8ec8dbe 100644 --- a/src/tests/intg/ldap_test.py +++ b/src/tests/intg/ldap_test.py @@ -33,7 +33,11 @@ import ldap_ent >from util import * LDAP_BASE_DN = "dc=example,dc=com" -INTERACTIVE_TIMEOUT = 4 +INTERACTIVE_TIMEOUT = 2 + + +def wait_for_ldap_enum_refresh(): +time.sleep(INTERACTIVE_TIMEOUT + 4) Why does it need to be INTERACTIVE_TIMEOUT + 4 Could it be INTERACTIVE_TIMEOUT + 3 or + 5 Regardless of the value we choose, can we move this patch forward? I see the related failure quite often in SSSD. Adding timeout without real explanation is not a solution. The main problem is with empiric values. If they are very high then test are slow. And there still can be slow/fast machine where lower values caused troubles. The ideal solution would be to get rid of enumeration in ldap tests. Enumeration is a codepath that is different from non-enumeration, so it should be tested. Not as priority, not as the only ldap tests, but it's a valid case, so it should be there. If we want to test enumeration than it should be in separate test. Maybe, but we do have enumeration tests and we should fix them. Adding sleep is not a fix. It's just a workaround because all sleep timeout are just an empiric values. and we should fix test and not adding workaround/hacks. If we cannot fix the test and don't want te rewrite test without enumeration then we should remove/revert problematic tests. LS I will send a new patch with an explanation (sort of), but it still will be a guess. I am not sure what the real safe value should be, only that the sleep's after operation should be longer than the ldap refresh and enum cache timeouts (and that the current values do not cope well wit the CI load). Would it be more acceptable then to define the ldap refresh and enum cache timeouts as variables in the test and sleep for (enum_timeout + cache_timeout + 1) ? At least that would be more readable than a magic constant.. Will do. All will be derived from INTERACTIVE_TIMEOUT so that we need to change just one value if needed in the future. 
Lukas is right that I pulled the values out of place where no knowledge resides so let us make a compromise. I will push this patch to the CI a lot of times (let's say 40-50) over the weekend and see if it fails. I am also not sure if lowering the INTERACTIVE_TIMEOUT was a good idea, I did it in order to make the execution a little faster and it is a timeout that needs to be shorter than the sleeps that wait for the ldap change to propagate to sysdb (and I did not want the sleeps to be too long). With the weekend "stress testing" we can hopefully avoid additional adjustments with new patches. Michal ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
[SSSD] Re: [PATCH] ldap_test.py: Modify enum cache timeouts
On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote: > On 12/04/2015 12:29 PM, Lukas Slebodnik wrote: > >On (04/12/15 12:11), Jakub Hrozek wrote: > >>On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote: > >>>On (03/12/15 20:22), Jakub Hrozek wrote: > On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: > >On (02/12/15 17:10), Michal Židek wrote: > >>Hi! > >> > >>I saw some integration tests failures recently, > >>and I think there is a race condition between the > >>enumeration refresh timeout and the sleeps > >>after some operations that wait for this timeout. > >>SSSD fails to populate changes from LDAP in time > >>and some asserts can fail because of this. > >> > >>So far I saw 4 tests to fail like this, which > >>is already quite a lot. > >> > >>The attached patch modifies the timeout values > >>and hopefully removes the issue. > >> > >>Michal > > > >>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 > >>From: =?UTF-8?q?Michal=20=C5=BDidek?= > >>Date: Wed, 2 Dec 2015 16:44:48 +0100 > >>Subject: [PATCH] ldap_test.py: Modify enum cache timeouts > >> > >>There is a race condation between ldap > >>enumeration refresh timeout and the sleeps > >>that wait for the ldap changes to populate > >>to SSSD if the timeout and the sleeps have > >>the same value. > >>--- > >>src/tests/intg/ldap_test.py | 30 +- > >>1 file changed, 17 insertions(+), 13 deletions(-) > >> > >>diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py > >>index 757ee20..8ec8dbe 100644 > >>--- a/src/tests/intg/ldap_test.py > >>+++ b/src/tests/intg/ldap_test.py > >>@@ -33,7 +33,11 @@ import ldap_ent > >>from util import * > >> > >>LDAP_BASE_DN = "dc=example,dc=com" > >>-INTERACTIVE_TIMEOUT = 4 > >>+INTERACTIVE_TIMEOUT = 2 > >>+ > >>+ > >>+def wait_for_ldap_enum_refresh(): > >>+time.sleep(INTERACTIVE_TIMEOUT + 4) > >Why does it need to be INTERACTIVE_TIMEOUT + 4 > > > >Could it be INTERACTIVE_TIMEOUT + 3 or + 5 > > > > Regardless of the value we choose, can we move this patch forward? I see > the related failure quite often in SSSD. > >>>Adding timeout without real explanation is not a solution. > >>> > >>>The main problem is with empiric values. > >>>If they are very high then test are slow. > >>>And there still can be slow/fast machine where lower values caused > >>>troubles. > >>> > >>>The ideal solution would be to get rid of enumeration > >>>in ldap tests. > >> > >>Enumeration is a codepath that is different from non-enumeration, so it > >>should be tested. Not as priority, not as the only ldap tests, but it's > >>a valid case, so it should be there. > >> > >>>If we want to test enumeration than it should be in separate > >>>test. > >> > >>Maybe, but we do have enumeration tests and we should fix them. > >> > >Adding sleep is not a fix. It's just a workaround > >because all sleep timeout are just an empiric values. > >and we should fix test and not adding workaround/hacks. > > > >If we cannot fix the test and don't want te rewrite test without enumeration > >then we should remove/revert problematic tests. > > > >LS > > I will send a new patch with an explanation (sort of), > but it still will be a guess. I am not sure what the > real safe value should be, only that the sleep's > after operation should be longer than the ldap > refresh and enum cache timeouts (and that the > current values do not cope well wit the CI load). 
Would it be more acceptable then to define the ldap refresh and enum cache timeouts as variables in the test and sleep for (enum_timeout + cache_timeout + 1) ? At least that would be more readable than a magic constant.. > > Lukas is right that I pulled the values out of > place where no knowledge resides so let us make > a compromise. I will push this patch to the > CI a lot of times (let's say 40-50) over the > weekend and see if it fails. > > I am also not sure if lowering the INTERACTIVE_TIMEOUT > was a good idea, I did it in order to make the > execution a little faster and it is a timeout that > needs to be shorter than the sleeps that wait > for the ldap change to propagate to sysdb (and I did not > want the sleeps to be too long). With > the weekend "stress testing" we can hopefully avoid > additional adjustments with new patches. > > Michal > ___ > sssd-devel mailing list > sssd-devel@lists.fedorahosted.org > https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
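A minimal sketch of that suggestion, assuming the two timeouts become named constants next to INTERACTIVE_TIMEOUT in src/tests/intg/ldap_test.py; the constant names and values below are illustrative assumptions, not taken from the actual patch:

import time

# Hypothetical constants; the idea is that the same values would be written
# into the generated sssd.conf, so the sleep is derived from them instead of
# being a magic number.
LDAP_ENUM_REFRESH_TIMEOUT = 2
ENTRY_CACHE_TIMEOUT = 1


def wait_for_ldap_enum_refresh():
    # Sleep just past both timeouts so the enumeration refresh has run and
    # the cached entries have expired before the test asserts anything.
    time.sleep(LDAP_ENUM_REFRESH_TIMEOUT + ENTRY_CACHE_TIMEOUT + 1)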
[SSSD]Re: [PATCH] ldap_test.py: Modify enum cache timeouts
On 12/04/2015 12:29 PM, Lukas Slebodnik wrote: On (04/12/15 12:11), Jakub Hrozek wrote: On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote: On (03/12/15 20:22), Jakub Hrozek wrote: On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: On (02/12/15 17:10), Michal Židek wrote: Hi! I saw some integration tests failures recently, and I think there is a race condition between the enumeration refresh timeout and the sleeps after some operations that wait for this timeout. SSSD fails to populate changes from LDAP in time and some asserts can fail because of this. So far I saw 4 tests to fail like this, which is already quite a lot. The attached patch modifies the timeout values and hopefully removes the issue. Michal >From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20=C5=BDidek?= Date: Wed, 2 Dec 2015 16:44:48 +0100 Subject: [PATCH] ldap_test.py: Modify enum cache timeouts There is a race condation between ldap enumeration refresh timeout and the sleeps that wait for the ldap changes to populate to SSSD if the timeout and the sleeps have the same value. --- src/tests/intg/ldap_test.py | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py index 757ee20..8ec8dbe 100644 --- a/src/tests/intg/ldap_test.py +++ b/src/tests/intg/ldap_test.py @@ -33,7 +33,11 @@ import ldap_ent from util import * LDAP_BASE_DN = "dc=example,dc=com" -INTERACTIVE_TIMEOUT = 4 +INTERACTIVE_TIMEOUT = 2 + + +def wait_for_ldap_enum_refresh(): +time.sleep(INTERACTIVE_TIMEOUT + 4) Why does it need to be INTERACTIVE_TIMEOUT + 4 Could it be INTERACTIVE_TIMEOUT + 3 or + 5 Regardless of the value we choose, can we move this patch forward? I see the related failure quite often in SSSD. Adding a timeout without a real explanation is not a solution. The main problem is with empirical values. If they are very high, then the tests are slow. And there can still be a machine slow or fast enough that lower values cause trouble. The ideal solution would be to get rid of enumeration in the ldap tests. Enumeration is a codepath that is different from non-enumeration, so it should be tested. Not as priority, not as the only ldap tests, but it's a valid case, so it should be there. If we want to test enumeration, then it should be in a separate test. Maybe, but we do have enumeration tests and we should fix them. Adding a sleep is not a fix. It's just a workaround, because all the sleep timeouts are just empirical values, and we should fix the test instead of adding workarounds/hacks. If we cannot fix the test and don't want to rewrite it without enumeration, then we should remove/revert the problematic tests. LS I will send a new patch with an explanation (sort of), but it will still be a guess. I am not sure what the real safe value should be, only that the sleeps after an operation should be longer than the ldap refresh and enum cache timeouts (and that the current values do not cope well with the CI load). Lukas is right that I pulled the values out of thin air, with no real knowledge behind them, so let us make a compromise. I will push this patch to the CI many times (let's say 40-50) over the weekend and see if it fails. I am also not sure if lowering the INTERACTIVE_TIMEOUT was a good idea; I did it to make the execution a little faster, and it is a timeout that needs to be shorter than the sleeps that wait for the ldap change to propagate to sysdb (and I did not want the sleeps to be too long).
With the weekend "stress testing" we can hopefully avoid additional adjustments with new patches. Michal ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
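The constraint Michal describes could also be made explicit in the test module itself; this is only a sketch of that idea, under the assumption that both values live in ldap_test.py, and is not code from the posted patch:

# Illustrative guard: record the relationship between the client timeout and
# the sleep that waits for the enumeration refresh, so a later edit cannot
# silently reintroduce the race.
INTERACTIVE_TIMEOUT = 2
ENUM_REFRESH_SLEEP = INTERACTIVE_TIMEOUT + 4

assert INTERACTIVE_TIMEOUT < ENUM_REFRESH_SLEEP, \
    "client requests must time out before the enumeration refresh sleep ends"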
[SSSD]Re: [PATCH] ldap_test.py: Modify enum cache timeouts
On (04/12/15 12:11), Jakub Hrozek wrote: >On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote: >> On (03/12/15 20:22), Jakub Hrozek wrote: >> >On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: >> >> On (02/12/15 17:10), Michal Židek wrote: >> >> >Hi! >> >> > >> >> >I saw some integration tests failures recently, >> >> >and I think there is a race condition between the >> >> >enumeration refresh timeout and the sleeps >> >> >after some operations that wait for this timeout. >> >> >SSSD fails to populate changes from LDAP in time >> >> >and some asserts can fail because of this. >> >> > >> >> >So far I saw 4 tests to fail like this, which >> >> >is already quite a lot. >> >> > >> >> >The attached patch modifies the timeout values >> >> >and hopefully removes the issue. >> >> > >> >> >Michal >> >> >> >> >From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 >> >> >From: =?UTF-8?q?Michal=20=C5=BDidek?= >> >> >Date: Wed, 2 Dec 2015 16:44:48 +0100 >> >> >Subject: [PATCH] ldap_test.py: Modify enum cache timeouts >> >> > >> >> >There is a race condation between ldap >> >> >enumeration refresh timeout and the sleeps >> >> >that wait for the ldap changes to populate >> >> >to SSSD if the timeout and the sleeps have >> >> >the same value. >> >> >--- >> >> > src/tests/intg/ldap_test.py | 30 +- >> >> > 1 file changed, 17 insertions(+), 13 deletions(-) >> >> > >> >> >diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py >> >> >index 757ee20..8ec8dbe 100644 >> >> >--- a/src/tests/intg/ldap_test.py >> >> >+++ b/src/tests/intg/ldap_test.py >> >> >@@ -33,7 +33,11 @@ import ldap_ent >> >> > from util import * >> >> > >> >> > LDAP_BASE_DN = "dc=example,dc=com" >> >> >-INTERACTIVE_TIMEOUT = 4 >> >> >+INTERACTIVE_TIMEOUT = 2 >> >> >+ >> >> >+ >> >> >+def wait_for_ldap_enum_refresh(): >> >> >+time.sleep(INTERACTIVE_TIMEOUT + 4) >> >> Why does it need to be INTERACTIVE_TIMEOUT + 4 >> >> >> >> Could it be INTERACTIVE_TIMEOUT + 3 or + 5 >> >> >> > >> >Regardless of the value we choose, can we move this patch forward? I see >> >the related failure quite often in SSSD. >> Adding timeout without real explanation is not a solution. >> >> The main problem is with empiric values. >> If they are very high then test are slow. >> And there still can be slow/fast machine where lower values caused troubles. >> >> The ideal solution would be to get rid of enumeration >> in ldap tests. > >Enumeration is a codepath that is different from non-enumeration, so it >should be tested. Not as priority, not as the only ldap tests, but it's >a valid case, so it should be there. > >> If we want to test enumeration than it should be in separate >> test. > >Maybe, but we do have enumeration tests and we should fix them. > Adding a sleep is not a fix. It's just a workaround, because all the sleep timeouts are just empirical values, and we should fix the test instead of adding workarounds/hacks. If we cannot fix the test and don't want to rewrite it without enumeration, then we should remove/revert the problematic tests. LS ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
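One way to drop the empirical sleeps entirely, sketched here only as an assumption about how the test could be restructured (no such helper exists in the SSSD integration tests today), is to poll for the expected state with an upper bound instead of sleeping for a fixed interval:

import time


def wait_until(predicate, timeout=30, interval=0.5):
    """Poll predicate() until it returns True or timeout seconds elapse."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False


# Hypothetical usage: instead of sleeping for INTERACTIVE_TIMEOUT + 4 after
# adding a user to LDAP, wait until the user actually becomes visible via NSS.
# import pwd
# assert wait_until(lambda: any(u.pw_name == "user1" for u in pwd.getpwall()))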
[SSSD]Re: [PATCH] ldap_test.py: Modify enum cache timeouts
On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote: > On (03/12/15 20:22), Jakub Hrozek wrote: > >On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: > >> On (02/12/15 17:10), Michal Židek wrote: > >> >Hi! > >> > > >> >I saw some integration tests failures recently, > >> >and I think there is a race condition between the > >> >enumeration refresh timeout and the sleeps > >> >after some operations that wait for this timeout. > >> >SSSD fails to populate changes from LDAP in time > >> >and some asserts can fail because of this. > >> > > >> >So far I saw 4 tests to fail like this, which > >> >is already quite a lot. > >> > > >> >The attached patch modifies the timeout values > >> >and hopefully removes the issue. > >> > > >> >Michal > >> > >> >From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 > >> >From: =?UTF-8?q?Michal=20=C5=BDidek?= > >> >Date: Wed, 2 Dec 2015 16:44:48 +0100 > >> >Subject: [PATCH] ldap_test.py: Modify enum cache timeouts > >> > > >> >There is a race condation between ldap > >> >enumeration refresh timeout and the sleeps > >> >that wait for the ldap changes to populate > >> >to SSSD if the timeout and the sleeps have > >> >the same value. > >> >--- > >> > src/tests/intg/ldap_test.py | 30 +- > >> > 1 file changed, 17 insertions(+), 13 deletions(-) > >> > > >> >diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py > >> >index 757ee20..8ec8dbe 100644 > >> >--- a/src/tests/intg/ldap_test.py > >> >+++ b/src/tests/intg/ldap_test.py > >> >@@ -33,7 +33,11 @@ import ldap_ent > >> > from util import * > >> > > >> > LDAP_BASE_DN = "dc=example,dc=com" > >> >-INTERACTIVE_TIMEOUT = 4 > >> >+INTERACTIVE_TIMEOUT = 2 > >> >+ > >> >+ > >> >+def wait_for_ldap_enum_refresh(): > >> >+time.sleep(INTERACTIVE_TIMEOUT + 4) > >> Why does it need to be INTERACTIVE_TIMEOUT + 4 > >> > >> Could it be INTERACTIVE_TIMEOUT + 3 or + 5 > >> > > > >Regardless of the value we choose, can we move this patch forward? I see > >the related failure quite often in SSSD. > Adding timeout without real explanation is not a solution. > > The main problem is with empiric values. > If they are very high then test are slow. > And there still can be slow/fast machine where lower values caused troubles. > > The ideal solution would be to get rid of enumeration > in ldap tests. Enumeration is a codepath that is different from non-enumeration, so it should be tested. Not as priority, not as the only ldap tests, but it's a valid case, so it should be there. > If we want to test enumeration than it should be in separate > test. Maybe, but we do have enumeration tests and we should fix them. > I'm not sure we would be able to get rid of "sleep()" > in enumeration test but all such values should pre properly documented why it > is big enough ... > > LS > ___ > sssd-devel mailing list > sssd-devel@lists.fedorahosted.org > https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
[SSSD]Re: [PATCH] ldap_test.py: Modify enum cache timeouts
On (03/12/15 20:22), Jakub Hrozek wrote: >On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: >> On (02/12/15 17:10), Michal Židek wrote: >> >Hi! >> > >> >I saw some integration tests failures recently, >> >and I think there is a race condition between the >> >enumeration refresh timeout and the sleeps >> >after some operations that wait for this timeout. >> >SSSD fails to populate changes from LDAP in time >> >and some asserts can fail because of this. >> > >> >So far I saw 4 tests to fail like this, which >> >is already quite a lot. >> > >> >The attached patch modifies the timeout values >> >and hopefully removes the issue. >> > >> >Michal >> >> >From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001 >> >From: =?UTF-8?q?Michal=20=C5=BDidek?= >> >Date: Wed, 2 Dec 2015 16:44:48 +0100 >> >Subject: [PATCH] ldap_test.py: Modify enum cache timeouts >> > >> >There is a race condation between ldap >> >enumeration refresh timeout and the sleeps >> >that wait for the ldap changes to populate >> >to SSSD if the timeout and the sleeps have >> >the same value. >> >--- >> > src/tests/intg/ldap_test.py | 30 +- >> > 1 file changed, 17 insertions(+), 13 deletions(-) >> > >> >diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py >> >index 757ee20..8ec8dbe 100644 >> >--- a/src/tests/intg/ldap_test.py >> >+++ b/src/tests/intg/ldap_test.py >> >@@ -33,7 +33,11 @@ import ldap_ent >> > from util import * >> > >> > LDAP_BASE_DN = "dc=example,dc=com" >> >-INTERACTIVE_TIMEOUT = 4 >> >+INTERACTIVE_TIMEOUT = 2 >> >+ >> >+ >> >+def wait_for_ldap_enum_refresh(): >> >+time.sleep(INTERACTIVE_TIMEOUT + 4) >> Why does it need to be INTERACTIVE_TIMEOUT + 4 >> >> Could it be INTERACTIVE_TIMEOUT + 3 or + 5 >> > >Regardless of the value we choose, can we move this patch forward? I see >the related failure quite often in SSSD. Adding a timeout without a real explanation is not a solution. The main problem is with empirical values. If they are very high, then the tests are slow. And there can still be a machine slow or fast enough that lower values cause trouble. The ideal solution would be to get rid of enumeration in the ldap tests. If we want to test enumeration, then it should be in a separate test. I'm not sure we would be able to get rid of "sleep()" in an enumeration test, but all such values should be properly documented, explaining why they are big enough ... LS ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
[SSSD]Re: about fedorahosted-to-github mirror
On 12/03/2015 09:00 PM, Jakub Hrozek wrote: Hi, I was looking at options we have for setting up an automated way to mirror our fedorahosted.org repo to github.com. Unfortunately, the github mirror functionality seems to be discontinued[*], so the next best thing to do is to set up a github deploy key: https://developer.github.com/guides/managing-deploy-keys/#deploy-keys The private key would be on the machine we'd mirror from, the public key would be uploaded to github. My question is -- do we want to set up the push job on fedorahosted.org or one of our machines? 1) fedorahosted.org [+] We don't have to manage the machine, dedicated admins do [-] We'd have to give read ACL to an identity that pushes /all/ fedorahosted.org projects. 2) Our own (CI?) machines [+] We manage the machine with the private key. We keep control of the key. [-] We manage the machine with the private key. We're developers, not admins. I would personally prefer 1) because if the git user on fedorahosted is compromised, all bets are off anyway and the concern about a push key to our /mirror/ repo would not be the primary one. But at the same time, I don't feel comfortable making the decision without asking the list. I also prefer 1). So -- is anyone opposed to me asking fedorahosted.org to generate a keypair and give us the public key that I would upload to github? Thanks! [*] github has gained enough traction already, so they don't care about this functionality anymore.. ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
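For illustration only, the push job itself could stay small regardless of which machine hosts it; the repository URLs, the mirror path, and the assumption that the deploy key is selected via ~/.ssh/config are all made up for this sketch, not an agreed-upon setup:

import subprocess

# Hypothetical locations -- adjust to the real fedorahosted repository, the
# local bare mirror, and the GitHub repository the deploy key belongs to.
SOURCE_REPO = "https://git.fedorahosted.org/git/sssd.git"
MIRROR_DIR = "/srv/mirror/sssd.git"
GITHUB_REMOTE = "git@github.com:SSSD/sssd.git"


def mirror_once():
    # Pull every ref from fedorahosted into the local bare mirror...
    subprocess.check_call(["git", "--git-dir", MIRROR_DIR, "fetch", "--prune",
                           SOURCE_REPO, "+refs/*:refs/*"])
    # ...then push all refs (and ref deletions) to GitHub over the deploy key.
    subprocess.check_call(["git", "--git-dir", MIRROR_DIR, "push", "--mirror",
                           GITHUB_REMOTE])


if __name__ == "__main__":
    mirror_once()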