[SSSD]Re: [PATCH] NSS: Fix memory leak netgroup

2015-12-04 Thread Pavel Reichl



On 12/04/2015 03:37 PM, Jakub Hrozek wrote:

On Thu, Dec 03, 2015 at 03:29:22PM +0100, Pavel Reichl wrote:


On 12/03/2015 02:06 PM, Jakub Hrozek wrote:

On Mon, Nov 30, 2015 at 06:45:00PM +0100, Pavel Reichl wrote:



On 11/30/2015 06:02 PM, Jakub Hrozek wrote:


Wouldn't it then be better to see if another same object is already in
the hashtable and free it before replacing?


I agree it would be best. I tried that before and failed because I could not 
decipher out the relation of talloc contexts.

I tried that again. Seems that leaks are gone. Segfaults were not happening 
during my testing.



Code got even messier :-(


I think the code would look nice if it was placed in the else branch of
create_negcache_netgr() :-)


Done, thanks for hint.



I also don't think we need the tmp_ctx change..


I dare to disagree here. With prev. versions of patch I was experiencing
segfaults. IIRC step_context is being allocated on context of the netgroup
that is freed. I also think that it's not a good idea to allocate local data
on context that does not hold any reference to that data. But it might be
matter of personal taste.


What kind of segfaults, what was the backtrace? I think it might be good to
add talloc_zfree() instead of talloc_free() or explicitly NULL the pointer,
that should solve the issue without adding a new context...

Since it's per-domain, I wonder if step_ctx->dctx might be better
candidate than step_ctx either way.



Hello Jakub, I amended the patch as you proposed. You can now experience the 
segfault yourself - just query a non-existing netgroup.
I attached the backtrace as well.

I can investiage and use talloc reporting if you wish. But still using tmp_ctx 
seems as way of the least effort...


I think you are right, the step_ctx hierarchy is tricky, so the
temporary context looks like the easiest solution. Please do one more
change in the patch:


OK, please see attached patch.




 From 0160ee92c4d9a9542abfd3e918686526a52e367d Mon Sep 17 00:00:00 2001
From: Pavel Reichl 
Date: Fri, 27 Nov 2015 07:53:00 -0500
Subject: [PATCH] NSS: Fix memory leak netgroup


[...]


@@ -461,6 +469,11 @@ static errno_t create_negcache_netgr(struct 
setent_step_ctx *step_ctx)
  }

  done:
+/* Free netgroup after step_ctx is not needed. */
+if (netgr_to_be_freed) {
+talloc_zfree(netgr_to_be_freed);
+}


Since talloc_free(NULL) is a noop, we should drop the if completely.


OK, done.


___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org

>From 47e73fdd381240b67eba1d7e1de44a589388c503 Mon Sep 17 00:00:00 2001
From: Pavel Reichl 
Date: Fri, 27 Nov 2015 07:53:00 -0500
Subject: [PATCH] NSS: Fix memory leak netgroup

If netgroup cannot be found in setnetgrent_retry() function
set_netgroup_entry() is called which steals getent_ctx directly to
nss_ctx->netgroups.
Subsequently function lookup_netgr_step() is called that (in case of
nenexisting group) will call create_negcache_netgr() which creates
a new dummy object to serve as negative cache. While doing so it calls
again set_netgroup_entry() for the same netgroup and it calls
hash_enter.

hash_enter will remove previously hashed entry for netgroup (created in
setnetgrent_retry()) from hash table but it won't be freed and thus it
leaks. This patch marks such netgroup and freed it after the call of
create_negcache_netgr().

Resolves:
https://fedorahosted.org/sssd/ticket/2865
---
 src/responder/nss/nsssrv_netgroup.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/src/responder/nss/nsssrv_netgroup.c b/src/responder/nss/nsssrv_netgroup.c
index 9a78c1119c2f4e06e43ebec29ace775adc997e08..4647a20af6b0f8e00c120839bc7e880e89156de6 100644
--- a/src/responder/nss/nsssrv_netgroup.c
+++ b/src/responder/nss/nsssrv_netgroup.c
@@ -434,6 +434,7 @@ static errno_t create_negcache_netgr(struct setent_step_ctx *step_ctx)
 {
 errno_t ret;
 struct getent_ctx *netgr;
+struct getent_ctx *netgr_to_be_freed = NULL;
 
 netgr = talloc_zero(step_ctx->nctx, struct getent_ctx);
 if (netgr == NULL) {
@@ -441,6 +442,13 @@ static errno_t create_negcache_netgr(struct setent_step_ctx *step_ctx)
 ret = ENOMEM;
 goto done;
 } else {
+/* Is there already netgroup with such name? */
+ret = get_netgroup_entry(step_ctx->nctx, step_ctx->name,
+ &netgr_to_be_freed);
+if (ret != EOK) {
+netgr_to_be_freed = NULL;
+}
+
 netgr->ready = true;
 netgr->found = false;
 netgr->entries = NULL;
@@ -461,6 +469,8 @@ static errno_t create_negcache_netgr(struct setent_step_ctx *step_ctx)
 }
 
 done:
+talloc_zfree(netgr_to_be_freed);
+
 if (ret != EOK) {
 talloc_free(netgr);
 }
@@ -474,6 +484,12 @@ static errno_t lookup_netgr_step(struct setent_step_ctx *step_ctx)
 

[SSSD]Re: [PATCH] ldap_test.py: Modify enum cache timeouts

2015-12-04 Thread Lukas Slebodnik
On (04/12/15 16:02), Michal Židek wrote:
>On 12/04/2015 03:07 PM, Lukas Slebodnik wrote:
>>On (04/12/15 14:35), Michal Židek wrote:
>>>On 12/04/2015 02:32 PM, Jakub Hrozek wrote:
On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote:
>On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:
>>On (04/12/15 12:11), Jakub Hrozek wrote:
>>>On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:
On (03/12/15 20:22), Jakub Hrozek wrote:
>On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
>>On (02/12/15 17:10), Michal Židek wrote:
>>>Hi!
>>>
>>>I saw some integration tests failures recently,
>>>and I think there is a race condition between the
>>>enumeration refresh timeout and the sleeps
>>>after some operations that wait for this timeout.
>>>SSSD fails to populate changes from LDAP in time
>>>and some asserts can fail because of this.
>>>
>>>So far I saw 4 tests to fail like this, which
>>>is already quite a lot.
>>>
>>>The attached patch modifies the timeout values
>>>and hopefully removes the issue.
>>>
>>>Michal
>>
>>>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 
>>>2001
>>>From: =?UTF-8?q?Michal=20=C5=BDidek?= 
>>>Date: Wed, 2 Dec 2015 16:44:48 +0100
>>>Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
>>>
>>>There is a race condation between ldap
>>>enumeration refresh timeout and the sleeps
>>>that wait for the ldap changes to populate
>>>to SSSD if the timeout and the sleeps have
>>>the same value.
>>>---
>>>src/tests/intg/ldap_test.py | 30 +-
>>>1 file changed, 17 insertions(+), 13 deletions(-)
>>>
>>>diff --git a/src/tests/intg/ldap_test.py 
>>>b/src/tests/intg/ldap_test.py
>>>index 757ee20..8ec8dbe 100644
>>>--- a/src/tests/intg/ldap_test.py
>>>+++ b/src/tests/intg/ldap_test.py
>>>@@ -33,7 +33,11 @@ import ldap_ent
>>>from util import *
>>>
>>>LDAP_BASE_DN = "dc=example,dc=com"
>>>-INTERACTIVE_TIMEOUT = 4
>>>+INTERACTIVE_TIMEOUT = 2
>>>+
>>>+
>>>+def wait_for_ldap_enum_refresh():
>>>+time.sleep(INTERACTIVE_TIMEOUT + 4)
>>Why does it need to be INTERACTIVE_TIMEOUT + 4
>>
>>Could it be INTERACTIVE_TIMEOUT + 3 or + 5
>>
>
>Regardless of the value we choose, can we move this patch forward? I 
>see
>the related failure quite often in SSSD.
Adding timeout without real explanation is not a solution.

The main problem is with empiric values.
If they are very high then test are slow.
And there still can be slow/fast machine where lower values caused 
troubles.

The ideal solution would be to get rid of enumeration
in ldap tests.
>>>
>>>Enumeration is a codepath that is different from non-enumeration, so it
>>>should be tested. Not as priority, not as the only ldap tests, but it's
>>>a valid case, so it should be there.
>>>
If we want to test enumeration than it should be in separate
test.
>>>
>>>Maybe, but we do have enumeration tests and we should fix them.
>>>
>>Adding sleep is not a fix. It's just a workaround
>>because all sleep timeout are just an empiric values.
>>and we should fix test and not adding workaround/hacks.
>>
>>If we cannot fix the test and don't want te rewrite test without 
>>enumeration
>>then we should remove/revert problematic tests.
>>
>>LS
>
>I will send a new patch with an explanation (sort of),
>but it still will be a guess. I am not sure what the
>real safe value should be, only that the sleep's
>after operation should be longer than the ldap
>refresh and enum cache timeouts (and that the
>current values do not cope well wit the CI load).

Would it be more acceptable then to define the ldap refresh and enum
cache timeouts as variables in the test and sleep for (enum_timeout +
cache_timeout + 1) ?

At least that would be more readable than a magic constant..
>>>
>>>Will do. All will be derived from INTERACTIVE_TIMEOUT
>>>so that we need to change just one value if needed in the
>>>future.
>>>
>>Will it be reliable?
>>
>>Will it work on slow(arm) machines?
>>
>>I plan to run integration test in "%check" phase
>>of rpm build. And koji/fedora has rpm machines.
>
>We can mark tests that may fail on slow machines
>due to timeouts as "unsafe" and skip them in the
>rpm build.
It's not about rpmbuild, it's a general problem.
And marking tests as unsafe does not solve anything.
We need a reliable way how to find out that enumeratin task 

[SSSD]Re: [PATCH] Keep external members of IPA groups

2015-12-04 Thread Jakub Hrozek
On Fri, Dec 04, 2015 at 04:05:52PM +0100, Sumit Bose wrote:
> On Fri, Dec 04, 2015 at 02:47:17PM +0100, Jakub Hrozek wrote:
> > Hi,
> > 
> > it seems like https://fedorahosted.org/sssd/ticket/2492 was not fixed
> > completely. Attached are two patches that do the trick for me -- they
> > are not polished (at the very least, the first one needs a test..) but I
> > would like to get another opinion if they at least aim in the right
> > direction.
> > 
> > So my groups on IPA server are set like this:
> > 
> > [jhrozek@unidirect] sssd $ [(ext_mem)] ipa group-show topgr --all --raw
> >   dn: cn=topgr,cn=groups,cn=accounts,dc=ipa,dc=test
> >   cn: topgr
> >   gidnumber: 24024
> >   member: cn=ext_ipa_gr,cn=groups,cn=accounts,dc=ipa,dc=test
> >   ipaUniqueID: 193b7c04-91e9-11e5-bc59-5254005f7b59
> >   objectClass: ipaobject
> >   objectClass: top
> >   objectClass: ipausergroup
> >   objectClass: posixgroup
> >   objectClass: groupofnames
> >   objectClass: nestedgroup
> > [jhrozek@unidirect] sssd $ [(ext_mem)] ipa group-show ext_ipa_gr --all --raw
> >   dn: cn=ext_ipa_gr,cn=groups,cn=accounts,dc=ipa,dc=test
> >   cn: ext_ipa_gr
> >   ipaexternalmember: S-1-5-21-1897531548-1940899517-361317264-500
> >   ipaUniqueID: ad2bd978-91e8-11e5-9d52-5254005f7b59
> >   memberof: cn=topgr,cn=groups,cn=accounts,dc=ipa,dc=test
> >   objectClass: ipaobject
> >   objectClass: top
> >   objectClass: nestedgroup
> >   objectClass: ipausergroup
> >   objectClass: groupofnames
> >   objectClass: ipaexternalgroup
> > 
> > The SID S-1-5-21-1897531548-1940899517-361317264-500 is an AD user
> > (administrator)
> > 
> > Now when I do:
> > sudo sss_cache -E
> > $ id -G administra...@win.trust.test
> > $ sudo sss_cache -G
> > $ getent group 24024
> > 
> > Then the "topgr" group gets resolved and the code gets into 
> > sdap_save_grpmem:
> > 908 /* This is a temporal solution until the IPA provider is able to
> > 909  * resolve external group membership.
> > 910  * https://fedorahosted.org/sssd/ticket/2522
> > 911  */
> > 912 if (opts->schema_type == SDAP_SCHEMA_IPA_V1) {
> > 913 ret = sysdb_attrs_get_string(attrs, SYSDB_SID_STR, &group_sid);
> > 914 if (ret != EOK) {
> > 915 DEBUG(SSSDBG_TRACE_FUNC, "Failed to get group sid\n");
> > 916 group_sid = NULL;
> > 
> > Here we set the group_sid variable to NULL, because topgr, being an IPA
> > group, doesn't have a SID...
> 
> why does it not have a SID.

I don't know, even re-running:
 # ipa-adtrust-install --add-sids
Tells me that "Sidgen task plugin already configured, nothing to do". I
guess my test IPA install is not in the best shape, maybe I should just
refresh it..

> I guess it has a GID since we got this far.
> IPA with trust enabled should automatically add SIDs to new POSIX groups
> and groups created before trust setup should get them from running the
> sidgen task. It is important that the groups have a SID because
> otherwise they cannot be added to the PAC causing different views of the
> groups membership see by the PAC responder and the ID provider.

Makes sense.

> 
> bye,
> Sumit
> ___
> sssd-devel mailing list
> sssd-devel@lists.fedorahosted.org
> https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


[SSSD]Re: [PATCH] Reduce the code duplication in Data Provider

2015-12-04 Thread Jakub Hrozek
On Thu, Dec 03, 2015 at 01:24:54PM +0100, Pavel Březina wrote:
> On 11/20/2015 12:04 PM, Jakub Hrozek wrote:
> >On Thu, Nov 19, 2015 at 01:51:54PM +0100, Pavel Březina wrote:
> >>>Thanks, new patches are attached.
> >>
> >>I had a phone call with Jakub and we decided that it will be better to use
> >>be_req directly instead of lower level sbus_req. This will allow us to
> >>simplify it more.
> >
> >OK, see the attached patches..
> 
> Hi, thanks for the changes.
> 
> First patch:
> 
> >@@ -199,6 +205,7 @@ struct be_req *be_req_create(TALLOC_CTX *mem_ctx,
> > be_req->domain = be_ctx->domain;
> > be_req->fn = fn;
> > be_req->pvt = pvt_fn_data;
> >+be_req->req_name = name;
> 
> I know every call now passes static data but IMHO we should be safe and use
> talloc_strdup here.
> 
> >+static errno_t be_sbus_reply(struct sbus_request *sbus_req,
> >+ dbus_uint16_t err_maj,
> >+ dbus_uint32_t err_min,
> >+ const char *err_msg)
> >+{
> >+errno_t ret;
> >+const char *safe_err_msg;
> >+
> >+/* Only return a reply if one was requested
> >+ * There may not be one if this request began
> >+ * while we were offline
> >+ */
> >+if (sbus_req == NULL) {
> >+return EOK;
> >+}
> >+
> >+safe_err_msg = safe_be_req_err_msg(err_msg, err_maj);
> >+
> >+if (err_maj == DP_ERR_FATAL && err_min == ENODEV) {
> >+DEBUG(SSSDBG_TRACE_LIBS, "Handler not configured\n");
> >+} else {
> >+DEBUG(SSSDBG_TRACE_LIBS,
> >+  "Request processed. Returned %d,%d,%s\n",
> >+  err_maj, err_min, err_msg);
> >+}
> 
> Can we translate err_maj into string here? To get message similar to:
> "Request processed: Success [%d]: %s, err_min, err_msg
> "Request processed: Failure [%d]: %s, err_min, err_msg
> "Request processed: Offline [%d]: %s, err_min, err_msg
> 
> Second patch:
> 
> >+#define BE_SBUS_REPLY_DATA_INIT { .err_maj = DP_ERR_FATAL, \
> >+  .err_min = EFAULT, \
> >+  .err_msg = "Fatal error" \
> >+};
> 
> ERR_INTERNAL might be a better choice over EFAULT since it basically means
> we forgot to set it -> code error.

OK, see the attached two.
>From 0bff8916f35a39ff8fdd4789a31289f7d1b05877 Mon Sep 17 00:00:00 2001
From: Jakub Hrozek 
Date: Wed, 11 Nov 2015 13:39:43 +0100
Subject: [PATCH 1/2] DP: Reduce code duplication in the callback handlers

Instead of calling sbus_request_return_and_finish() directly with the
same checks copied over, add a be_sbus_reply() helper instead.
---
 src/providers/ad/ad_subdomains.c   |   2 +-
 src/providers/data_provider_be.c   | 355 +
 src/providers/dp_backend.h |  11 +-
 src/providers/ipa/ipa_subdomains.c |   2 +-
 4 files changed, 138 insertions(+), 232 deletions(-)

diff --git a/src/providers/ad/ad_subdomains.c b/src/providers/ad/ad_subdomains.c
index 
2e5d9120e473e32a84610d607ccf329249b4ac9e..4799b518a1354e5b4ef8392b860effc9121ee121
 100644
--- a/src/providers/ad/ad_subdomains.c
+++ b/src/providers/ad/ad_subdomains.c
@@ -1118,7 +1118,7 @@ static void ad_subdom_online_cb(void *pvt)
 
 refresh_interval = ctx->be_ctx->domain->subdomain_refresh_interval;
 
-be_req = be_req_create(ctx, NULL, ctx->be_ctx,
+be_req = be_req_create(ctx, NULL, ctx->be_ctx, "AD subdomains",
ad_subdom_be_req_callback, NULL);
 if (be_req == NULL) {
 DEBUG(SSSDBG_CRIT_FAILURE, "be_req_create() failed.\n");
diff --git a/src/providers/data_provider_be.c b/src/providers/data_provider_be.c
index 
9bc54b6d59dafd22b9f70da7cca3b520ca941efc..b67efa87008d8e15fb9fb27c3d02996608b0aa62
 100644
--- a/src/providers/data_provider_be.c
+++ b/src/providers/data_provider_be.c
@@ -174,6 +174,9 @@ struct be_req {
  */
 int phase;
 
+/* Just for nicer debugging */
+const char *req_name;
+
 struct be_req *prev;
 struct be_req *next;
 };
@@ -186,8 +189,11 @@ static int be_req_destructor(struct be_req *be_req)
 }
 
 struct be_req *be_req_create(TALLOC_CTX *mem_ctx,
- struct be_client *becli, struct be_ctx *be_ctx,
- be_async_callback_t fn, void *pvt_fn_data)
+ struct be_client *becli,
+ struct be_ctx *be_ctx,
+ const char *name,
+ be_async_callback_t fn,
+ void *pvt_fn_data)
 {
 struct be_req *be_req;
 
@@ -199,6 +205,11 @@ struct be_req *be_req_create(TALLOC_CTX *mem_ctx,
 be_req->domain = be_ctx->domain;
 be_req->fn = fn;
 be_req->pvt = pvt_fn_data;
+be_req->req_name = talloc_strdup(be_req, name);
+if (be_req->req_name == NULL) {
+talloc_free(be_req);
+return NULL;
+}
 
 /* Add this request to active request list and make sure it

[SSSD]Re: [PATCH] IPA_PROVIDER: Explicit no handle of services

2015-12-04 Thread Petr Cech

On 12/03/2015 05:26 PM, Jakub Hrozek wrote:

On Tue, Dec 01, 2015 at 05:02:39PM +0100, Petr Cech wrote:

On 11/27/2015 10:35 AM, Jakub Hrozek wrote:

On Thu, Nov 19, 2015 at 05:57:56PM +0100, Petr Cech wrote:

On 11/12/2015 06:04 PM, Jakub Hrozek wrote:

On Thu, Nov 12, 2015 at 04:54:21PM +0100, Petr Cech wrote:

On 11/11/2015 02:42 PM, Jakub Hrozek wrote:

Hi, I think it's a good idea to only say we don't handle services for
IPA subdomains. But I also think it would be better to shortcut the
request sooner, in ipa_subdomain_account_send() to avoid even sending an
LDAP query.

Hi Jakub,

new patch is attached. During the testing... I found out, that I use wrong
set up. Subdomains are connected to FreeIPA with trusted AD.

What's wrong about it? I think we/should/  test this setup..


Hi Jakub,

I was confused about it. I haven't got such environment with IP trust. But
now I have one.





So... patch is here, but I would like set up my environment properly and
then I will inform you:-)

Regards

Petr


But I still do something wrong. I set up environment:

AD server: dom-78.abc...
FreeIPA server: mirfak.persei.dev
FreeIPA client: algol.persei.dev

I set up the trust between IPA and AD. And then I tried on ipa clinet:
# getent services ldap/mirfak.persei@persei.dev


This is not a different kind of service (a Kerberos service principal).
The ticket is about services resolvable via libc's NSS interface:
 
https://www.gnu.org/software/libc/manual/html_node/Name-Service-Switch.html#Name-Service-Switch

Hi Jakub,

thank you for documentation.



See man 5 nsswitch conf for some more info. The API used for resolving
services is described in man 3 getservent and similar. Normally services
are stored on the filesystem, similar to local users, see cat
/etc/services. But custom services can also be stored in LDAP or other
sources. And while we support looking up services, the IPA back end does
not and currently if you do:
 getent services -s sss dns


This command is usefull. I have used:
# getent services -s sss d...@persei.dev



The logs would show:
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [be_get_account_info] (0x0200): 
Got request for [0x1005][FAST BE_REQ_SERVICES][1][name=dns]
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [be_req_set_domain] (0x0400): 
Changing request domain from [ipa.test] to [ipa.test]
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [services_get_send] (0x1000): 
Preparing to search for services with filter 
[(&(cn=dns)(objectclass=ipService))]
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_id_op_connect_step] 
(0x4000): reusing cached connection
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_services_next_base] 
(0x0400): Searching for services with base [cn=accounts,dc=ipa,dc=test]
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_print_server] (0x2000): 
Searching 192.168.122.100
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_ext_step] 
(0x0400): calling ldap_search_ext with 
[(&(cn=dns)(objectclass=ipService))][cn=accounts,dc=ipa,dc=test].
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_ext_step] 
(0x1000): Requesting attrs: [objectClass]
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_ext_step] 
(0x1000): Requesting attrs: [cn]
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_ext_step] 
(0x1000): Requesting attrs: [ipServicePort]
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_ext_step] 
(0x1000): Requesting attrs: [ipServiceProtocol]
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_ext_step] 
(0x1000): Requesting attrs: [entryUSN]
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_ext_step] 
(0x2000): ldap_search_ext called, msgid = 16
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_op_add] (0x2000): New 
operation 16 timeout 2
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_process_result] (0x2000): 
Trace: sh[0xeedc00], connected[1], ops[0xf1aca0], ldap[0xef0bb0]
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_process_message] 
(0x4000): Message type: [LDAP_RES_SEARCH_RESULT]
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_generic_op_finished] 
(0x0400): Search result: Success(0), no errmsg set
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_op_destructor] (0x2000): 
Operation 16 finished
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_get_services_process] 
(0x0400): Search for services, returned 0 results.
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sdap_id_op_done] (0x4000): 
releasing operation connection
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [ldb] (0x4000): start ldb 
transaction (nesting: 0)
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [sysdb_search_services] (0x2000): 
Search services with filter: 
(&(objectclass=service)(&(serviceProtocol=*)(|(nameAlias=dns)(name=dns
(Fri Nov 27 09:31:13 2015) [sssd[be[ipa.test]]] [ldb] (0x4000): Added timed event 
"

[SSSD]Re: [PATCH] Keep external members of IPA groups

2015-12-04 Thread Sumit Bose
On Fri, Dec 04, 2015 at 02:47:17PM +0100, Jakub Hrozek wrote:
> Hi,
> 
> it seems like https://fedorahosted.org/sssd/ticket/2492 was not fixed
> completely. Attached are two patches that do the trick for me -- they
> are not polished (at the very least, the first one needs a test..) but I
> would like to get another opinion if they at least aim in the right
> direction.
> 
> So my groups on IPA server are set like this:
> 
> [jhrozek@unidirect] sssd $ [(ext_mem)] ipa group-show topgr --all --raw
>   dn: cn=topgr,cn=groups,cn=accounts,dc=ipa,dc=test
>   cn: topgr
>   gidnumber: 24024
>   member: cn=ext_ipa_gr,cn=groups,cn=accounts,dc=ipa,dc=test
>   ipaUniqueID: 193b7c04-91e9-11e5-bc59-5254005f7b59
>   objectClass: ipaobject
>   objectClass: top
>   objectClass: ipausergroup
>   objectClass: posixgroup
>   objectClass: groupofnames
>   objectClass: nestedgroup
> [jhrozek@unidirect] sssd $ [(ext_mem)] ipa group-show ext_ipa_gr --all --raw
>   dn: cn=ext_ipa_gr,cn=groups,cn=accounts,dc=ipa,dc=test
>   cn: ext_ipa_gr
>   ipaexternalmember: S-1-5-21-1897531548-1940899517-361317264-500
>   ipaUniqueID: ad2bd978-91e8-11e5-9d52-5254005f7b59
>   memberof: cn=topgr,cn=groups,cn=accounts,dc=ipa,dc=test
>   objectClass: ipaobject
>   objectClass: top
>   objectClass: nestedgroup
>   objectClass: ipausergroup
>   objectClass: groupofnames
>   objectClass: ipaexternalgroup
> 
> The SID S-1-5-21-1897531548-1940899517-361317264-500 is an AD user
> (administrator)
> 
> Now when I do:
> sudo sss_cache -E
> $ id -G administra...@win.trust.test
> $ sudo sss_cache -G
> $ getent group 24024
> 
> Then the "topgr" group gets resolved and the code gets into sdap_save_grpmem:
> 908 /* This is a temporal solution until the IPA provider is able to
> 909  * resolve external group membership.
> 910  * https://fedorahosted.org/sssd/ticket/2522
> 911  */
> 912 if (opts->schema_type == SDAP_SCHEMA_IPA_V1) {
> 913 ret = sysdb_attrs_get_string(attrs, SYSDB_SID_STR, &group_sid);
> 914 if (ret != EOK) {
> 915 DEBUG(SSSDBG_TRACE_FUNC, "Failed to get group sid\n");
> 916 group_sid = NULL;
> 
> Here we set the group_sid variable to NULL, because topgr, being an IPA
> group, doesn't have a SID...

why does it not have a SID. I guess it has a GID since we got this far.
IPA with trust enabled should automatically add SIDs to new POSIX groups
and groups created before trust setup should get them from running the
sidgen task. It is important that the groups have a SID because
otherwise they cannot be added to the PAC causing different views of the
groups membership see by the PAC responder and the ID provider.

bye,
Sumit
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


[SSSD]Re: [PATCH] ldap_test.py: Modify enum cache timeouts

2015-12-04 Thread Michal Židek

On 12/04/2015 03:07 PM, Lukas Slebodnik wrote:

On (04/12/15 14:35), Michal Židek wrote:

On 12/04/2015 02:32 PM, Jakub Hrozek wrote:

On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote:

On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:

On (04/12/15 12:11), Jakub Hrozek wrote:

On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:

On (03/12/15 20:22), Jakub Hrozek wrote:

On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:

On (02/12/15 17:10), Michal Židek wrote:

Hi!

I saw some integration tests failures recently,
and I think there is a race condition between the
enumeration refresh timeout and the sleeps
after some operations that wait for this timeout.
SSSD fails to populate changes from LDAP in time
and some asserts can fail because of this.

So far I saw 4 tests to fail like this, which
is already quite a lot.

The attached patch modifies the timeout values
and hopefully removes the issue.

Michal


>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001

From: =?UTF-8?q?Michal=20=C5=BDidek?= 
Date: Wed, 2 Dec 2015 16:44:48 +0100
Subject: [PATCH] ldap_test.py: Modify enum cache timeouts

There is a race condation between ldap
enumeration refresh timeout and the sleeps
that wait for the ldap changes to populate
to SSSD if the timeout and the sleeps have
the same value.
---
src/tests/intg/ldap_test.py | 30 +-
1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py
index 757ee20..8ec8dbe 100644
--- a/src/tests/intg/ldap_test.py
+++ b/src/tests/intg/ldap_test.py
@@ -33,7 +33,11 @@ import ldap_ent

>from util import *


LDAP_BASE_DN = "dc=example,dc=com"
-INTERACTIVE_TIMEOUT = 4
+INTERACTIVE_TIMEOUT = 2
+
+
+def wait_for_ldap_enum_refresh():
+time.sleep(INTERACTIVE_TIMEOUT + 4)

Why does it need to be INTERACTIVE_TIMEOUT + 4

Could it be INTERACTIVE_TIMEOUT + 3 or + 5



Regardless of the value we choose, can we move this patch forward? I see
the related failure quite often in SSSD.

Adding timeout without real explanation is not a solution.

The main problem is with empiric values.
If they are very high then test are slow.
And there still can be slow/fast machine where lower values caused troubles.

The ideal solution would be to get rid of enumeration
in ldap tests.


Enumeration is a codepath that is different from non-enumeration, so it
should be tested. Not as priority, not as the only ldap tests, but it's
a valid case, so it should be there.


If we want to test enumeration than it should be in separate
test.


Maybe, but we do have enumeration tests and we should fix them.


Adding sleep is not a fix. It's just a workaround
because all sleep timeout are just an empiric values.
and we should fix test and not adding workaround/hacks.

If we cannot fix the test and don't want te rewrite test without enumeration
then we should remove/revert problematic tests.

LS


I will send a new patch with an explanation (sort of),
but it still will be a guess. I am not sure what the
real safe value should be, only that the sleep's
after operation should be longer than the ldap
refresh and enum cache timeouts (and that the
current values do not cope well wit the CI load).


Would it be more acceptable then to define the ldap refresh and enum
cache timeouts as variables in the test and sleep for (enum_timeout +
cache_timeout + 1) ?

At least that would be more readable than a magic constant..


Will do. All will be derived from INTERACTIVE_TIMEOUT
so that we need to change just one value if needed in the
future.


Will it be reliable?

Will it work on slow(arm) machines?

I plan to run integration test in "%check" phase
of rpm build. And koji/fedora has rpm machines.


We can mark tests that may fail on slow machines
due to timeouts as "unsafe" and skip them in the
rpm build. The simplest way to do it would be to
use -k "not test_unsafe_" option in INTGCHECK_PYTEST_ARGS
and the unsafe tests would have test_unsafe_ prefix.
Would that work for you? It is something that we
can start using immediately.


I already have a POC patch.

LS

___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


[SSSD]Re: [PATCHES] sudo provider improvements

2015-12-04 Thread Lukas Slebodnik
On (02/12/15 14:40), Pavel Březina wrote:
>On 12/02/2015 02:06 PM, Lukas Slebodnik wrote:
>>On (02/12/15 11:05), Pavel Březina wrote:
>>>On 12/01/2015 02:02 PM, Lukas Slebodnik wrote:
On (24/11/15 13:23), Pavel Březina wrote:
>Hi,
>I'm sending some sudo provider patches. I wanted to fix/improve things in 
>the
>ldap sudo provider prior my work on ipa provider so I get familiar with it
>again and avoid making the same mistakes.
>
>It fixes tevent style, shuffles the code around a little bit, convert
>periodic task to use be_ptask module, renew hostinfo when needed, fix
>sdap_id_op logic, recude code duplication, remove dead code, simplify error
>handling, etc.
>
>Ticket fixed:
>https://fedorahosted.org/sssd/ticket/1943
>https://fedorahosted.org/sssd/ticket/2672
>
>I let Dan run downstream tests on those patches. We had to fix one test 
>that
>was prone to a race condition which my patches revealed, but everything is
>green now.
>

I got following valgrind errors with patches

==17279== 1 errors in context 3 of 7:
==17279== Conditional jump or move depends on uninitialised value(s)
==17279==at 0x8BC76FE: _talloc_steal_loc (talloc.c:1162)
==17279==by 0x13DA3775: sdap_sudo_set_usn (sdap_async_sudo.c:318)
==17279==by 0x13DA3775: sdap_sudo_refresh_done (sdap_async_sudo.c:750)
==17279==by 0x89B4201: _tevent_req_error (tevent_req.c:167)
==17279==by 0x13DA3321: sdap_sudo_load_sudoers_done 
(sdap_async_sudo.c:170)
==17279==by 0x89B4201: _tevent_req_error (tevent_req.c:167)
==17279==by 0x89B4201: _tevent_req_error (tevent_req.c:167)
==17279==by 0x13D71ADB: generic_ext_search_handler.isra.3 
(sdap_async.c:1651)
==17279==by 0x89B3923: tevent_common_loop_immediate 
(tevent_immediate.c:135)
==17279==by 0x89B822D: epoll_event_loop_once (tevent_epoll.c:907)
==17279==by 0x89B6936: std_event_loop_once (tevent_standard.c:114)
==17279==by 0x89B30FC: _tevent_loop_once (tevent.c:533)
==17279==by 0x89B329A: tevent_common_loop_wait (tevent.c:637)
==17279==
==17279==
==17279== 1 errors in context 4 of 7:
==17279== Conditional jump or move depends on uninitialised value(s)
==17279==at 0x13DA3738: sdap_sudo_set_usn (sdap_async_sudo.c:307)
==17279==by 0x13DA3738: sdap_sudo_refresh_done (sdap_async_sudo.c:750)
==17279==by 0x89B4201: _tevent_req_error (tevent_req.c:167)
==17279==by 0x13DA3321: sdap_sudo_load_sudoers_done 
(sdap_async_sudo.c:170)
==17279==by 0x89B4201: _tevent_req_error (tevent_req.c:167)
==17279==by 0x89B4201: _tevent_req_error (tevent_req.c:167)
==17279==by 0x13D71ADB: generic_ext_search_handler.isra.3 
(sdap_async.c:1651)
==17279==by 0x89B3923: tevent_common_loop_immediate 
(tevent_immediate.c:135)
==17279==by 0x89B822D: epoll_event_loop_once (tevent_epoll.c:907)
==17279==by 0x89B6936: std_event_loop_once (tevent_standard.c:114)
==17279==by 0x89B30FC: _tevent_loop_once (tevent.c:533)
==17279==by 0x89B329A: tevent_common_loop_wait (tevent.c:637)
==17279==by 0x89B68D6: std_event_loop_wait (tevent_standard.c:140)
>>>
>>>I can't see a codepath where usn could be uninitialized, do you?
>>I didn't try but static analysers helped me,
>>at least I hope it will help you :-)
>>Maybe, once you will learn how to use them :-) :-) :-)
>>
>>
>>Error: UNINIT (CWE-457): [#def1]
>>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:690: var_decl: Declaring 
>>variable "usn" without initializer.
>>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:750: uninit_use_in_call: 
>>Using uninitialized value "usn" when calling "sdap_sudo_set_usn".
>>#  748|
>>#  749|   /* remember new usn */
>>#  750|-> sdap_sudo_set_usn(state->srv_opts, usn);
>>#  751|
>>#  752|   ret = EOK;
>>
>>Error: CLANG_WARNING: [#def2]
>>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:750:5: warning: Function 
>>call argument is an uninitialized value
>>#sdap_sudo_set_usn(state->srv_opts, usn);
>>#^  ~~~
>>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:690:5: note: 'usn' declared 
>>without an initial value
>>#char *usn;
>>#^
>>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:704:9: note: Assuming 
>>'dp_error' is not equal to 0
>>#if (dp_error == DP_ERR_OK && ret != EOK) {
>>#^
>>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:704:31: note: Left side of 
>>'&&' is false
>>#if (dp_error == DP_ERR_OK && ret != EOK) {
>>#  ^
>>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:717:9: note: Assuming 'ret' 
>>is equal to 0
>>#if (ret != EOK) {
>>#^~
>>sssd-1.13.90/src/providers/ldap/sdap_async_sudo.c:717:5: note: Taking false 
>>branch
>>#if (ret != EOK) {
>>#^
>>sssd

[SSSD]Re: [PATCH] make globals in *_opts.h extern

2015-12-04 Thread Jakub Hrozek
On Thu, Dec 03, 2015 at 12:45:55PM +0100, Pavel Březina wrote:
> On 12/03/2015 11:48 AM, Jakub Hrozek wrote:
> >On Wed, Dec 02, 2015 at 11:58:55AM +0100, Pavel Březina wrote:
> >>This solves situation where you want to use those globals on other place
> >>than in *_common.c.
> >>
> >>I also created https://fedorahosted.org/sssd/ticket/2890 so we can avoid
> >>order-dependency on header files such as sysdb_services.h which I had to fix
> >>for AD patch.
> >
> >Looks like tests failed:
> > 
> > http://sssd-ci.duckdns.org/logs/job/34/17/fedora20/ci-build-debug/ci-make-tests.log
> 
> I forgot to run tests... new patches are attached.
> 

Thanks, the patches now build. I have two questions:
1) Since we're moving code around anymore, would it make sense to
also move the enums with option indexes to opts.h? I find it a bit
odd they are in common.h
2) Does it make sense to remove the remaining headers (data_provider
and common) from opts.h and leave only options in?
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


[SSSD]Re: [PATCH] MAN: Clarify that subdomains always use service discovery

2015-12-04 Thread Jakub Hrozek
On Mon, Nov 30, 2015 at 05:41:35PM +0100, Jakub Hrozek wrote:
> On Mon, Nov 30, 2015 at 05:01:53AM -0500, Dan Lavu wrote:
> > Lukas, 
> > 
> > Fixed some grammatical errors and wording. Also noticed the line about 
> > supported AD versions, please correct me if I'm wrong, but we only 
> > officially support 2008R2 and 2012R2 so I modified that line stating that. 
> > 
> > Dan 
> > 
> > - Original Message -
> > 
> > From: "Pavel Březina"  
> > To: sssd-devel@lists.fedorahosted.org 
> > Sent: Monday, November 30, 2015 4:36:10 AM 
> > Subject: [SSSD]Re: [PATCH] MAN: Clarify that subdomains always use service 
> > discovery 
> > 
> > On 11/26/2015 05:10 PM, Jakub Hrozek wrote: 
> > > Hi. 
> > > 
> > > attached is a simple manpage patch for #2881. 
> > 
> > Ack. 
> > ___ 
> > sssd-devel mailing list 
> > sssd-devel@lists.fedorahosted.org 
> > https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
> >  
> > 
> 
> Hi,
> 
> > From 3cb1cd429c59e380a264d13d4299f76e10238799 Mon Sep 17 00:00:00 2001
> > From: Dan Lavu 
> > Date: Mon, 30 Nov 2015 04:51:00 -0500
> > Subject: [PATCH] Clarify that subdomains always use service discovery
> > 
> > ---
> >  src/man/sssd-ad.5.xml | 16 
> >  1 file changed, 12 insertions(+), 4 deletions(-)
> > 
> > diff --git a/src/man/sssd-ad.5.xml b/src/man/sssd-ad.5.xml
> > index 
> > 047cf046d425ed32a4983691c18dc9f8b94a0160..d3f360e7db5358d159954cdddeaf3ff38a79ce3e
> >  100644
> > --- a/src/man/sssd-ad.5.xml
> > +++ b/src/man/sssd-ad.5.xml
> > @@ -39,12 +39,13 @@
> >  
> >  
> >  The AD provider supports connecting to Active Directory 2008 R2
> > -or later. Earlier versions may work, but are unsupported.
> > +and 2012 R2. Other versions may work, but are unsupported.
> 
> I don't think we want to do this change. I know you're coming from your
> downstream perspective where I would agree, but in upstream, we should
> support also 2016 when it comes out and 2012.0 (as opposed to R2).
> 
> >  
> >  
> > -The AD provider is able to provide identity information and
> > -authentication for entities from trusted domains as well. 
> > Currently
> > -only trusted domains in the same forest are recognized.
> > +The AD provider can be used to get user information
> > +and authenticate users from trusted domains. Currently
> > +only trusted domains in the same forest are recognized. In
> > +addition servers from trusted domains are always 
> > auto-discovered.
> >  
> >  
> >  The AD provider accepts the same options used by the
> > @@ -121,10 +122,17 @@ ldap_id_mapping = False
> >  connect in order of preference. For more
> >  information on failover and server redundancy, 
> > see
> >  the FAILOVER section.
> > +
> > +
> >  This is optional if autodiscovery is enabled.
> >  For more information on service discovery, 
> > refer
> >  to the SERVICE DISCOVERY 
> > section.
> >  
> > +
> > +Note: Trusted domains will always auto-discover
> > +servers even if the primary server is 
> > explicitly
> > +defined in the ad_server option.
> > +
> >  
> >  
> >  
> 
> I agree with the rest of the changes and if you agree with dropping the
> first hunk, I'll push the patch.

Hi Dan,

are you OK with droppig the first paragraph about Windows server
supported versions?
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


[SSSD]Re: [PATCH] NSS: Fix memory leak netgroup

2015-12-04 Thread Jakub Hrozek
On Thu, Dec 03, 2015 at 03:29:22PM +0100, Pavel Reichl wrote:
> 
> On 12/03/2015 02:06 PM, Jakub Hrozek wrote:
> >On Mon, Nov 30, 2015 at 06:45:00PM +0100, Pavel Reichl wrote:
> >>
> >>
> >>On 11/30/2015 06:02 PM, Jakub Hrozek wrote:
> >
> >Wouldn't it then be better to see if another same object is already in
> >the hashtable and free it before replacing?
> 
> I agree it would be best. I tried that before and failed because I could 
> not decipher out the relation of talloc contexts.
> 
> I tried that again. Seems that leaks are gone. Segfaults were not 
> happening during my testing.
> >>>
> Code got even messier :-(
> >>>
> >>>I think the code would look nice if it was placed in the else branch of
> >>>create_negcache_netgr() :-)
> >>
> >>Done, thanks for hint.
> >>
> >>>
> >>>I also don't think we need the tmp_ctx change..
> >>
> >>I dare to disagree here. With prev. versions of patch I was experiencing
> >>segfaults. IIRC step_context is being allocated on context of the netgroup
> >>that is freed. I also think that it's not a good idea to allocate local data
> >>on context that does not hold any reference to that data. But it might be
> >>matter of personal taste.
> >
> >What kind of segfaults, what was the backtrace? I think it might be good to
> >add talloc_zfree() instead of talloc_free() or explicitly NULL the pointer,
> >that should solve the issue without adding a new context...
> >
> >Since it's per-domain, I wonder if step_ctx->dctx might be better
> >candidate than step_ctx either way.
> >
> 
> Hello Jakub, I amended the patch as you proposed. You can now experience the 
> segfault yourself - just query a non-existing netgroup.
> I attached the backtrace as well.
> 
> I can investiage and use talloc reporting if you wish. But still using 
> tmp_ctx seems as way of the least effort...

I think you are right, the step_ctx hierarchy is tricky, so the
temporary context looks like the easiest solution. Please do one more
change in the patch:

> From 0160ee92c4d9a9542abfd3e918686526a52e367d Mon Sep 17 00:00:00 2001
> From: Pavel Reichl 
> Date: Fri, 27 Nov 2015 07:53:00 -0500
> Subject: [PATCH] NSS: Fix memory leak netgroup

[...]

> @@ -461,6 +469,11 @@ static errno_t create_negcache_netgr(struct 
> setent_step_ctx *step_ctx)
>  }
>  
>  done:
> +/* Free netgroup after step_ctx is not needed. */
> +if (netgr_to_be_freed) {
> +talloc_zfree(netgr_to_be_freed);
> +}

Since talloc_free(NULL) is a noop, we should drop the if completely.
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


[SSSD]Re: [PATCH] ldap_test.py: Modify enum cache timeouts

2015-12-04 Thread Lukas Slebodnik
On (04/12/15 14:35), Michal Židek wrote:
>On 12/04/2015 02:32 PM, Jakub Hrozek wrote:
>>On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote:
>>>On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:
On (04/12/15 12:11), Jakub Hrozek wrote:
>On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:
>>On (03/12/15 20:22), Jakub Hrozek wrote:
>>>On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
On (02/12/15 17:10), Michal Židek wrote:
>Hi!
>
>I saw some integration tests failures recently,
>and I think there is a race condition between the
>enumeration refresh timeout and the sleeps
>after some operations that wait for this timeout.
>SSSD fails to populate changes from LDAP in time
>and some asserts can fail because of this.
>
>So far I saw 4 tests to fail like this, which
>is already quite a lot.
>
>The attached patch modifies the timeout values
>and hopefully removes the issue.
>
>Michal

>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001
>From: =?UTF-8?q?Michal=20=C5=BDidek?= 
>Date: Wed, 2 Dec 2015 16:44:48 +0100
>Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
>
>There is a race condation between ldap
>enumeration refresh timeout and the sleeps
>that wait for the ldap changes to populate
>to SSSD if the timeout and the sleeps have
>the same value.
>---
>src/tests/intg/ldap_test.py | 30 +-
>1 file changed, 17 insertions(+), 13 deletions(-)
>
>diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py
>index 757ee20..8ec8dbe 100644
>--- a/src/tests/intg/ldap_test.py
>+++ b/src/tests/intg/ldap_test.py
>@@ -33,7 +33,11 @@ import ldap_ent
>from util import *
>
>LDAP_BASE_DN = "dc=example,dc=com"
>-INTERACTIVE_TIMEOUT = 4
>+INTERACTIVE_TIMEOUT = 2
>+
>+
>+def wait_for_ldap_enum_refresh():
>+time.sleep(INTERACTIVE_TIMEOUT + 4)
Why does it need to be INTERACTIVE_TIMEOUT + 4

Could it be INTERACTIVE_TIMEOUT + 3 or + 5

>>>
>>>Regardless of the value we choose, can we move this patch forward? I see
>>>the related failure quite often in SSSD.
>>Adding timeout without real explanation is not a solution.
>>
>>The main problem is with empiric values.
>>If they are very high then test are slow.
>>And there still can be slow/fast machine where lower values caused 
>>troubles.
>>
>>The ideal solution would be to get rid of enumeration
>>in ldap tests.
>
>Enumeration is a codepath that is different from non-enumeration, so it
>should be tested. Not as priority, not as the only ldap tests, but it's
>a valid case, so it should be there.
>
>>If we want to test enumeration than it should be in separate
>>test.
>
>Maybe, but we do have enumeration tests and we should fix them.
>
Adding sleep is not a fix. It's just a workaround
because all sleep timeout are just an empiric values.
and we should fix test and not adding workaround/hacks.

If we cannot fix the test and don't want te rewrite test without enumeration
then we should remove/revert problematic tests.

LS
>>>
>>>I will send a new patch with an explanation (sort of),
>>>but it still will be a guess. I am not sure what the
>>>real safe value should be, only that the sleep's
>>>after operation should be longer than the ldap
>>>refresh and enum cache timeouts (and that the
>>>current values do not cope well wit the CI load).
>>
>>Would it be more acceptable then to define the ldap refresh and enum
>>cache timeouts as variables in the test and sleep for (enum_timeout +
>>cache_timeout + 1) ?
>>
>>At least that would be more readable than a magic constant..
>
>Will do. All will be derived from INTERACTIVE_TIMEOUT
>so that we need to change just one value if needed in the
>future.
>
Will it be reliable?

Will it work on slow(arm) machines?

I plan to run integration test in "%check" phase
of rpm build. And koji/fedora has rpm machines.

I already have a POC patch.

LS
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


[SSSD]Re: [PATCH] ldap_test.py: Modify enum cache timeouts

2015-12-04 Thread Nikolai Kondrashov

On 12/04/2015 12:42 PM, Lukas Slebodnik wrote:

On (03/12/15 20:22), Jakub Hrozek wrote:

On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:

On (02/12/15 17:10), Michal Židek wrote:

Hi!

I saw some integration tests failures recently,
and I think there is a race condition between the
enumeration refresh timeout and the sleeps
after some operations that wait for this timeout.
SSSD fails to populate changes from LDAP in time
and some asserts can fail because of this.

So far I saw 4 tests to fail like this, which
is already quite a lot.

The attached patch modifies the timeout values
and hopefully removes the issue.

Michal


>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001

From: =?UTF-8?q?Michal=20=C5=BDidek?= 
Date: Wed, 2 Dec 2015 16:44:48 +0100
Subject: [PATCH] ldap_test.py: Modify enum cache timeouts

There is a race condation between ldap
enumeration refresh timeout and the sleeps
that wait for the ldap changes to populate
to SSSD if the timeout and the sleeps have
the same value.
---
src/tests/intg/ldap_test.py | 30 +-
1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py
index 757ee20..8ec8dbe 100644
--- a/src/tests/intg/ldap_test.py
+++ b/src/tests/intg/ldap_test.py
@@ -33,7 +33,11 @@ import ldap_ent
from util import *

LDAP_BASE_DN = "dc=example,dc=com"
-INTERACTIVE_TIMEOUT = 4
+INTERACTIVE_TIMEOUT = 2
+
+
+def wait_for_ldap_enum_refresh():
+time.sleep(INTERACTIVE_TIMEOUT + 4)

Why does it need to be INTERACTIVE_TIMEOUT + 4

Could it be INTERACTIVE_TIMEOUT + 3 or + 5



Regardless of the value we choose, can we move this patch forward? I see
the related failure quite often in SSSD.

Adding timeout without real explanation is not a solution.

The main problem is with empiric values.
If they are very high then test are slow.
And there still can be slow/fast machine where lower values caused troubles.

The ideal solution would be to get rid of enumeration
in ldap tests. If we want to test enumeration than it should be in separate
test. I'm not sure we would be able to get rid of "sleep()"
in enumeration test but all such values should pre properly documented why it
is big enough ...


Yeah, these timeouts are messy.

The problem is not the empiric timeout values themselves but rather the
guesswork of when certain cache state changes are supposed to happen in
relation to them. If we can reason about event deadlines then we can use them.

If not, and the code is too complicated, can we perhaps introduce some
explicit synchronization with sssd caching mechanism, where sssd behavior will
drive the tests? E.g. the tests would actually wait for sssd to do something
with the cache and after sssd reports it is done, the tests will verify the
time and the result?

We would still get to test that sssd does something we need in the expected
timeframe, but we can make the tests faster - i.e. as fast as sssd can perform
and be configured to.

Perhaps add something to the sss_cache tool?

Nick
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


[SSSD][PATCH] Keep external members of IPA groups

2015-12-04 Thread Jakub Hrozek
Hi,

it seems like https://fedorahosted.org/sssd/ticket/2492 was not fixed
completely. Attached are two patches that do the trick for me -- they
are not polished (at the very least, the first one needs a test..) but I
would like to get another opinion if they at least aim in the right
direction.

So my groups on IPA server are set like this:

[jhrozek@unidirect] sssd $ [(ext_mem)] ipa group-show topgr --all --raw
  dn: cn=topgr,cn=groups,cn=accounts,dc=ipa,dc=test
  cn: topgr
  gidnumber: 24024
  member: cn=ext_ipa_gr,cn=groups,cn=accounts,dc=ipa,dc=test
  ipaUniqueID: 193b7c04-91e9-11e5-bc59-5254005f7b59
  objectClass: ipaobject
  objectClass: top
  objectClass: ipausergroup
  objectClass: posixgroup
  objectClass: groupofnames
  objectClass: nestedgroup
[jhrozek@unidirect] sssd $ [(ext_mem)] ipa group-show ext_ipa_gr --all --raw
  dn: cn=ext_ipa_gr,cn=groups,cn=accounts,dc=ipa,dc=test
  cn: ext_ipa_gr
  ipaexternalmember: S-1-5-21-1897531548-1940899517-361317264-500
  ipaUniqueID: ad2bd978-91e8-11e5-9d52-5254005f7b59
  memberof: cn=topgr,cn=groups,cn=accounts,dc=ipa,dc=test
  objectClass: ipaobject
  objectClass: top
  objectClass: nestedgroup
  objectClass: ipausergroup
  objectClass: groupofnames
  objectClass: ipaexternalgroup

The SID S-1-5-21-1897531548-1940899517-361317264-500 is an AD user
(administrator)

Now when I do:
sudo sss_cache -E
$ id -G administra...@win.trust.test
$ sudo sss_cache -G
$ getent group 24024

Then the "topgr" group gets resolved and the code gets into sdap_save_grpmem:
908 /* This is a temporal solution until the IPA provider is able to
909  * resolve external group membership.
910  * https://fedorahosted.org/sssd/ticket/2522
911  */
912 if (opts->schema_type == SDAP_SCHEMA_IPA_V1) {
913 ret = sysdb_attrs_get_string(attrs, SYSDB_SID_STR, &group_sid);
914 if (ret != EOK) {
915 DEBUG(SSSDBG_TRACE_FUNC, "Failed to get group sid\n");
916 group_sid = NULL;

Here we set the group_sid variable to NULL, because topgr, being an IPA
group, doesn't have a SID...

917 }
918
919 if (group_sid != NULL) {
920 ret = retain_extern_members(memctx, dom, group_name, group_sid,
921 &userdns, &nuserdns);

Which means we don't call retain_extern_members at all.

922 if (ret != EOK) {
923 DEBUG(SSSDBG_TRACE_INTERNAL,
924   "retain_extern_members failed: %d:[%s].\n",
925   ret, sss_strerror(ret));
926 }
927 }
928 }

But since the whole block of code was added in the same commit, I guess
it must have some purpose..so is going on without a SID the right thing
to do? What was the use-case of the original code?
>From 582b81d08f0ba70612aa173dae5f31263295826f Mon Sep 17 00:00:00 2001
From: Jakub Hrozek 
Date: Fri, 4 Dec 2015 13:51:28 +0100
Subject: [PATCH 1/2] SYSDB: Treat empty elements as not found in
 sysdb_attrs_get_string()

---
 src/db/sysdb.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/db/sysdb.c b/src/db/sysdb.c
index 
a71364d7c4b600eafd10fafa6641eac7b2292764..97a7c52bb49dc072d371ffb657540d2ed2f3d0b7
 100644
--- a/src/db/sysdb.c
+++ b/src/db/sysdb.c
@@ -371,7 +371,13 @@ int sysdb_attrs_get_string(struct sysdb_attrs *attrs, 
const char *name,
 return ret;
 }
 
-if (el->num_values != 1) {
+/* There is no guarantee that the element wasn't added before
+ * with sysdb_attrs_get_el() as empty, we should treat empty
+ * elements as not found
+ */
+if (el->num_values == 0) {
+return ENOENT;
+} else if (el->num_values != 1) {
 return ERANGE;
 }
 
-- 
2.4.3

>From c1e19528a071d4a354af1d8667c2b00edf9a5b7b Mon Sep 17 00:00:00 2001
From: Jakub Hrozek 
Date: Fri, 4 Dec 2015 14:11:38 +0100
Subject: [PATCH 2/2] LDAP: Link external group members of IPA groups, too

---
 src/providers/ldap/sdap_async_groups.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/src/providers/ldap/sdap_async_groups.c 
b/src/providers/ldap/sdap_async_groups.c
index 
c15a6a2bd724498aaf59b7ed3172ec4ed09abb9d..b3f7ae51edbc2b1bbd94a49722459c3938fa6d98
 100644
--- a/src/providers/ldap/sdap_async_groups.c
+++ b/src/providers/ldap/sdap_async_groups.c
@@ -781,6 +781,12 @@ are_sids_from_same_dom(const char *sid1, const char *sid2, 
bool *_result)
 char *rid1, *rid2;
 bool result;
 
+if ((sid1 == NULL || sid2 == NULL) && sid1 != sid2) {
+/* IPA and AD group probably */
+*_result = false;
+return EOK;
+}
+
 rid1 = strrchr(sid1, '-');
 if (rid1 == NULL) {
 return EINVAL;
@@ -911,19 +917,19 @@ static int sdap_save_grpmem(TALLOC_CTX *memctx,
  */
 if (opts->schema_type == SDAP_SCHEMA_IPA_V1) {
 ret = sysdb_attrs_get_string(attrs, SYSDB_SID_STR, &group_sid);
-if (ret != EOK) {
-DEBUG(SSSDBG_TRACE_

[SSSD]Re: [PATCH] ldap_test.py: Modify enum cache timeouts

2015-12-04 Thread Michal Židek

On 12/04/2015 02:32 PM, Jakub Hrozek wrote:

On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote:

On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:

On (04/12/15 12:11), Jakub Hrozek wrote:

On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:

On (03/12/15 20:22), Jakub Hrozek wrote:

On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:

On (02/12/15 17:10), Michal Židek wrote:

Hi!

I saw some integration tests failures recently,
and I think there is a race condition between the
enumeration refresh timeout and the sleeps
after some operations that wait for this timeout.
SSSD fails to populate changes from LDAP in time
and some asserts can fail because of this.

So far I saw 4 tests to fail like this, which
is already quite a lot.

The attached patch modifies the timeout values
and hopefully removes the issue.

Michal


>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001

From: =?UTF-8?q?Michal=20=C5=BDidek?= 
Date: Wed, 2 Dec 2015 16:44:48 +0100
Subject: [PATCH] ldap_test.py: Modify enum cache timeouts

There is a race condation between ldap
enumeration refresh timeout and the sleeps
that wait for the ldap changes to populate
to SSSD if the timeout and the sleeps have
the same value.
---
src/tests/intg/ldap_test.py | 30 +-
1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py
index 757ee20..8ec8dbe 100644
--- a/src/tests/intg/ldap_test.py
+++ b/src/tests/intg/ldap_test.py
@@ -33,7 +33,11 @@ import ldap_ent

>from util import *


LDAP_BASE_DN = "dc=example,dc=com"
-INTERACTIVE_TIMEOUT = 4
+INTERACTIVE_TIMEOUT = 2
+
+
+def wait_for_ldap_enum_refresh():
+time.sleep(INTERACTIVE_TIMEOUT + 4)

Why does it need to be INTERACTIVE_TIMEOUT + 4

Could it be INTERACTIVE_TIMEOUT + 3 or + 5



Regardless of the value we choose, can we move this patch forward? I see
the related failure quite often in SSSD.

Adding timeout without real explanation is not a solution.

The main problem is with empiric values.
If they are very high then test are slow.
And there still can be slow/fast machine where lower values caused troubles.

The ideal solution would be to get rid of enumeration
in ldap tests.


Enumeration is a codepath that is different from non-enumeration, so it
should be tested. Not as priority, not as the only ldap tests, but it's
a valid case, so it should be there.


If we want to test enumeration than it should be in separate
test.


Maybe, but we do have enumeration tests and we should fix them.


Adding sleep is not a fix. It's just a workaround
because all sleep timeout are just an empiric values.
and we should fix test and not adding workaround/hacks.

If we cannot fix the test and don't want te rewrite test without enumeration
then we should remove/revert problematic tests.

LS


I will send a new patch with an explanation (sort of),
but it still will be a guess. I am not sure what the
real safe value should be, only that the sleep's
after operation should be longer than the ldap
refresh and enum cache timeouts (and that the
current values do not cope well wit the CI load).


Would it be more acceptable then to define the ldap refresh and enum
cache timeouts as variables in the test and sleep for (enum_timeout +
cache_timeout + 1) ?

At least that would be more readable than a magic constant..


Will do. All will be derived from INTERACTIVE_TIMEOUT
so that we need to change just one value if needed in the
future.





Lukas is right that I pulled the values out of
place where no knowledge resides so let us make
a compromise. I will push this patch to the
CI a lot of times (let's say 40-50) over the
weekend and see if it fails.

I am also not sure if lowering the INTERACTIVE_TIMEOUT
was a good idea, I did it in order to make the
execution a little faster and it is a timeout that
needs to be shorter than the sleeps that wait
for the ldap change to propagate to sysdb (and I did not
want the sleeps to be too long). With
the weekend "stress testing" we can hopefully avoid
additional adjustments with new patches.

Michal
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org

___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


[SSSD]Re: [PATCH] ldap_test.py: Modify enum cache timeouts

2015-12-04 Thread Jakub Hrozek
On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote:
> On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:
> >On (04/12/15 12:11), Jakub Hrozek wrote:
> >>On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:
> >>>On (03/12/15 20:22), Jakub Hrozek wrote:
> On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
> >On (02/12/15 17:10), Michal Židek wrote:
> >>Hi!
> >>
> >>I saw some integration tests failures recently,
> >>and I think there is a race condition between the
> >>enumeration refresh timeout and the sleeps
> >>after some operations that wait for this timeout.
> >>SSSD fails to populate changes from LDAP in time
> >>and some asserts can fail because of this.
> >>
> >>So far I saw 4 tests to fail like this, which
> >>is already quite a lot.
> >>
> >>The attached patch modifies the timeout values
> >>and hopefully removes the issue.
> >>
> >>Michal
> >
> >>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001
> >>From: =?UTF-8?q?Michal=20=C5=BDidek?= 
> >>Date: Wed, 2 Dec 2015 16:44:48 +0100
> >>Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
> >>
> >>There is a race condation between ldap
> >>enumeration refresh timeout and the sleeps
> >>that wait for the ldap changes to populate
> >>to SSSD if the timeout and the sleeps have
> >>the same value.
> >>---
> >>src/tests/intg/ldap_test.py | 30 +-
> >>1 file changed, 17 insertions(+), 13 deletions(-)
> >>
> >>diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py
> >>index 757ee20..8ec8dbe 100644
> >>--- a/src/tests/intg/ldap_test.py
> >>+++ b/src/tests/intg/ldap_test.py
> >>@@ -33,7 +33,11 @@ import ldap_ent
> >>from util import *
> >>
> >>LDAP_BASE_DN = "dc=example,dc=com"
> >>-INTERACTIVE_TIMEOUT = 4
> >>+INTERACTIVE_TIMEOUT = 2
> >>+
> >>+
> >>+def wait_for_ldap_enum_refresh():
> >>+time.sleep(INTERACTIVE_TIMEOUT + 4)
> >Why does it need to be INTERACTIVE_TIMEOUT + 4
> >
> >Could it be INTERACTIVE_TIMEOUT + 3 or + 5
> >
> 
> Regardless of the value we choose, can we move this patch forward? I see
> the related failure quite often in SSSD.
> >>>Adding timeout without real explanation is not a solution.
> >>>
> >>>The main problem is with empiric values.
> >>>If they are very high then test are slow.
> >>>And there still can be slow/fast machine where lower values caused 
> >>>troubles.
> >>>
> >>>The ideal solution would be to get rid of enumeration
> >>>in ldap tests.
> >>
> >>Enumeration is a codepath that is different from non-enumeration, so it
> >>should be tested. Not as priority, not as the only ldap tests, but it's
> >>a valid case, so it should be there.
> >>
> >>>If we want to test enumeration than it should be in separate
> >>>test.
> >>
> >>Maybe, but we do have enumeration tests and we should fix them.
> >>
> >Adding sleep is not a fix. It's just a workaround
> >because all sleep timeout are just an empiric values.
> >and we should fix test and not adding workaround/hacks.
> >
> >If we cannot fix the test and don't want te rewrite test without enumeration
> >then we should remove/revert problematic tests.
> >
> >LS
> 
> I will send a new patch with an explanation (sort of),
> but it still will be a guess. I am not sure what the
> real safe value should be, only that the sleep's
> after operation should be longer than the ldap
> refresh and enum cache timeouts (and that the
> current values do not cope well wit the CI load).

Would it be more acceptable then to define the ldap refresh and enum
cache timeouts as variables in the test and sleep for (enum_timeout +
cache_timeout + 1) ?

At least that would be more readable than a magic constant..

> 
> Lukas is right that I pulled the values out of
> place where no knowledge resides so let us make
> a compromise. I will push this patch to the
> CI a lot of times (let's say 40-50) over the
> weekend and see if it fails.
> 
> I am also not sure if lowering the INTERACTIVE_TIMEOUT
> was a good idea, I did it in order to make the
> execution a little faster and it is a timeout that
> needs to be shorter than the sleeps that wait
> for the ldap change to propagate to sysdb (and I did not
> want the sleeps to be too long). With
> the weekend "stress testing" we can hopefully avoid
> additional adjustments with new patches.
> 
> Michal
> ___
> sssd-devel mailing list
> sssd-devel@lists.fedorahosted.org
> https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


[SSSD]Re: [PATCH] ldap_test.py: Modify enum cache timeouts

2015-12-04 Thread Michal Židek

On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:

On (04/12/15 12:11), Jakub Hrozek wrote:

On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:

On (03/12/15 20:22), Jakub Hrozek wrote:

On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:

On (02/12/15 17:10), Michal Židek wrote:

Hi!

I saw some integration tests failures recently,
and I think there is a race condition between the
enumeration refresh timeout and the sleeps
after some operations that wait for this timeout.
SSSD fails to populate changes from LDAP in time
and some asserts can fail because of this.

So far I saw 4 tests to fail like this, which
is already quite a lot.

The attached patch modifies the timeout values
and hopefully removes the issue.

Michal


>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001

From: =?UTF-8?q?Michal=20=C5=BDidek?= 
Date: Wed, 2 Dec 2015 16:44:48 +0100
Subject: [PATCH] ldap_test.py: Modify enum cache timeouts

There is a race condation between ldap
enumeration refresh timeout and the sleeps
that wait for the ldap changes to populate
to SSSD if the timeout and the sleeps have
the same value.
---
src/tests/intg/ldap_test.py | 30 +-
1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py
index 757ee20..8ec8dbe 100644
--- a/src/tests/intg/ldap_test.py
+++ b/src/tests/intg/ldap_test.py
@@ -33,7 +33,11 @@ import ldap_ent
from util import *

LDAP_BASE_DN = "dc=example,dc=com"
-INTERACTIVE_TIMEOUT = 4
+INTERACTIVE_TIMEOUT = 2
+
+
+def wait_for_ldap_enum_refresh():
+time.sleep(INTERACTIVE_TIMEOUT + 4)

Why does it need to be INTERACTIVE_TIMEOUT + 4

Could it be INTERACTIVE_TIMEOUT + 3 or + 5



Regardless of the value we choose, can we move this patch forward? I see
the related failure quite often in SSSD.

Adding timeout without real explanation is not a solution.

The main problem is with empiric values.
If they are very high then test are slow.
And there still can be slow/fast machine where lower values caused troubles.

The ideal solution would be to get rid of enumeration
in ldap tests.


Enumeration is a codepath that is different from non-enumeration, so it
should be tested. Not as priority, not as the only ldap tests, but it's
a valid case, so it should be there.


If we want to test enumeration than it should be in separate
test.


Maybe, but we do have enumeration tests and we should fix them.


Adding sleep is not a fix. It's just a workaround
because all sleep timeout are just an empiric values.
and we should fix test and not adding workaround/hacks.

If we cannot fix the test and don't want te rewrite test without enumeration
then we should remove/revert problematic tests.

LS


I will send a new patch with an explanation (sort of),
but it still will be a guess. I am not sure what the
real safe value should be, only that the sleep's
after operation should be longer than the ldap
refresh and enum cache timeouts (and that the
current values do not cope well wit the CI load).

Lukas is right that I pulled the values out of
place where no knowledge resides so let us make
a compromise. I will push this patch to the
CI a lot of times (let's say 40-50) over the
weekend and see if it fails.

I am also not sure if lowering the INTERACTIVE_TIMEOUT
was a good idea, I did it in order to make the
execution a little faster and it is a timeout that
needs to be shorter than the sleeps that wait
for the ldap change to propagate to sysdb (and I did not
want the sleeps to be too long). With
the weekend "stress testing" we can hopefully avoid
additional adjustments with new patches.

Michal
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


[SSSD]Re: [PATCH] ldap_test.py: Modify enum cache timeouts

2015-12-04 Thread Lukas Slebodnik
On (04/12/15 12:11), Jakub Hrozek wrote:
>On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:
>> On (03/12/15 20:22), Jakub Hrozek wrote:
>> >On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
>> >> On (02/12/15 17:10), Michal Židek wrote:
>> >> >Hi!
>> >> >
>> >> >I saw some integration tests failures recently,
>> >> >and I think there is a race condition between the
>> >> >enumeration refresh timeout and the sleeps
>> >> >after some operations that wait for this timeout.
>> >> >SSSD fails to populate changes from LDAP in time
>> >> >and some asserts can fail because of this.
>> >> >
>> >> >So far I saw 4 tests to fail like this, which
>> >> >is already quite a lot.
>> >> >
>> >> >The attached patch modifies the timeout values
>> >> >and hopefully removes the issue.
>> >> >
>> >> >Michal
>> >> 
>> >> >From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001
>> >> >From: =?UTF-8?q?Michal=20=C5=BDidek?= 
>> >> >Date: Wed, 2 Dec 2015 16:44:48 +0100
>> >> >Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
>> >> >
>> >> >There is a race condation between ldap
>> >> >enumeration refresh timeout and the sleeps
>> >> >that wait for the ldap changes to populate
>> >> >to SSSD if the timeout and the sleeps have
>> >> >the same value.
>> >> >---
>> >> > src/tests/intg/ldap_test.py | 30 +-
>> >> > 1 file changed, 17 insertions(+), 13 deletions(-)
>> >> >
>> >> >diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py
>> >> >index 757ee20..8ec8dbe 100644
>> >> >--- a/src/tests/intg/ldap_test.py
>> >> >+++ b/src/tests/intg/ldap_test.py
>> >> >@@ -33,7 +33,11 @@ import ldap_ent
>> >> > from util import *
>> >> > 
>> >> > LDAP_BASE_DN = "dc=example,dc=com"
>> >> >-INTERACTIVE_TIMEOUT = 4
>> >> >+INTERACTIVE_TIMEOUT = 2
>> >> >+
>> >> >+
>> >> >+def wait_for_ldap_enum_refresh():
>> >> >+time.sleep(INTERACTIVE_TIMEOUT + 4)
>> >> Why does it need to be INTERACTIVE_TIMEOUT + 4
>> >> 
>> >> Could it be INTERACTIVE_TIMEOUT + 3 or + 5
>> >> 
>> >
>> >Regardless of the value we choose, can we move this patch forward? I see
>> >the related failure quite often in SSSD.
>> Adding timeout without real explanation is not a solution.
>> 
>> The main problem is with empiric values.
>> If they are very high then test are slow.
>> And there still can be slow/fast machine where lower values caused troubles.
>> 
>> The ideal solution would be to get rid of enumeration
>> in ldap tests.
>
>Enumeration is a codepath that is different from non-enumeration, so it
>should be tested. Not as priority, not as the only ldap tests, but it's
>a valid case, so it should be there.
>
>> If we want to test enumeration than it should be in separate
>> test.
>
>Maybe, but we do have enumeration tests and we should fix them.
>
Adding sleep is not a fix. It's just a workaround
because all sleep timeout are just an empiric values.
and we should fix test and not adding workaround/hacks.

If we cannot fix the test and don't want te rewrite test without enumeration
then we should remove/revert problematic tests.

LS
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


[SSSD]Re: [PATCH] ldap_test.py: Modify enum cache timeouts

2015-12-04 Thread Jakub Hrozek
On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:
> On (03/12/15 20:22), Jakub Hrozek wrote:
> >On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
> >> On (02/12/15 17:10), Michal Židek wrote:
> >> >Hi!
> >> >
> >> >I saw some integration tests failures recently,
> >> >and I think there is a race condition between the
> >> >enumeration refresh timeout and the sleeps
> >> >after some operations that wait for this timeout.
> >> >SSSD fails to populate changes from LDAP in time
> >> >and some asserts can fail because of this.
> >> >
> >> >So far I saw 4 tests to fail like this, which
> >> >is already quite a lot.
> >> >
> >> >The attached patch modifies the timeout values
> >> >and hopefully removes the issue.
> >> >
> >> >Michal
> >> 
> >> >From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001
> >> >From: =?UTF-8?q?Michal=20=C5=BDidek?= 
> >> >Date: Wed, 2 Dec 2015 16:44:48 +0100
> >> >Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
> >> >
> >> >There is a race condation between ldap
> >> >enumeration refresh timeout and the sleeps
> >> >that wait for the ldap changes to populate
> >> >to SSSD if the timeout and the sleeps have
> >> >the same value.
> >> >---
> >> > src/tests/intg/ldap_test.py | 30 +-
> >> > 1 file changed, 17 insertions(+), 13 deletions(-)
> >> >
> >> >diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py
> >> >index 757ee20..8ec8dbe 100644
> >> >--- a/src/tests/intg/ldap_test.py
> >> >+++ b/src/tests/intg/ldap_test.py
> >> >@@ -33,7 +33,11 @@ import ldap_ent
> >> > from util import *
> >> > 
> >> > LDAP_BASE_DN = "dc=example,dc=com"
> >> >-INTERACTIVE_TIMEOUT = 4
> >> >+INTERACTIVE_TIMEOUT = 2
> >> >+
> >> >+
> >> >+def wait_for_ldap_enum_refresh():
> >> >+time.sleep(INTERACTIVE_TIMEOUT + 4)
> >> Why does it need to be INTERACTIVE_TIMEOUT + 4
> >> 
> >> Could it be INTERACTIVE_TIMEOUT + 3 or + 5
> >> 
> >
> >Regardless of the value we choose, can we move this patch forward? I see
> >the related failure quite often in SSSD.
> Adding timeout without real explanation is not a solution.
> 
> The main problem is with empiric values.
> If they are very high then test are slow.
> And there still can be slow/fast machine where lower values caused troubles.
> 
> The ideal solution would be to get rid of enumeration
> in ldap tests.

Enumeration is a codepath that is different from non-enumeration, so it
should be tested. Not as priority, not as the only ldap tests, but it's
a valid case, so it should be there.

> If we want to test enumeration than it should be in separate
> test.

Maybe, but we do have enumeration tests and we should fix them.

> I'm not sure we would be able to get rid of "sleep()"
> in enumeration test but all such values should pre properly documented why it
> is big enough ...
> 
> LS
> ___
> sssd-devel mailing list
> sssd-devel@lists.fedorahosted.org
> https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


[SSSD]Re: [PATCH] ldap_test.py: Modify enum cache timeouts

2015-12-04 Thread Lukas Slebodnik
On (03/12/15 20:22), Jakub Hrozek wrote:
>On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
>> On (02/12/15 17:10), Michal Židek wrote:
>> >Hi!
>> >
>> >I saw some integration tests failures recently,
>> >and I think there is a race condition between the
>> >enumeration refresh timeout and the sleeps
>> >after some operations that wait for this timeout.
>> >SSSD fails to populate changes from LDAP in time
>> >and some asserts can fail because of this.
>> >
>> >So far I saw 4 tests to fail like this, which
>> >is already quite a lot.
>> >
>> >The attached patch modifies the timeout values
>> >and hopefully removes the issue.
>> >
>> >Michal
>> 
>> >From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 2001
>> >From: =?UTF-8?q?Michal=20=C5=BDidek?= 
>> >Date: Wed, 2 Dec 2015 16:44:48 +0100
>> >Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
>> >
>> >There is a race condation between ldap
>> >enumeration refresh timeout and the sleeps
>> >that wait for the ldap changes to populate
>> >to SSSD if the timeout and the sleeps have
>> >the same value.
>> >---
>> > src/tests/intg/ldap_test.py | 30 +-
>> > 1 file changed, 17 insertions(+), 13 deletions(-)
>> >
>> >diff --git a/src/tests/intg/ldap_test.py b/src/tests/intg/ldap_test.py
>> >index 757ee20..8ec8dbe 100644
>> >--- a/src/tests/intg/ldap_test.py
>> >+++ b/src/tests/intg/ldap_test.py
>> >@@ -33,7 +33,11 @@ import ldap_ent
>> > from util import *
>> > 
>> > LDAP_BASE_DN = "dc=example,dc=com"
>> >-INTERACTIVE_TIMEOUT = 4
>> >+INTERACTIVE_TIMEOUT = 2
>> >+
>> >+
>> >+def wait_for_ldap_enum_refresh():
>> >+time.sleep(INTERACTIVE_TIMEOUT + 4)
>> Why does it need to be INTERACTIVE_TIMEOUT + 4
>> 
>> Could it be INTERACTIVE_TIMEOUT + 3 or + 5
>> 
>
>Regardless of the value we choose, can we move this patch forward? I see
>the related failure quite often in SSSD.
Adding timeout without real explanation is not a solution.

The main problem is with empiric values.
If they are very high then test are slow.
And there still can be slow/fast machine where lower values caused troubles.

The ideal solution would be to get rid of enumeration
in ldap tests. If we want to test enumeration than it should be in separate
test. I'm not sure we would be able to get rid of "sleep()"
in enumeration test but all such values should pre properly documented why it
is big enough ...

LS
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


[SSSD]Re: about fedorahosted-to-github mirror

2015-12-04 Thread Pavel Březina

On 12/03/2015 09:00 PM, Jakub Hrozek wrote:

Hi,

I was looking at options we have for setting up an automated way to
mirror our fedorahosted.org repo to github.com. Unfortunately, the
github mirror functionality seems to be discontinued[*], so the next
best thing to do is to set up a github deploy key:
 https://developer.github.com/guides/managing-deploy-keys/#deploy-keys

The private key would be on the machine we'd mirror from, the public key
would be uploaded to github. My question is -- do we want to set up the
push job on fedorahosted.org or one of our machines?

1) fedorahosted.org
   [+] We don't have to manage the machine, dedicated admins do
   [-] We'd have to give read ACL to an identity that pushes /all/
   fedorahosted.org projects.

2) Our own (CI?) machines
   [+] We manage the machine with the private key. We keep control of the
   key.
   [-] We manage the machine with the private key. We're developers, not
   admins.

I would personally prefer 1) because if the git user on fedorahosted is
compromised, all bets are off anyway and the concern about a push key to
our /mirror/ repo would not be the primary one. But at the same time, I
don't feel comfortable doing the decision without asking the
list.


I also prefer 1).



So -- is anyone opposed to me asking fedorahosted.org to generate a keypair
and giving us the public key that I would upload to github?

Thanks!



[*] github has gained enough traction already, so they don't care about
this functionality anymore..
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org


___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org