Sorry I didn’t realize you might want all sssd logs… Working on it. Dan
> On Jan 20, 2017, at 10:27 AM, Sumit Bose <sb...@redhat.com> wrote: > > On Fri, Jan 20, 2017 at 03:41:46PM +0000, Sullivan, Daniel [CRI] wrote: >> Hi, >> >> I have some more information on this issue. I’m tracing it down through the >> slapd logs and I am continuing to struggle; I was hoping that somebody >> could possibly help me provided this additional information. >> >> On the domain logs (on the DC), I see an ldap_search_ext operation 155 with >> a timeout of 60: >> >> (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with >> [(&(objectClass=ipaOverrideAnchor)(ipaAnchorUUID=:SID:S-1-5-21-xxx))][cn=Default >> Trust View,cn=views,cn=accounts,dc=xxx,dc=xxx,dc=uchicago,dc=edu]. >> (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 155 >> (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_op_add] >> (0x2000): New operation 155 timeout 60 >> >> Then, on the slapd log for the server, I see the incoming request. Based on >> the result, it looks like the request completes successfully within 1 second: >> >> [20/Jan/2017:08:42:35.179890746 -0600] conn=1518 op=31 SRCH base="cn=Default >> Trust View,cn=views,cn=accounts,dc=xxx,dc=xxx,dc=uchicago,dc=edu" scope=2 >> filter="(&(objectClass=ipaOverrideAnchor)(ipaAnchorUUID=:SID:S-1-5-21-xxx))" >> attrs=ALL >> [20/Jan/2017:08:42:35.683485674 -0600] conn=1518 op=31 RESULT err=0 tag=101 >> nentries=0 etime=1 notes=U >> >> Then, subsequently, the domain log (on the DC), I see the same operation 155 >> timeout (expectedly almost a minute later). > > So it looks like ns-ldap send the response but it never reached SSSD in > time. Can you send what it happening in the SSSD logs between 08:42:34 > and 08:43:34 (if you prefer you can send it to me directly). Maybe there > is an operation which blocks SSSD for too long so that it returns too > late to the main loop to process the response? > > bye, > Sumit > >> >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [sdap_op_timeout] (0x1000): Issuing timeout for 155 >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [sdap_op_destructor] (0x1000): Abandoning operation 155 >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [generic_ext_search_handler] (0x0040): sdap_get_generic_ext_recv failed >> [110]: Connection timed out >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [ipa_get_ad_override_done] (0x0040): ipa_get_ad_override request failed. >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [sdap_id_op_destroy] (0x4000): releasing operation connection >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [ipa_initgr_get_overrides_override_done] (0x0040): IPA override lookup >> failed: 110 >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [ipa_id_get_groups_overrides_done] (0x0040): IPA resolve user groups >> overrides failed [110]. >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [be_mark_dom_offline] (0x1000): Marking subdomain bsdad.uchicago.edu offline >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [be_mark_subdom_offline] (0x1000): Marking subdomain bsdad.uchicago.edu as >> inactive >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [ipa_srv_ad_acct_lookup_done] (0x0040): ipa_get_*_acct request failed: >> [110]: Connection timed out. >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [ipa_subdomain_account_done] (0x0040): ipa_get_*_acct request failed: [110]: >> Connection timed out. >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [sdap_id_op_destroy] (0x4000): releasing operation connection >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [dp_reply_std_set] (0x0080): DP Error is OK on failed request? >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_req_done] >> (0x0400): DP Request [Account #4292]: Request handler finished [0]: Success >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [_dp_req_recv] >> (0x0400): DP Request [Account #4292]: Receiving request data. >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [dp_req_reply_list_success] (0x0400): DP Request [Account #4292]: Finished. >> Success. >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [dp_req_reply_std] (0x1000): DP Request [Account #4292]: Returning >> [Success]: 0,110,Success >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [dp_table_value_destructor] (0x0400): Removing >> [0:1:0x0001:1:1::bsdad.uchicago.edu:name=user...@domain.uchicago.edu] from >> reply table >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [dp_req_destructor] (0x0400): DP Request [Account #4292]: Request removed. >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [dp_req_destructor] (0x0400): Number of active DP request: 1 >> >> It seems like whatever problem I have is a communication issue between sssd >> on the domain controller, and ns-slapd. I’m starting to venture down the >> road of doing trace/packet level debug logging on the LDAP server; this >> could be very time consuming and frustrating (I already have it enabled, I >> am starting to manipulate the log settings to yield potentially valuable >> data), I’m hoping that somebody might be able to provide some insight based >> on the information above; I can definitely lookup the user on both domain >> controllers & both IPA servers only use themselves for IPA servers. Thank >> you so much for reading and for your help. >> >> Dan >> >> >>> On Jan 19, 2017, at 4:15 PM, Sullivan, Daniel [CRI] >>> <dsulliv...@bsd.uchicago.edu> wrote: >>> >>> Hi, >>> >>> I’ve received incredibly good support from this mailing list previously; I >>> am hoping that somebody can help me succeed in my ongoing efforts. I have >>> spent a few days on this at this point and I can’t seem to figure it out >>> how to address this issue. On my DCs I am seeing excessive ldap_search_ext >>> and sdap_get_generic_ext_recv timeouts created solely by the invocation of >>> the ‘id’ command on sssd clients. This problem seems to present itself >>> only when I parallelize lookups for an ‘uncached’ user (i.e. I have never >>> performed an initial lookup). Individual arbitrary one-off lookups for a >>> single uncached user on a single system almost always work fine. This >>> leads me to believe this is a performance tuning issue. >>> >>> We operate in an academic research computing unit (i.e. we have an HPC >>> cluster), and I need the ability to lookup the same user in parallel (using >>> the id command) across a relatively large number of systems, for example to >>> spawn jobs that require large amounts of CPU cores and/or memory. Right >>> now I am doing about 50 parallel lookups for the same user to induce this >>> problem. >>> >>> Here is some background information: >>> >>> 1) I have read Jakub's “Anatomy of an SSSD Lookup” as well as “Performance >>> Tuning of SSSD for large IPA-AD deployments”, as well as implemented >>> recommendations from the performance tuning doc, including moving the sssd >>> cache to tmpfs. >>> 2) We are on ipa-server 4.4.0-14.el7_3.4 using a trusted AD domain; all of >>> our consumed users and groups are in the AD trusted domain. We have two >>> domain controllers; each is a RHEL 7.3 VM with 6 GB of memory. Almost all >>> (if not all) of our clients are running at least sssd 1.14, and are all >>> RHEL 6/7. Neither DC is swapping, and both have 2 CPUs. >>> 3) I have tuned SSSD clients on the DCs and all clients to include these >>> options (the problem persists): >>> a) ldap_opt_timeout = 60 >>> b) ldap_search_timeout = 60 >>> 4) On both DCs, I can clear the SSSD cache, and lookup all 2000 or so users >>> in my environment with 40 concurrent lookups occurring locally on each DC >>> (using UNIX job control). I can process all 2000 lookups in this manner >>> without any failures (on either DC), and have ‘pre-populated’ the SSSD >>> cache on both DC’s by doing this. >>> 6) I have made no additional performance tuning changes other than what has >>> been described. >>> >>> Would anybody be able to advise on any potential tuning that would be >>> required (presumably on the DCs), to facilitate 50 parallel lookups without >>> experiencing sdap_get_generic_ext_recv or ldap_search_ext timeouts? >>> Should I be able to do this sort of thing with relative ease? I was hoping >>> this would be the sort of thing that would just work, but based on my >>> relatively extensive testing it doesn’t. Any advice anybody could provide >>> would be greatly appreciated. >>> >>> Thank you, >>> >>> Dan Sullivan >>> >>> >> >> >> -- >> Manage your subscription for the Freeipa-users mailing list: >> https://www.redhat.com/mailman/listinfo/freeipa-users >> Go to http://freeipa.org for more info on the project > > -- > Manage your subscription for the Freeipa-users mailing list: > https://www.redhat.com/mailman/listinfo/freeipa-users > Go to http://freeipa.org for more info on the project -- Manage your subscription for the Freeipa-users mailing list: https://www.redhat.com/mailman/listinfo/freeipa-users Go to http://freeipa.org for more info on the project