Hi,

I have some more information on this issue.  I’m tracing it down through the 
slapd logs and  I am continuing to struggle; I was hoping that somebody could 
possibly help me provided this additional information.

On the domain logs (on the DC), I see an ldap_search_ext operation 155 with a 
timeout of 60:

(Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] 
[sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with 
[(&(objectClass=ipaOverrideAnchor)(ipaAnchorUUID=:SID:S-1-5-21-xxx))][cn=Default
 Trust View,cn=views,cn=accounts,dc=xxx,dc=xxx,dc=uchicago,dc=edu].
(Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] 
[sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 155
(Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_op_add] 
(0x2000): New operation 155 timeout 60

Then, on the slapd log for the server, I see the incoming request.  Based on 
the result, it looks like the request completes successfully within 1 second:

[20/Jan/2017:08:42:35.179890746 -0600] conn=1518 op=31 SRCH base="cn=Default 
Trust View,cn=views,cn=accounts,dc=xxx,dc=xxx,dc=uchicago,dc=edu" scope=2 
filter="(&(objectClass=ipaOverrideAnchor)(ipaAnchorUUID=:SID:S-1-5-21-xxx))" 
attrs=ALL
[20/Jan/2017:08:42:35.683485674 -0600] conn=1518 op=31 RESULT err=0 tag=101 
nentries=0 etime=1 notes=U

Then, subsequently, the domain log (on the DC), I see the same operation 155 
timeout (expectedly almost a minute later).

(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_op_timeout] 
(0x1000): Issuing timeout for 155
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] 
[sdap_op_destructor] (0x1000): Abandoning operation 155
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] 
[generic_ext_search_handler] (0x0040): sdap_get_generic_ext_recv failed [110]: 
Connection timed out
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] 
[ipa_get_ad_override_done] (0x0040): ipa_get_ad_override request failed.
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] 
[sdap_id_op_destroy] (0x4000): releasing operation connection
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] 
[ipa_initgr_get_overrides_override_done] (0x0040): IPA override lookup failed: 
110
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] 
[ipa_id_get_groups_overrides_done] (0x0040): IPA resolve user groups overrides 
failed [110].
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] 
[be_mark_dom_offline] (0x1000): Marking subdomain bsdad.uchicago.edu offline
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] 
[be_mark_subdom_offline] (0x1000): Marking subdomain bsdad.uchicago.edu as 
inactive
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] 
[ipa_srv_ad_acct_lookup_done] (0x0040): ipa_get_*_acct request failed: [110]: 
Connection timed out.
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] 
[ipa_subdomain_account_done] (0x0040): ipa_get_*_acct request failed: [110]: 
Connection timed out.
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] 
[sdap_id_op_destroy] (0x4000): releasing operation connection
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_reply_std_set] 
(0x0080): DP Error is OK on failed request?
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_req_done] 
(0x0400): DP Request [Account #4292]: Request handler finished [0]: Success
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [_dp_req_recv] 
(0x0400): DP Request [Account #4292]: Receiving request data.
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] 
[dp_req_reply_list_success] (0x0400): DP Request [Account #4292]: Finished. 
Success.
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_req_reply_std] 
(0x1000): DP Request [Account #4292]: Returning [Success]: 0,110,Success
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] 
[dp_table_value_destructor] (0x0400): Removing 
[0:1:0x0001:1:1::bsdad.uchicago.edu:name=user...@domain.uchicago.edu] from 
reply table
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_req_destructor] 
(0x0400): DP Request [Account #4292]: Request removed.
(Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_req_destructor] 
(0x0400): Number of active DP request: 1

It seems like whatever problem I have is a communication issue between sssd on 
the domain controller, and ns-slapd.  I’m starting to venture down the road of 
doing trace/packet level debug logging on the LDAP server; this could be very 
time consuming and frustrating (I already have it enabled, I am starting to 
manipulate the log settings to yield potentially valuable data), I’m hoping 
that somebody might be able to provide some insight based on the information 
above; I can definitely lookup the user on both domain controllers & both IPA 
servers only use themselves for IPA servers.  Thank you so much for reading and 
for your help.

Dan


> On Jan 19, 2017, at 4:15 PM, Sullivan, Daniel [CRI] 
> <dsulliv...@bsd.uchicago.edu> wrote:
> 
> Hi,
> 
> I’ve received incredibly good support from this mailing list previously; I am 
> hoping that somebody can help me succeed in my ongoing efforts.  I have spent 
> a few days on this at this point and I can’t seem to figure it out how to 
> address this issue.  On my DCs I am seeing excessive ldap_search_ext and 
> sdap_get_generic_ext_recv timeouts created solely by the invocation of the 
> ‘id’ command on sssd clients.  This problem seems to present itself only when 
> I parallelize lookups for an ‘uncached’ user (i.e. I have never performed an 
> initial lookup).  Individual arbitrary one-off lookups for a single uncached 
> user on a single system almost always work fine.  This leads me to believe 
> this is a performance tuning issue.
> 
> We operate in an academic research computing unit (i.e. we have an HPC 
> cluster), and I need the ability to lookup the same user in parallel (using 
> the id command) across a relatively large number of systems, for example to 
> spawn jobs that require large amounts of CPU cores and/or memory.  Right now 
> I am doing about 50 parallel lookups for the same user to induce this 
> problem.  
> 
> Here is some background information:
> 
> 1) I have read Jakub's “Anatomy of an SSSD Lookup” as well as “Performance 
> Tuning of SSSD for large IPA-AD deployments”, as well as implemented 
> recommendations from the performance tuning doc, including moving the sssd 
> cache to tmpfs.
> 2) We are on ipa-server 4.4.0-14.el7_3.4 using a trusted AD domain; all of 
> our consumed users and groups are in the AD trusted domain.  We have two 
> domain controllers; each is a RHEL 7.3 VM with 6 GB of memory.  Almost all 
> (if not all) of our clients are running at least sssd 1.14, and are all RHEL 
> 6/7.  Neither DC is swapping, and both have 2 CPUs.
> 3) I have tuned SSSD clients on the DCs and all clients to include these 
> options (the problem persists):
>  a) ldap_opt_timeout = 60
>  b) ldap_search_timeout = 60
> 4) On both DCs, I can clear the SSSD cache, and lookup all 2000 or so users 
> in my environment with 40 concurrent lookups occurring locally on each DC 
> (using UNIX job control).  I can process all 2000 lookups in this manner 
> without any failures (on either DC), and have ‘pre-populated’ the SSSD cache 
> on both DC’s by doing this.
> 6) I have made no additional performance tuning changes other than what has 
> been described.
> 
> Would anybody be able to advise on any potential tuning that would be 
> required (presumably on the DCs), to facilitate 50 parallel lookups without 
> experiencing sdap_get_generic_ext_recv or  ldap_search_ext  timeouts?  Should 
> I be able to do this sort of thing with relative ease?  I was hoping this 
> would be the sort of thing that would just work, but based on my relatively 
> extensive testing it doesn’t.  Any advice anybody could provide would be 
> greatly appreciated.
> 
> Thank you,
> 
> Dan Sullivan
> 
> 


-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project

Reply via email to