Forgot to CC the ML. Sorry.

 --
 Daryl Fonseca-Holt
 IST/CNS/Unix Server Team
 University of Manitoba
 204.480.1079

On Wed, 23 Mar 2016, Daryl Fonseca-Holt wrote:

Hi Thierry,

I have not filed a support request with RedHat for two reasons. First, it seems that the NIS priming may not be a problem in the post 4.2.0 release. Second, I am able to work around the problem by modifying the code where it uses the number of krb5kdc daemons to start thus alleviating the crush of 64 daemons starting. As suggested by Alexander I'm patching ipaserver/install/krbinstance.py.

# diff -c ipaserver/install/krbinstance.py.df ipaserver/install/krbinstance.py *** ipaserver/install/krbinstance.py.df 2015-06-18 07:54:49.000000000 -0500
--- ipaserver/install/krbinstance.py    2016-03-22 13:37:16.056210609 -0500
***************
*** 355,360 ****
--- 355,362 ----

         MIN_KRB5KDC_WITH_WORKERS = "1.9"
         cpus = os.sysconf('SC_NPROCESSORS_ONLN')
+     #XXX
+     cpus = 1
         workers = False
(stdout, stderr, rc) = ipautil.run(['klist', '-V'], raiseonerr=False)
         if rc == 0:


With this patch the constant in /etc/sysconfig/krb5kdc is lower. After the ipa-replica-install completes I will increase this manually to the original value.

This has allowed the building phase of the project to continue.

Thanks, Daryl

On 03/23/16 04:40, thierry bordaz wrote:
Hi Daryl,

Me again... :-)
As a follow up of this issue I would like to know if you already open a case to RH support ?

Also, have you identified a workaround to make ipa-replica-install successful or are you still suffering from this issue ?

best regards
thierry

On 03/16/2016 04:24 PM, thierry bordaz wrote:
Hello Daryl,

    I can reproduce locally the slow DS startup (due to slapi-nis
    priming). In fact the version I was using had not the slapi-nis
    fix to differ the priming.

    I failed to reproduce the intensive load on DS when krb5kdc startup.
    Looking at yours logs, we can see that krb5kdc startup triggers a
    set of requests during 3s up to 8s. The logs are looking like
    (note the etime can go up to 2s):

    [10/Mar/2016:14:20:35 -0600] conn=40 fd=87 slot=87 connection
    from local to /var/run/slapd-UOFMT1.socket
    [10/Mar/2016:14:20:36 -0600] conn=40 AUTOBIND dn="cn=Directory
    Manager"
    [10/Mar/2016:14:20:36 -0600] conn=40 op=0 BIND dn="cn=Directory
    Manager" method=sasl version=3 mech=EXTERNAL
    [10/Mar/2016:14:20:36 -0600] conn=40 op=0 RESULT err=0 tag=97
    nentries=0 *etime=1* dn="cn=Directory Manager"
    [10/Mar/2016:14:20:36 -0600] conn=40 op=1 SRCH
    base="cn=UOFMT1,cn=kerberos,dc=uofmt1" scope=0
    filter="(objectClass=*)" attrs=ALL
    [10/Mar/2016:14:20:36 -0600] conn=40 op=1 RESULT err=0 tag=101
    nentries=1 etime=0
    [10/Mar/2016:14:20:36 -0600] conn=40 op=2 SRCH
    base="cn=ipaConfig,cn=etc,dc=uofmt1" scope=0
    filter="(objectClass=*)" attrs="ipaConfigString ipaKrbAuthzData
    ipaUserAuthType"
    [10/Mar/2016:14:20:36 -0600] conn=40 op=2 RESULT err=0 tag=101
    nentries=1 etime=0
    [10/Mar/2016:14:20:36 -0600] conn=40 op=3 SRCH base="dc=uofmt1"
    scope=2 filter="(objectClass=ipaNTDomainAttrs)"
    attrs="ipaNTFlatName ipaNTFallbackPrimaryGroup
    ipaNTSecurityIdentifier"
    [10/Mar/2016:14:20:37 -0600] conn=40 op=3 RESULT err=0 tag=101
    nentries=0 *etime=1*
    [10/Mar/2016:14:20:37 -0600] conn=40 op=4 SRCH
    base="cn=UOFMT1,cn=kerberos,dc=uofmt1" scope=0
    filter="(krbMKey=*)" attrs="krbMKey"
    [10/Mar/2016:14:20:37 -0600] conn=40 op=4 RESULT err=0 tag=101
    nentries=1 etime=0
    [10/Mar/2016:14:20:37 -0600] conn=40 op=5 SRCH base="dc=uofmt1"
    scope=2
    
filter="(&(|(objectClass=krbprincipalaux)(objectClass=krbprincipal))(krbPrincipalName=K/M@UOFMT1))"
    attrs="krbPrincipalName krbCanonicalName ipaKrbPrincipalAlias
    krbUPEnabled krbPrincipalKey krbTicketPolicyReference
    krbPrincipalExpiration krbPasswordExpiration
    krbPwdPolicyReference krbPrincipalType krbPwdHistory
    krbLastPwdChange krbPrincipalAliases krbLastSuccessfulAuth
    krbLastFailedAuth krbLoginFailedCount krbExtraData
    krbLastAdminUnlock krbObjectReferences krbTicketFlags
    krbMaxTicketLife krbMaxRenewableAge nsAccountLock passwordHistory
    ipaKrbAuthzData ipaUserAuthType ipatokenRadiusConfigLink objectClass"
    [10/Mar/2016:14:20:38 -0600] conn=40 op=5 RESULT err=0 tag=101
    nentries=1 *etime=1*
    [10/Mar/2016:14:20:39 -0600] conn=40 op=6 UNBIND
    [10/Mar/2016:14:20:39 -0600] conn=40 op=6 fd=87 closed - U1


    I think the request op=3 (SRCH base="dc=uofmt1" scope=2
    filter="(objectClass=ipaNTDomainAttrs)") is slow also because of
    slapi-nis. In fact it is indexed and returns 0 entry. So only
    plugins can create this high etime.
    An improvement in slapi-nis makes its search callback noop when
    it comes from krb and I am running this improvement.

    In conclusion I think both slow DS startup and KRB5 startup are
    fixed in RHEL 7.

    thanks
    theirry


On 03/14/2016 08:46 PM, Daryl Fonseca-Holt wrote:
Hello Thierry,

Attached is the pstacks from only the final DS restart. I don't think they will show the whole picture.

According to the debug log /var/log/ipareplica-install.log (attached) the start of the krb5kdc.service (19:13:16Z) is successful, but the krb5kdc log (attach) shows it is unable to fetch the master K/M at 14:31:31CDT (-5hour offset). This is when the install log shows kadmind failing.

In my experience with the master observing top there are two intense times for ns-slapd-<instance>. The first when it start, of course, and the second when krb5kdc starts. I assume this is because krb5kdc must get it's configuration and data from the same DS. krb5kdc fails but the ipareplica-install script isn't aware of it. Finally kadmin.service tries to access krb5kdc and finds that it is dead.

Please note these logs are with Schema Compatability and NIS plugins turned off per the other e-mail from Alexander.

I've noticed on a running master I can prevent this type of failure by manually starting dirsrv (systemctl start dirsrv@<instance>.service), watch top until all threads of ns-slapd have settled, then systemctl start krb5kdc.service, again watching top until ns-slapd threads have settled down before systemctl start kadmin.service. This kind of manual intervention is is not possible when running the ipareplica-install script.

I will look into introducing a delay at the completion of the dirsrv and krb5kdc systemd units and see if I can accommodate ipareplica-install. Just as an experiment for now. I need to advance the project into High Availability testing but cannot do so without a functioning replica.

Regards, Daryl

On 03/14/16 09:20, thierry bordaz wrote:
Hi Daryl,

Thanks for all the data. I will look at the pstacks. A first look shows that you capture import, bind... so may be a complete ipa-replica-install session. I will try to retrieve the specific startup time to see what was going on at that time. If you have the time to monitor only startup, it will help me shrinking the set of pstacks. Startup of DS last > 1min. If you may start DS and as soon as the ns-slapd process is launched, do regular pstacks. Then when you are able to send a simple ldapsearch (ldapsearch -x -b "" -s base), you may stop taking pstacks.

thanks
thierry

On 03/14/2016 03:06 PM, Daryl Fonseca-Holt wrote:
Hi Thierry,

I moved the old logs into a subdirectory called try1. I did the recommended ipa-server-install --uninstall. Tried the replica install again. Failed during kadmind start like the previous time.

The log from ipa-replica-install (with -d) is at http://home.cc.umanitoba.ca/~fonsecah/ipa/ipareplica-install.log The console script (mostly the same as the log but with my entries) is at http://home.cc.umanitoba.ca/~fonsecah/ipa/ipa-replica-install.console The 5 second pstacks are at http://home.cc.umanitoba.ca/~fonsecah/ipa/slapd-pstacks.console

Thanks, Daryl


On 03/11/16 02:40, thierry bordaz wrote:
Hello Deryl,

    My understanding is that ns-slapd is first slow to startup.
    Then when krb5kdc is starting it may load ns-slapd.

    We identified krb5kdc may be impacted by the number of users
    accounts.
    From the ns-slapd errors log it is not clear why it is so
    slow to start.

    Would you provide the ns-slapd  access logs from that period.
    Also in order to know where ns-slapd is spending time, it
    would really help if you can get regular (each 5s) pstacks
    (with 389-ds-debuginfo), during DS startup and then later
    during krb5kdc startup.

    best regards
    thierry


On 03/10/2016 11:10 PM, Daryl Fonseca-Holt wrote:
Environment:
  RHEL 7.2
  IPA 4.2.0-15
  nss 3.19.1-19
  389-ds-base 1.3.4.0-26
  sssd 1.13.0-40


I've encountered this problem in IPA 3.0.0 but hoped it was addressed in 4.2.0.

Trying to set up a replica of a master with 150,000+ user accounts, NIS and Schema Compatability enabled on the master.

During ipa-replica-install it attempts to start IPA. dirsrv starts, krb5kdc starts, but then kadmind fails because krb5kdc has gone missing.

This happens during restart of IPA in version 3.0.0 too. There it can be overcome by manually starting each component of IPA _but_ waiting until ns-slapd-<instance> has settled down (as seen from top) before starting krb5kdc. I also think that the startup of krb5kdc loads the LDAP instance quite a bit.

There is a problem in the startup logic where dirsrv is so busy that even though krb5kdc successfully starts and allows the kadmin to begin kdb5kdc is not really able to do its duties.

I'm reporting this since there must be some way to delay the start of krb5kdc and then kadmind until ns-slapd-<instance> is really open for business.

# systemctl status krb5kdc.service
● krb5kdc.service - Kerberos 5 KDC
Loaded: loaded (/usr/lib/systemd/system/krb5kdc.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Mar 10 14:19:13 jutta.cc.umanitoba.ca systemd[1]: Stopped Kerberos 5 KDC. Mar 10 14:20:36 jutta.cc.umanitoba.ca systemd[1]: Starting Kerberos 5 KDC... Mar 10 14:20:39 jutta.cc.umanitoba.ca systemd[1]: Started Kerberos 5 KDC.

# systemctl status krb5kdc.service
● krb5kdc.service - Kerberos 5 KDC
Loaded: loaded (/usr/lib/systemd/system/krb5kdc.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Mar 10 14:19:13 jutta.cc.umanitoba.ca systemd[1]: Stopped Kerberos 5 KDC. Mar 10 14:20:36 jutta.cc.umanitoba.ca systemd[1]: Starting Kerberos 5 KDC... Mar 10 14:20:39 jutta.cc.umanitoba.ca systemd[1]: Started Kerberos 5 KDC.

journalctl -xe was stale by the time I got to it so I've attached /var/log/messages instead.

The log from ipa-replica-install (with -d) is at http://home.cc.umanitoba.ca/~fonsecah/ipa/ipareplica-install.log The console script (mostly the same as the log but with my entries) is at http://home.cc.umanitoba.ca/~fonsecah/ipa/ipa-replica-install.console The /var/log/dirsrv/ns-slapd-<instance> access log is at http://home.cc.umanitoba.ca/~fonsecah/ipa/access

Regards, Daryl





--
  --
  Daryl Fonseca-Holt
  IST/CNS/Unix Server Team
  University of Manitoba
  204.480.1079


--
  --
  Daryl Fonseca-Holt
  IST/CNS/Unix Server Team
  University of Manitoba
  204.480.1079





--
--
Daryl Fonseca-Holt
IST/CNS/Unix Server Team
University of Manitoba
204.480.1079

-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project

Reply via email to