Currently in my environment I have 6 servers: 2 in my local office and 2 in each region in AWS. The AWS servers are all running CentOS 7.x, with FreeIPA 4.5.x running on all 6. The AWS servers are all t2.medium w/ unlimited turned on. Occasionally we have issues with all 6 where one of the FreeIPA processes stops working completely. This could be ipa.service, named-pkcs11, or dir...@myrealm.org. Sometimes it is resource constraints; other times the whole system slows to a crawl for no apparent reason. Looking through the IPA logs doesn't always tell me what was going on. I usually have to restart the service or reboot the whole instance/machine to get it back to a working state.
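One way to tell whether a stopped service coincides with memory pressure would be to record unit states and available memory at regular intervals. The following is only a rough sketch under stated assumptions: the unit list (in particular the `dirsrv@INSTANCE.service` placeholder) and all function names are hypothetical and need adjusting per host. It relies only on `systemctl is-active`, which prints a state word such as `active` or `failed`, and on the `MemAvailable` field of /proc/meminfo.

```python
import subprocess
import time

# Assumed unit names; INSTANCE is a placeholder for the real dirsrv instance name.
UNITS = ["ipa.service", "named-pkcs11.service", "dirsrv@INSTANCE.service"]


def unit_state(unit):
    """Return the state word printed by `systemctl is-active` for one unit."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
    except OSError:
        # systemctl is not available on this machine
        return "unknown"
    return result.stdout.strip() or "unknown"


def mem_available_kb(meminfo_text):
    """Extract MemAvailable (kB) from /proc/meminfo content, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    return None


def snapshot(units, meminfo_text, get_state=unit_state):
    """Collect a timestamp, available memory, and the state of each unit."""
    return {
        "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "mem_available_kb": mem_available_kb(meminfo_text),
        "states": {unit: get_state(unit) for unit in units},
    }


def not_active(snap):
    """List the units whose state is anything other than 'active'."""
    return sorted(u for u, s in snap["states"].items() if s != "active")


def format_line(snap):
    """Render one snapshot as a single log line."""
    states = " ".join(
        "{}={}".format(u, s) for u, s in sorted(snap["states"].items())
    )
    return "{} mem_available_kb={} {}".format(
        snap["time"], snap["mem_available_kb"], states
    )
```

Run from cron or a systemd timer, something like `print(format_line(snapshot(UNITS, open("/proc/meminfo").read())))` appended to a file would give a timeline to compare against the dirsrv errors log the next time a service stops.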
Also, as of right now I'm seeing that dir...@myrealm.net will not start because a configured resource limit was exceeded. Here is the error I'm getting all of a sudden:

● dir...@example.net.service - 389 Directory Server EXAMPLE.NET.
   Loaded: loaded (/usr/lib/systemd/system/dirsrv@.service; enabled; vendor preset: disabled)
   Active: failed (Result: resources)

Jan 14 19:55:21 freeipa01.west.example.net systemd[1]: Failed to load environment files: No such file or directory
Jan 14 19:55:21 freeipa01.west.example.net systemd[1]: dir...@example.net.service failed to run 'start-pre' task: No such file or directory
Jan 14 19:55:21 freeipa01.west.example.net systemd[1]: Failed to start 389 Directory Server EXAMPLE.NET..
Jan 14 19:55:21 freeipa01.west.example.net systemd[1]: Unit dir...@example.net.service entered failed state.
Jan 14 19:55:21 freeipa01.west.example.net systemd[1]: dir...@example.net.service failed.
Jan 14 19:55:21 freeipa01.west.example.net systemd[1]: Starting 389 Directory Server EXAMPLE.NET....
[andrew.meyer@freeipa01 ~]$

Here is a snippet from the logs:

/var/log/dirsrv/slapd-EXAMPLE-NET/errors

[14/Jan/2019:19:51:42.631413149 +0000] - NOTICE - ldbm_back_start - found 3880412k physical memory
[14/Jan/2019:19:51:42.632553293 +0000] - NOTICE - ldbm_back_start - found 3273584k available
[14/Jan/2019:19:51:42.633584210 +0000] - NOTICE - ldbm_back_start - cache autosizing: db cache: 97010k
[14/Jan/2019:19:51:42.634560420 +0000] - NOTICE - ldbm_back_start - cache autosizing: userRoot entry cache (3 total): 131072k
[14/Jan/2019:19:51:42.636236633 +0000] - NOTICE - ldbm_back_start - cache autosizing: userRoot dn cache (3 total): 65536k
[14/Jan/2019:19:51:42.639592221 +0000] - NOTICE - ldbm_back_start - cache autosizing: ipaca entry cache (3 total): 131072k
[14/Jan/2019:19:51:42.641296133 +0000] - NOTICE - ldbm_back_start - cache autosizing: ipaca dn cache (3 total): 65536k
[14/Jan/2019:19:51:42.643594212 +0000] - NOTICE - ldbm_back_start - cache autosizing: changelog entry cache (3 total): 131072k
[14/Jan/2019:19:51:42.645241367 +0000] - NOTICE - ldbm_back_start - cache autosizing: changelog dn cache (3 total): 65536k
[14/Jan/2019:19:51:42.646916994 +0000] - NOTICE - ldbm_back_start - total cache size: 683450613 B;
[14/Jan/2019:19:51:42.650449731 +0000] - NOTICE - dblayer_start - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
[14/Jan/2019:19:51:49.346656922 +0000] - ERR - schema-compat-plugin - scheduled schema-compat-plugin tree scan in about 5 seconds after the server startup!
[14/Jan/2019:19:51:49.544963162 +0000] - ERR - NSACLPlugin - acl_parse - The ACL target cn=groups,cn=compat,dc=example,dc=net does not exist
[14/Jan/2019:19:51:49.564630973 +0000] - ERR - NSACLPlugin - acl_parse - The ACL target cn=computers,cn=compat,dc=example,dc=net does not exist
[14/Jan/2019:19:51:49.584635724 +0000] - ERR - NSACLPlugin - acl_parse - The ACL target cn=ng,cn=compat,dc=example,dc=net does not exist
[14/Jan/2019:19:51:49.604545604 +0000] - ERR - NSACLPlugin - acl_parse - The ACL target ou=sudoers,dc=example,dc=net does not exist
[14/Jan/2019:19:51:49.624542861 +0000] - ERR - NSACLPlugin - acl_parse - The ACL target cn=users,cn=compat,dc=example,dc=net does not exist
[14/Jan/2019:19:51:49.684538158 +0000] - ERR - NSACLPlugin - acl_parse - The ACL target cn=ad,cn=etc,dc=example,dc=net does not exist
[14/Jan/2019:19:51:49.825646593 +0000] - ERR - NSACLPlugin - acl_parse - The ACL target cn=casigningcert cert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=example,dc=net does not exist
[14/Jan/2019:19:51:49.844539895 +0000] - ERR - NSACLPlugin - acl_parse - The ACL target cn=casigningcert cert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=example,dc=net does not exist
[14/Jan/2019:19:51:49.990796503 +0000] - ERR - NSACLPlugin - acl_parse - The ACL target cn=automember rebuild membership,cn=tasks,cn=config does not exist
[14/Jan/2019:19:52:00.365183146 +0000] - NOTICE - NSMMReplicationPlugin - changelog program - _cl5ConstructRUV - Rebuilding the replication changelog RUV, this may take several minutes...
[14/Jan/2019:19:52:00.445692209 +0000] - NOTICE - NSMMReplicationPlugin - changelog program - _cl5ConstructRUV - Rebuilding replication changelog RUV complete. Result 0 (Success)
[14/Jan/2019:19:52:00.446842275 +0000] - NOTICE - NSMMReplicationPlugin - changelog program - _cl5ConstructRUV - Rebuilding the replication changelog RUV, this may take several minutes...
[14/Jan/2019:19:52:00.513081196 +0000] - NOTICE - NSMMReplicationPlugin - changelog program - _cl5ConstructRUV - Rebuilding replication changelog RUV complete. Result 0 (Success)
[14/Jan/2019:19:52:00.514432537 +0000] - NOTICE - NSMMReplicationPlugin - changelog program - _cl5ConstructRUV - Rebuilding the replication changelog RUV, this may take several minutes...
[14/Jan/2019:19:52:00.515614720 +0000] - NOTICE - NSMMReplicationPlugin - changelog program - _cl5ConstructRUV - Rebuilding replication changelog RUV complete. Result 0 (Success)
[14/Jan/2019:19:52:00.516689146 +0000] - NOTICE - NSMMReplicationPlugin - changelog program - _cl5ConstructRUV - Rebuilding the replication changelog RUV, this may take several minutes...
[14/Jan/2019:19:52:00.517810761 +0000] - NOTICE - NSMMReplicationPlugin - changelog program - _cl5ConstructRUV - Rebuilding replication changelog RUV complete. Result 0 (Success)
[14/Jan/2019:19:52:00.526034848 +0000] - WARN - NSMMReplicationPlugin - replica_check_for_data_reload - Disorderly shutdown for replica o=ipaca. Check if DB RUV needs to be updated
[14/Jan/2019:19:52:00.584599421 +0000] - ERR - set_krb5_creds - Could not get initial credentials for principal [ldap/freeipa01.west.example....@example.net] in keytab [FILE:/etc/dirsrv/ds.keytab]: -1765328228 (Cannot contact any KDC for requested realm)
[14/Jan/2019:19:52:00.604610301 +0000] - WARN - NSMMReplicationPlugin - replica_check_for_data_reload - Disorderly shutdown for replica dc=example,dc=net. Check if DB RUV needs to be updated
[14/Jan/2019:19:52:00.624573161 +0000] - ERR - set_krb5_creds - Could not get initial credentials for principal [ldap/freeipa01.west.example....@example.net] in keytab [FILE:/etc/dirsrv/ds.keytab]: -1765328228 (Cannot contact any KDC for requested realm)
[14/Jan/2019:19:52:00.644588420 +0000] - NOTICE - NSMMReplicationPlugin - Force update of database RUV (from CL RUV) -> 5c3cdd700000001b0000
[14/Jan/2019:19:52:00.675318674 +0000] - INFO - slapd_daemon - slapd started. Listening on All Interfaces port 389 for LDAP requests
[14/Jan/2019:19:52:00.684551689 +0000] - INFO - slapd_daemon - Listening on All Interfaces port 636 for LDAPS requests
[14/Jan/2019:19:52:00.704669406 +0000] - INFO - slapd_daemon - Listening on /var/run/slapd-EXAMPLE-NET.socket for LDAPI requests
[14/Jan/2019:19:52:00.752022024 +0000] - ERR - schema-compat-plugin - schema-compat-plugin tree scan will start in about 5 seconds!
[14/Jan/2019:19:52:04.456691971 +0000] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meTofreeipa03.east.example.net" (freeipa03:389) - Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[14/Jan/2019:19:52:05.055410972 +0000] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meTofreeipa02.west.example.net" (freeipa02:389) - Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[14/Jan/2019:19:52:06.180112049 +0000] - ERR - schema-compat-plugin - warning: no entries set up under cn=computers, cn=compat,dc=example,dc=net
[14/Jan/2019:19:52:06.225167327 +0000] - ERR - schema-compat-plugin - Finished plugin initialization.
[14/Jan/2019:19:52:09.118232562 +0000] - INFO - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meTofreeipa02.west.example.net" (freeipa02:389): Replication bind with GSSAPI auth resumed
[14/Jan/2019:19:52:09.417962320 +0000] - INFO - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meTofreeipa03.east.example.net" (freeipa03:389): Replication bind with GSSAPI auth resumed

For the on-premises FreeIPA servers I have upgraded the RAM/CPU to 4x4. However, I wanted to reach out to the mailing list to find out what to do about the servers in AWS.

Regards,
_______________________________________________
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org