For several years we have run Kerberos (built from source) on Solaris 9/10. We often dump our production master, copy the file over to our test master, and reload it with kdb5_util. In this environment everything has always worked as expected. A full resync of our test slave is immediately initiated: the database on the test master is dumped, kprop'd to the slave, and reloaded on the slave. This takes just a few minutes to complete.
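For anyone wanting to reproduce the sequence by hand, here is a minimal sketch of the stages as command lines, assembled in Python so that nothing is executed. The file paths are hypothetical placeholders, and the manual kprop step only approximates what the automatic full resync does for us; reload_commands is just an illustrative helper name.

```python
import shlex


def reload_commands(prod_dump, trans_file, slave_host):
    """Return the command line for each stage without running anything.

    prod_dump and trans_file are hypothetical paths chosen for illustration.
    """
    return [
        ("dump on production master", ["kdb5_util", "dump", prod_dump]),
        ("load on test master", ["kdb5_util", "load", prod_dump]),
        ("dump on test master", ["kdb5_util", "dump", trans_file]),
        ("push to slave", ["kprop", "-f", trans_file, slave_host]),
    ]


stages = reload_commands(
    "/var/tmp/prod.dump",          # hypothetical path
    "/var/tmp/slave_datatrans",    # hypothetical path
    "r.test.admin.private",
)
for stage, argv in stages:
    # shlex.join quotes each argument so the line can be pasted into a shell
    print(f"{stage}: {shlex.join(argv)}")
```

Printing the commands rather than running them keeps the sketch safe to try on any machine.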
We're in the process of moving our Kerberos environment from Solaris to CentOS Linux. We're running two CentOS 6.6 Linux VMs with Kerberos RPMs (the latest available from standard repositories) of

    krb5-libs-1.10.3-37.el6_6.x86_64
    krb5-workstation-1.10.3-37.el6_6.x86_64
    krb5-server-1.10.3-37.el6_6.x86_64

For reference:

    h.test.admin.private is the master (172.30.110.45)
    r.test.admin.private is the slave (172.30.110.46)

And a few other items of note:

    Both hosts are running NTP so the clocks are in sync
    iptables is disabled at boot
    MTUs on the network interfaces are default at 1500

After configuring things I initialize the master (from a dump of our production master) using kdb5_util, start krb5kdc and kadmin on the master, and then start kpropd on the slave, expecting a full resync to start and the slave to be initialized. But I'm seeing strange and erratic behaviour.

In syslog, I often see this from kadmind on the master:

    starting
    Authentication attempt failed: 172.30.110.46, GSS-API error strings are:
        Unspecified GSS failure. Minor code may provide more information
        Clock skew too great
        GSS-API error strings complete.
    Authentication attempt failed: 172.30.110.46, GSS-API error strings are:
        Unspecified GSS failure. Minor code may provide more information
        Key version number for principal in key table is incorrect
        GSS-API error strings complete.
    Authentication attempt failed: 172.30.110.46, GSS-API error strings are:
        Unspecified GSS failure. Minor code may provide more information
        Clock skew too great
        GSS-API error strings complete.
    Authentication attempt failed: 172.30.110.46, GSS-API error strings are:
        Unspecified GSS failure. Minor code may provide more information
        Key version number for principal in key table is incorrect
        GSS-API error strings complete.
    closing down fd 32
    closing down fd 33
    Authentication attempt failed: 172.30.110.46, GSS-API error strings are:
        Unspecified GSS failure. Minor code may provide more information
        Clock skew too great
        GSS-API error strings complete.
    Authentication attempt failed: 172.30.110.46, GSS-API error strings are:
        Unspecified GSS failure. Minor code may provide more information
        Key version number for principal in key table is incorrect
        GSS-API error strings complete.
    closing down fd 32
    Request: iprop_get_updates_1, UPDATE_FULL_RESYNC_NEEDED; Incoming SerialNo=0; Outgoing SerialNo=N/A, success, client=kiprop/r.test.admin.priv...@wmich.edu, service=kiprop/h.test.admin.priv...@wmich.edu, addr=172.30.110.46
    Request: iprop_full_resync_1, spawned resync process 2944, client=kiprop/r.test.admin.priv...@wmich.edu, service=kiprop/h.test.admin.priv...@wmich.edu, addr=172.30.110.46

Sometimes I just see the Request: iprop_get_updates line, and no complaints about authentication failures or clock skew.

On the slave I see syslog entries showing repeated problems with kpropd connecting to the master:

    /usr/sbin/kpropd: GSS-API (or Kerberos) error while initializing /usr/sbin/kpropd interface, retrying

When I start kpropd with -d -S and run strace on it, I see that it repeatedly opens /dev/urandom and reads from it just before the above error appears. When running kpropd in standalone mode with debug enabled, I see the syslog entry above and eventually I see

    do_standalone: setting resync alarm to 10
    waiting for a kprop connection
    doit: setting resync alarm to 5s
    Connection from h.test.admin.private
    krb5_recvauth(6, kprop5_01, host/r.test.admin.priv...@wmich.edu, ...)
    authenticated client: host/h.test.admin.priv...@wmich.edu (etype == DES cbc mode with CRC-32)
    /usr/sbin/kpropd: Connection reset by peer while reading database block starting at offset 92340224
    Full resync was unsuccessful

Unfortunately, the resync was not successful. Often (but not always), when kprop -f starts on the master, the slave_datatrans file will *partially* copy to the slave, often 60-90% of the data, before the connection hangs and then times out.
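One way to tell whether the stall is specific to kprop or would hit any large TCP transfer between the two hosts is a plain bulk-transfer test with no Kerberos involved. Below is a minimal sketch in Python, not something I have run between these machines. The block size, the timeout, and the function names are arbitrary choices of mine and have nothing to do with how kprop works internally; loopback_check only confirms that the script itself behaves on 127.0.0.1.

```python
import socket
import threading

CHUNK = 64 * 1024  # arbitrary block size, not kprop's


def receive_all(listener, result):
    """Accept one connection and count bytes until the sender closes."""
    conn, _addr = listener.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["received"] = total


def send_bulk(host, port, total_bytes, timeout=30.0):
    """Send total_bytes of filler; return how many bytes were sent without error."""
    sent = 0
    block = b"\0" * CHUNK
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            while sent < total_bytes:
                n = min(CHUNK, total_bytes - sent)
                sock.sendall(block[:n])
                sent += n
        except OSError as exc:  # socket timeouts are a subclass of OSError
            print(f"stalled after {sent} bytes: {exc}")
    return sent


def loopback_check(total_bytes=1024 * 1024):
    """Run sender and receiver on 127.0.0.1 to confirm the script works."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=receive_all, args=(listener, result))
    receiver.start()
    sent = send_bulk("127.0.0.1", port, total_bytes)
    receiver.join()
    listener.close()
    return sent, result["received"]
```

To use it across the two hosts, the receiving half would run on the slave bound to a free port and send_bulk would run on the master with a payload about the size of the slave_datatrans file. If this also stalls partway through, the problem is below Kerberos.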
I have run strace on both the kprop and kpropd processes while they are connected. The kprop on the master hangs during a write for several minutes and then eventually times out:

    Process 3183 attached - interrupt to quit
    writev(4, [{"\240\37\26+[\16\247\tC\21\6/\243\217\340\0231f\362\245\3\214$\246\227\231N\265\351\366\1\233"..., 22106}], 1) = -1 ETIMEDOUT (Connection timed out)
    write(2, "kprop", 5) = 5
    write(2, ": ", 2) = 2
    write(2, "Connection timed out", 20) = 20
    write(2, " ", 1) = 1
    write(2, "while sending database block sta"..., 49) = 49
    ioctl(2, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffadfee150) = -1 ENOTTY (Inappropriate ioctl for device)
    write(2, "\n", 1) = 1
    exit_group(1) = ?
    Process 3183 detached

Occasionally, when left alone, the slave will be successfully sync'd. Sometimes this takes minutes, sometimes hours, and on a few occasions it has taken overnight. When successful, the syslog entries on both the master and slave look nominal. Once sync'd, incremental propagation works fine.

In my debugging attempts, I tried starting kpropd with

    kpropd -S -d -P NNN

and then attempted to run

    kprop -f slave_datatrans -P NNN r.test.admin.private

on the master, but kpropd on the slave doesn't appear to be listening on port NNN. Am I misunderstanding something?

In syslog on the slave, on an aborted resync I sometimes see

    May 6 17:18:50 r.test.admin.private kpropd[7078]: /usr/sbin/kpropd: Connection reset by peer while reading database block starting at offset 0
    May 6 17:18:50 r.test.admin.private kpropd[7032]: kpropd: Full resync, invalid return.

Thinking that perhaps this was a problem with the Kerberos that is available in the CentOS repositories, I decided to download the latest version (1.13.2) and build it from source. It built cleanly and passed the tests, but when I tried it I saw the same behaviour as with the Kerberos from the RPMs.

Has anyone else seen anything like this?

TIA...

__
Leonard J. Peirce
Western Michigan University
Office of Information Technology
Kalamazoo, MI 49008
________________________________________________
Kerberos mailing list
Kerberos@mit.edu
https://mailman.mit.edu/mailman/listinfo/kerberos