Is replication from a 1.2.8.3 server to a 1.2.10.4 server known to work or not 
work?  We're having changelog issues.

Background:

We have an ldap service consisting of 3 masters, 2 hubs and 16 slaves.  All 
were running 1.2.8.3 since last summer with no issues.  This summer, we decided 
to bring them all up to the latest stable release, 1.2.10.4.  We can't afford a 
lot of downtime for the service as a whole, but with the redundancy level we 
have, we can take down a machine or two at a time without user impact.

We started with one slave, did a clean install of 1.2.10.4 on it, set up 
replication agreements from our 1.2.8.3 hubs to it and watched it for a week or 
so.  Everything looked fine, so we started rolling through the rest of the 
slave servers, got them all running 1.2.10.4 and so far haven't seen any 
problems.

A couple of days ago, I upgraded one of our two hubs.  The first time I bring 
up the daemon after the initial import of our ldap data, everything seems fine.  
However, we start seeing errors on the first restart:

[11/Jul/2012:10:43:58 -0400] - slapd shutting down - signaling operation threads
[11/Jul/2012:10:43:58 -0400] - slapd shutting down - waiting for 2 threads to 
terminate
[11/Jul/2012:10:44:01 -0400] - slapd shutting down - closing down internal 
subsystems and plugins
[11/Jul/2012:10:44:02 -0400] - Waiting for 4 database threads to stop
[11/Jul/2012:10:44:04 -0400] - All database threads now stopped
[11/Jul/2012:10:44:04 -0400] - slapd stopped.
[11/Jul/2012:10:45:00 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max 
CSN [4ffdca7e000000330000] from RUV [changelog max RUV] is larger than the max 
CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 
4ffb602b000300330000 4ffdca7e000000330000]
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - 
replica_check_for_data_reload: Warning: data for replica 
ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not 
match the data in the changelog. Recreating the changelog file. This could 
affect replication with replica's consumers in which case the consumers should 
be reinitialized.
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max 
CSN [4ffdca70000000340000] from RUV [changelog max RUV] is larger than the max 
CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 
4ffb6ea2000000340000 4ffdca70000000340000]
[11/Jul/2012:10:45:07 -0400] NSMMReplicationPlugin - 
replica_check_for_data_reload: Warning: data for replica 
ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. 
Recreating the changelog file. This could affect replication with replica's 
consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:45:08 -0400] - slapd started.  Listening on All Interfaces port 
389 for LDAP requests
[11/Jul/2012:10:45:08 -0400] - Listening on All Interfaces port 636 for LDAPS 
requests
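
For anyone trying to read the CSNs in these messages, here is a minimal sketch 
of how they can be decoded.  It assumes the usual 389 DS CSN layout of 20 hex 
digits (8 for a Unix timestamp, 4 for a sequence number, 4 for the replica ID, 
4 for a sub-sequence number), which is at least consistent with the log above 
(0x0033 = 51, 0x0034 = 52).  The names parse_csn and describe are made up for 
illustration and are not part of any 389 DS tool.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Field order matters: comparing two CSN tuples compares the timestamp first.
CSN = namedtuple("CSN", "timestamp seqnum replica_id subseqnum")


def parse_csn(text):
    """Split a 20-hex-digit CSN string into its four fields (assumed layout)."""
    if len(text) != 20:
        raise ValueError("expected 20 hex digits, got %r" % text)
    return CSN(
        timestamp=int(text[0:8], 16),     # seconds since the Unix epoch
        seqnum=int(text[8:12], 16),       # sequence number within that second
        replica_id=int(text[12:16], 16),  # originating replica ID
        subseqnum=int(text[16:20], 16),   # sub-sequence number
    )


def describe(text):
    """Return a human-readable one-line description of a CSN string."""
    csn = parse_csn(text)
    when = datetime.fromtimestamp(csn.timestamp, tz=timezone.utc)
    return "replica %d at %s (seq %d, subseq %d)" % (
        csn.replica_id, when.isoformat(), csn.seqnum, csn.subseqnum)


# The two CSNs from the first ruv_compare_ruv message for replica 51.
changelog_max = parse_csn("4ffdca7e000000330000")
database_max = parse_csn("4ffb605d000000330000")

# How far (in seconds) the changelog max CSN is ahead of the database max CSN.
gap_seconds = changelog_max.timestamp - database_max.timestamp

print(describe("4ffdca7e000000330000"))
print(describe("4ffb605d000000330000"))
print("changelog ahead by %d seconds" % gap_seconds)
```

Decoded this way, both CSNs in the first message belong to replica 51, and the 
changelog max CSN is newer than the database max CSN, which is what the 
message is complaining about.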

The _second_ restart is even worse: we get more error messages (see below), and 
then the daemon dies after it says it's listening on its ports:

[11/Jul/2012:10:45:32 -0400] - slapd shutting down - signaling operation threads
[11/Jul/2012:10:45:32 -0400] - slapd shutting down - waiting for 29 threads to 
terminate
[11/Jul/2012:10:45:34 -0400] - slapd shutting down - closing down internal 
subsystems and plugins
[11/Jul/2012:10:45:35 -0400] - Waiting for 4 database threads to stop
[11/Jul/2012:10:45:36 -0400] - All database threads now stopped
[11/Jul/2012:10:45:36 -0400] - slapd stopped.
[11/Jul/2012:10:46:11 -0400] - 389-Directory/1.2.10.4 B2012.101.2023 starting up
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV 
[changelog max RUV] does not contain element [{replica 68 
ldap://gtedm3.iam.gatech.edu:389} 4be339e6000000440000 4ffdc9a1000000440000] 
which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV 
[changelog max RUV] does not contain element [{replica 71 
ldap://gtedm4.iam.gatech.edu:389} 4be6031e000000470000 4ffdc9a8000000470000] 
which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max 
CSN [4ffb62a2000100330000] from RUV [changelog max RUV] is larger than the max 
CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51} 
4ffb605d000000330000 4ffb62a2000100330000]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - 
replica_check_for_data_reload: Warning: data for replica 
ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not 
match the data in the changelog. Recreating the changelog file. This could 
affect replication with replica's consumers in which case the consumers should 
be reinitialized.
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV 
[changelog max RUV] does not contain element [{replica 69 
ldap://gtedm3.iam.gatech.edu:389} 4be339e4000000450000 4ffdc9a2000000450000] 
which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV 
[changelog max RUV] does not contain element [{replica 72 
ldap://gtedm4.iam.gatech.edu:389} 4be6031d000000480000 4ffdc9a9000300480000] 
which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max 
CSN [4ffb78bc000000340000] from RUV [changelog max RUV] is larger than the max 
CSN [4ffb7098000100340000] from RUV [database RUV] for element [{replica 52} 
4ffb7098000100340000 4ffb78bc000000340000]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - 
replica_check_for_data_reload: Warning: data for replica 
ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog. 
Recreating the changelog file. This could affect replication with replica's 
consumers in which case the consumers should be reinitialized.
[11/Jul/2012:10:46:11 -0400] - slapd started.  Listening on All Interfaces port 
389 for LDAP requests
[11/Jul/2012:10:46:11 -0400] - Listening on All Interfaces port 636 for LDAPS 
requests
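
To see at a glance which replica IDs are involved, the ruv_compare_ruv 
messages can be tallied with a short script.  This is only a rough sketch: it 
assumes the exact message wording shown in the logs above, and the names 
summarize, AHEAD, MISSING and SAMPLE are made up for illustration.  It 
collapses whitespace first so that it also works on line-wrapped copies of 
the log like the ones in this mail.

```python
import re
from collections import defaultdict

# "the max CSN [...] from RUV [changelog max RUV] is larger than ..." messages
AHEAD = re.compile(
    r"the max CSN \[(?P<cl>[0-9a-f]{20})\] from RUV \[changelog max RUV\] "
    r"is larger than the max CSN \[(?P<db>[0-9a-f]{20})\] from RUV "
    r"\[database RUV\] for element \[\{replica (?P<rid>\d+)"
)

# "RUV [changelog max RUV] does not contain element [...]" messages
MISSING = re.compile(
    r"RUV \[changelog max RUV\] does not contain element "
    r"\[\{replica (?P<rid>\d+)"
)


def summarize(log_text):
    """Return replica IDs per kind of ruv_compare_ruv complaint."""
    # Collapse newlines and runs of spaces so wrapped messages still match.
    flat = " ".join(log_text.split())
    summary = defaultdict(list)
    for match in AHEAD.finditer(flat):
        summary["changelog_ahead"].append(int(match.group("rid")))
    for match in MISSING.finditer(flat):
        summary["missing_from_changelog"].append(int(match.group("rid")))
    return dict(summary)


# Two messages copied from the second restart, wrapped as they appear above.
SAMPLE = """
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: RUV
[changelog max RUV] does not contain element [{replica 68
ldap://gtedm3.iam.gatech.edu:389} 4be339e6000000440000 4ffdc9a1000000440000]
which is present in RUV [database RUV]
[11/Jul/2012:10:46:11 -0400] NSMMReplicationPlugin - ruv_compare_ruv: the max
CSN [4ffb62a2000100330000] from RUV [changelog max RUV] is larger than the max
CSN [4ffb605d000000330000] from RUV [database RUV] for element [{replica 51}
4ffb605d000000330000 4ffb62a2000100330000]
"""

print(summarize(SAMPLE))
```

Run against the full second-restart log, this groups replicas 51 and 52 as 
"changelog ahead" and replicas 68, 69, 71 and 72 (the gtedm3 and gtedm4 
entries) as missing from the changelog max RUV.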

At this point, the only way I've found to get it back is to clean out the 
changelog and db directories and re-import the ldap data from scratch.  
Essentially we can't restart without having to re-import.  I've done this a 
couple of times already and it's entirely reproducible.

I've checked and ensured that there are no obsolete masters that need to be 
CLEANRUVed.  I've also noticed that the errors _seem_ to affect only our 
second and third suffixes.  We have three suffixes defined, but I haven't seen 
any error messages for the first one.

Has anyone seen anything like this?  We're not sure if this is a general 
1.2.10.4 issue or if it only occurs when replicating from 1.2.8.3 to 
1.2.10.4.  If it's the former, we cannot proceed with getting the rest of the 
servers up to 1.2.10.4.  If it's the latter, then we need to expedite getting 
everything up to 1.2.10.4.
--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
