On 09/06/2012 12:13 PM, Rich Megginson wrote:
Yeah, sorry, I am not that familiar with winsync (didn't know it used the same replication agreements). I will have to do another fix...

On 09/06/2012 10:09 AM, Martin Kosek wrote:
On 09/06/2012 06:09 PM, Martin Kosek wrote:
On 09/06/2012 06:05 PM, Martin Kosek wrote:
On 09/06/2012 05:55 PM, Rob Crittenden wrote:

Rob Crittenden wrote:
Apparently there were two unsaved changes, one of which was lost. This adds in the 'entry already exists' fix. rob

Rob Crittenden wrote:
Slightly revised patch. I still had a window open with one unsaved change.

Martin Kosek wrote:
On 09/05/2012 08:06 PM, Rob Crittenden wrote:
Rob Crittenden wrote:
Martin Kosek wrote:

Updated patch using the new mechanism in 389-ds-base. This should more thoroughly clean out RUV data when a replica is being deleted, and provide a way to delete RUV data afterwards too if necessary.

On 07/05/2012 08:39 PM, Rob Crittenden wrote:
Martin Kosek wrote:
On 07/03/2012 04:41 PM, Rob Crittenden wrote:

Deleting a replica can leave a replication update vector (RUV) on the other servers. This can confuse things if the replica is re-added, and it also causes the server to calculate changes against a server that may no longer exist.

389-ds-base provides a new task that self-propagates itself to all available replicas to clean this RUV data. This patch will create this task at deletion time to hopefully clean things up.

It isn't perfect. If any replica is down or unavailable at the time the cleanruv task fires, and then comes back up, the old RUV data may be re-propagated around.

To make things easier in this case I've added two new commands to ipa-replica-manage. The first lists the replication ids of all the servers we have a RUV for. Using this you can call clean_ruv with the replication id of a server that no longer exists to try the cleanallruv step again.

This is quite dangerous though. If you run cleanruv against a replica id that does exist it can cause a loss of data. I believe I've put in enough scary warnings about this.

rob

Good work there, this should make cleaning RUVs much easier than with the previous version. This is what I found during review:

1) The list_ruv and clean_ruv command help in the man page is quite lost. I think it would help if we, for example, indented all the info for the commands. The way it is now, a user could simply overlook the new commands in the man page.

2) I would rename the new commands to clean-ruv and list-ruv to make them consistent with the rest of the commands (re-initialize, force-sync).

3) It would be nice to be able to run the clean_ruv command in an unattended way (for better testing), i.e. respect the --force option as we already do for ipa-replica-manage del. This fix would aid test automation in the future. (See the sketch after this list.)

4) (minor) The new question (and the del one too) does not react too well to CTRL+D:

# ipa-replica-manage clean_ruv 3 --force
Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
Cleaning the wrong replica ID will cause that server to no longer replicate so it may miss updates while the process is running. It would need to be re-initialized to maintain consistency. Be very careful.
Continue to clean? [no]: unexpected error:

5) Help for the clean_ruv command without its required parameter is quite confusing, as it reports that the command is wrong and not the parameter:

# ipa-replica-manage clean_ruv
Usage: ipa-replica-manage [options]
ipa-replica-manage: error: must provide a command [clean_ruv | force-sync | disconnect | connect | del | re-initialize | list | list_ruv]

It seems you just forgot to specify the error message in the command definition.

6) When the remote replica is down, the clean_ruv command fails with an unexpected error:

[root@vm-086 ~]# ipa-replica-manage clean_ruv 5
Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
Cleaning the wrong replica ID will cause that server to no longer replicate so it may miss updates while the process is running. It would need to be re-initialized to maintain consistency. Be very careful.
Continue to clean? [no]: y
unexpected error: {'desc': 'Operations error'}

/var/log/dirsrv/slapd-IDM-LAB-BOS-REDHAT-COM/errors:
[04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: failed to connect to repl agreement connection (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping tree,cn=config), error 105
[04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: replica (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping tree,cn=config) has not been cleaned. You will need to rerun the CLEANALLRUV task on this replica.
[04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: Task failed (1)

In this case I think we should inform the user that the command failed, possibly because of disconnected replicas, and that they could enable the replicas and try again.

7) (minor) "pass" is now redundant in replication.py:

+        except ldap.INSUFFICIENT_ACCESS:
+            # We can't make the server we're removing read-only but
+            # this isn't a show-stopper
+            root_logger.debug("No permission to switch replica to read-only, continuing anyway")
+            pass
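To make 3) concrete, here is a minimal sketch (my wording, not the actual patch) of how clean_ruv could honor --force so the command runs unattended. It assumes the optparse-provided options.force flag and the ipautil.user_input() helper already used elsewhere in ipa-replica-manage:

    # Sketch only, not the committed code.
    import sys
    from ipapython import ipautil

    def clean_ruv(realm, ruv, options):
        # ruv is the replica id to clean; options comes from optparse
        print "Clean the Replication Update Vector for replica id %s" % ruv
        print "Cleaning the wrong replica ID will cause that server to no longer"
        print "replicate so it may miss updates while the process is running."
        print "It would need to be re-initialized to maintain consistency."
        print "Be very careful."
        # Skip the prompt entirely when --force was given
        if not options.force and not ipautil.user_input("Continue to clean?", False):
            sys.exit("Aborted")
        # ... create and wait for the CLEANALLRUV task as before ...

With a guard like that, "ipa-replica-manage clean_ruv 3 --force" would not stop at the confirmation prompt.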
I think this addresses everything.

rob

Thanks, almost there! I just found one more issue which needs to be fixed before we push:

# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
Directory Manager password:
Unable to connect to replica vm-055.idm.lab.bos.redhat.com, forcing removal
Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': "Can't contact LDAP server"}
Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
There were issues removing a connection: %d format: a number is required, not str
Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': "Can't contact LDAP server"}

This is a traceback I retrieved:

Traceback (most recent call last):
  File "/sbin/ipa-replica-manage", line 425, in del_master
    del_link(realm, r, hostname, options.dirman_passwd, force=True)
  File "/sbin/ipa-replica-manage", line 271, in del_link
    repl1.cleanallruv(replica_id)
  File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", line 1094, in cleanallruv
    root_logger.debug("Creating CLEANALLRUV task for replica id %d" % replicaId)

The problem here is that you don't convert replica_id to int in this part:

+    replica_id = None
+    if repl2:
+        replica_id = repl2._get_replica_id(repl2.conn, None)
+    else:
+        servers = get_ruv(realm, replica1, dirman_passwd)
+        for (netloc, rid) in servers:
+            if netloc.startswith(replica2):
+                replica_id = rid
+                break

Martin
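For reference, the fix being asked for presumably boils down to a cast of the rid pulled out of the RUV; a sketch of the corrected hunk, with everything else as quoted above:

    replica_id = None
    if repl2:
        replica_id = repl2._get_replica_id(repl2.conn, None)
    else:
        servers = get_ruv(realm, replica1, dirman_passwd)
        for (netloc, rid) in servers:
            if netloc.startswith(replica2):
                # rid is read out of the RUV entry as a string, but
                # cleanallruv() formats it with %d, so cast it here
                replica_id = int(rid)
                break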
rob

Rebased patch

rob

0) As I wrote in a review for your patch 1041, the changelog entry slipped elsewhere.

1) The following KeyboardInterrupt except clause looks suspicious. I know why you have it there, but since it is generally a bad thing to do, a comment on why it is needed would be useful.

@@ -256,6 +263,17 @@ def del_link(realm, replica1, replica2, dirman_passwd, force=False):
     repl1.delete_agreement(replica2)
     repl1.delete_referral(replica2)

+    if type1 == replication.IPA_REPLICA:
+        if repl2:
+            ruv = repl2._get_replica_id(repl2.conn, None)
+        else:
+            ruv = get_ruv_by_host(realm, replica1, replica2, dirman_passwd)
+
+        try:
+            repl1.cleanallruv(ruv)
+        except KeyboardInterrupt:
+            pass
+

Maybe you just wanted to do some cleanup and then "raise" again?

No, it is there because it is safe to break out of it. The task will continue to run. I added some verbiage.

2) This is related to 1): when some remote replica is down, "ipa-replica-manage del" may wait indefinitely, right?

# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
Deleting a master is irreversible.
To reconnect to the remote master you will need to prepare a new replica file and re-install.
Continue to delete? [no]: y
ipa: INFO: Setting agreement cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config schedule to 2358-2359 0 to force synch
ipa: INFO: Deleting schedule 2358-2359 0 from agreement cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired successfully: Incremental update succeeded: start: 0: end: 0
Background task created to clean replication data

... after about a minute I hit CTRL+C

^CDeleted replication agreement from 'vm-086.idm.lab.bos.redhat.com' to 'vm-055.idm.lab.bos.redhat.com'
Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: NS record does not contain 'vm-055.idm.lab.bos.redhat.com.'
You may need to manually remove them from the tree

I think it would be better to inform the user that some remote replica is down, or at least that we are waiting for the task to complete. Something like this:

# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
...
Background task created to clean replication data
Replication data clean up may take a very long time if some replica is unreachable
Hit CTRL+C to interrupt the wait
^C
Clean up wait interrupted
...
[continue with del]

Yup, did this in #1.

3) (minor) When there is a cleanruv task running and you run "ipa-replica-manage del", there is an unexpected error message about a duplicate task object in LDAP:

# ipa-replica-manage del vm-072.idm.lab.bos.redhat.com --force
Unable to connect to replica vm-072.idm.lab.bos.redhat.com, forcing removal
FAIL
Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': "Can't contact LDAP server"}
Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
There were issues removing a connection: This entry already exists    <<<<<<<<<
Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': "Can't contact LDAP server"}
Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS record does not contain 'vm-072.idm.lab.bos.redhat.com.'
You may need to manually remove them from the tree

I think it should be enough to just catch "entry already exists" in the cleanallruv function and, in such a case, print a relevant error message and bail out. Thus, self.conn.checkTask(dn, dowait=True) would not be called either.

Good catch, fixed.

Yeah, this is one of my common mistakes. I put in a pass initially, then add logging in front of it and forget to delete the pass. It's gone now.
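Tying 1) and 2) together, a sketch of what the interruptible wait could look like (the printed wording is mine, not necessarily what was committed); it assumes the same repl1/ruv variables as the hunk quoted in 1):

    # Warn that the wait may take a long time and let CTRL+C stop the
    # wait without aborting the rest of the deletion.
    print "Background task created to clean replication data."
    print "This may take a long time if any replica is unreachable."
    print "Hit CTRL+C to interrupt the wait."
    try:
        repl1.cleanallruv(ruv)
    except KeyboardInterrupt:
        # Safe to break out here: the CLEANALLRUV task keeps running on
        # the server, we only stop waiting for it and continue with del.
        print "Clean up wait interrupted, continuing with the removal."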
4) (minor) In the make_readonly function, there is a redundant "pass" statement:

+    def make_readonly(self):
+        """
+        Make the current replication agreement read-only.
+        """
+        dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'),
+                ('cn', 'plugins'), ('cn', 'config'))
+
+        mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'on')]
+        try:
+            self.conn.modify_s(dn, mod)
+        except ldap.INSUFFICIENT_ACCESS:
+            # We can't make the server we're removing read-only but
+            # this isn't a show-stopper
+            root_logger.debug("No permission to switch replica to read-only, continuing anyway")
+            pass    <<<<<<<<<<<<<<<

5) In clean_ruv, I think allowing a --force option to bypass the user_input would be helpful (at least for test automation):

+    if not ipautil.user_input("Continue to clean?", False):
+        sys.exit("Aborted")

Yup, added.

rob

rob

Just one last thing (otherwise the patch is OK) - I don't think this is what we want :-)

# ipa-replica-manage clean-ruv 8
Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
Cleaning the wrong replica ID will cause that server to no longer replicate so it may miss updates while the process is running. It would need to be re-initialized to maintain consistency. Be very careful.
Continue to clean? [no]: y    <<<<<<
Aborted

Nor this exception (you are checking for the wrong exception):

# ipa-replica-manage clean-ruv 8
Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
Cleaning the wrong replica ID will cause that server to no longer replicate so it may miss updates while the process is running. It would need to be re-initialized to maintain consistency. Be very careful.
Continue to clean? [no]: unexpected error: This entry already exists

This is the exception:

Traceback (most recent call last):
  File "/sbin/ipa-replica-manage", line 651, in <module>
    main()
  File "/sbin/ipa-replica-manage", line 648, in main
    clean_ruv(realm, args[1], options)
  File "/sbin/ipa-replica-manage", line 373, in clean_ruv
    thisrepl.cleanallruv(ruv)
  File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", line 1136, in cleanallruv
    self.conn.addEntry(e)
  File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line 503, in addEntry
    self.__handle_errors(e, arg_desc=arg_desc)
  File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line 321, in __handle_errors
    raise errors.DuplicateEntry()
ipalib.errors.DuplicateEntry: This entry already exists

Martin
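The fix Martin describes amounts to catching the IPA DuplicateEntry error around the task add in cleanallruv() and bailing out before the wait; roughly (a sketch, not the committed code):

    # In replication.py's cleanallruv(): treat an existing task entry as
    # "already running" instead of an unexpected error.
    from ipalib import errors

    try:
        self.conn.addEntry(e)          # e is the CLEANALLRUV task entry
    except errors.DuplicateEntry:
        print "A CLEANALLRUV task for this replica id is already running."
        return                         # skip checkTask() below, as suggested
    self.conn.checkTask(dn, dowait=True)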
On another matter, I just noticed that CLEANRUV is not proceeding if I have a winsync replica defined (and it is even up):

# ipa-replica-manage list
dc.ad.test: winsync    <<<<<<<
vm-072.idm.lab.bos.redhat.com: master
vm-086.idm.lab.bos.redhat.com: master

[06/Sep/2012:11:59:10 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to receive all the deleted replica updates...
[06/Sep/2012:11:59:10 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Failed to contact agmt (agmt="cn=meTodc.ad.test" (dc:389)) error (10), will retry later.
[06/Sep/2012:11:59:10 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Not all replicas caught up, retrying in 10 seconds
[06/Sep/2012:11:59:20 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Failed to contact agmt (agmt="cn=meTodc.ad.test" (dc:389)) error (10), will retry later.
[06/Sep/2012:11:59:20 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Not all replicas caught up, retrying in 20 seconds
[06/Sep/2012:11:59:40 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Failed to contact agmt (agmt="cn=meTodc.ad.test" (dc:389)) error (10), will retry later.
[06/Sep/2012:11:59:40 -0400] NSMMReplicationPlugin - CleanAllRUV Task: Not all replicas caught up, retrying in 40 seconds

I don't think this is OK. Adding Rich to CC to follow on this one.

Martin

And now the actual CC.

Martin

Yeah - looks like CLEANALLRUV needs to ignore windows sync agreements.
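The real fix for that belongs in 389-ds-base itself, but for illustration, a sketch of how winsync agreements can be told apart on the IPA side (assuming the standard 389-ds objectclass names), for example to warn the admin up front that the task would otherwise stall:

    def is_winsync_agreement(entry):
        # 'entry' is assumed to be a dict of attributes from an LDAP search
        # of agreements under cn=mapping tree,cn=config. Ordinary agreements
        # use the nsds5ReplicationAgreement objectclass; winsync agreements
        # use nsDSWindowsReplicationAgreement, which CLEANALLRUV should skip
        # when waiting for replicas to catch up.
        classes = [c.lower() for c in entry.get('objectclass', [])]
        return 'nsdswindowsreplicationagreement' in classes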
--
Mark Reynolds
Senior Software Engineer
Red Hat, Inc
mreyno...@redhat.com

_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel