Martin Kosek wrote:
On Thu, 2012-09-06 at 17:22 -0400, Rob Crittenden wrote:
Martin Kosek wrote:
On 08/31/2012 07:40 PM, Rob Crittenden wrote:
Rob Crittenden wrote:
It was possible to use ipa-replica-manage connect/disconnect/del to end up
orphaning one or more IPA masters. This is an attempt to catch and
prevent that case.

I tested with this topology, trying to delete B.

A <-> B <-> C

I got here by creating B and C from A, connecting B to C then deleting
the link from A to B, so it went from A -> B and A -> C to the above.

What I do is look up the servers that the delete candidate host has
connections to and see if we're the last link.

I added an escape clause if there are only two masters.
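
In rough pseudo-Python the check looks like this (a simplified sketch of
the idea, using a plain hostname -> [peers] mapping instead of the real
ReplicationManager objects):

    def orphaned_by_delete(candidate, agreements):
        """Return the peers whose only agreement is with candidate."""
        orphaned = []
        for peer in agreements.get(candidate, []):
            # A peer whose sole agreement is with the delete candidate
            # would be cut off when the candidate goes away.
            if agreements.get(peer, []) == [candidate]:
                orphaned.append(peer)
        return orphaned

    # A <-> B <-> C: deleting B orphans both A and C
    orphaned_by_delete('B', {'A': ['B'], 'B': ['A', 'C'], 'C': ['B']})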

rob

Oh, this relies on my cleanruv patch 1031.

rob


1) When I run ipa-replica-manage del --force against an already uninstalled host,
the new code prevents the deletion because it cannot connect to it. It
also crashes with UnboundLocalError:

# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force

Unable to connect to replica vm-055.idm.lab.bos.redhat.com, forcing removal
Traceback (most recent call last):
    File "/sbin/ipa-replica-manage", line 708, in <module>
      main()
    File "/sbin/ipa-replica-manage", line 677, in main
      del_master(realm, args[1], options)
    File "/sbin/ipa-replica-manage", line 476, in del_master
      sys.exit("Failed read master data from '%s': %s" % (delrepl.hostname, 
str(e)))
UnboundLocalError: local variable 'delrepl' referenced before assignment

Fixed.
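
The fix amounts to binding the name before the try block so the error
path can test it instead of hitting UnboundLocalError. A sketch of the
pattern (not the exact hunk):

    delrepl = None   # bound up front; error paths may run before assignment
    try:
        delrepl = replication.ReplicationManager(realm, hostname,
                                                 options.dirman_passwd)
    except Exception, e:
        # delrepl may still be None here, so report the hostname
        # argument rather than delrepl.hostname.
        print "Unable to connect to replica %s: %s" % (hostname,
                                                       convert_error(e))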



I also hit this error when removing a winsync replica.

Fixed.



2) As I wrote before, I think having the --force option override the user inquiries
would benefit test automation:

+            if not ipautil.user_input("Continue to delete?", False):
+                sys.exit("Aborted")

Fixed.
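
The prompts are now gated on the flag, as in check_last_link below:

    # With --force, skip the interactive confirmation so automated
    # runs never block waiting for input.
    if not force and not ipautil.user_input("Continue to delete?", False):
        sys.exit("Aborted")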



3) I don't think this code will cover this topology:

A - B - C - D - E

It would allow you to delete replica C even though that would separate A-B
from D-E. Though we may not want to cover this situation now, what you've
got definitely helps.

I think you may be right. I only tested with 4 servers. With this topology,
B and D would both still have 2 agreements, so they wouldn't be covered by
the last-link test.
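
A complete check would have to treat the agreements as a graph and
verify that the surviving masters remain connected. Something like this
sketch would do it (nothing in the patch does this yet):

    def still_connected(agreements, removed):
        # Return True if all masters other than `removed` can still
        # reach each other over the remaining replication agreements.
        # `agreements` maps hostname -> list of peer hostnames.
        remaining = [m for m in agreements if m != removed]
        if not remaining:
            return True
        # Breadth-first search from an arbitrary surviving master.
        seen = set([remaining[0]])
        queue = [remaining[0]]
        while queue:
            for peer in agreements.get(queue.pop(0), []):
                if peer != removed and peer not in seen:
                    seen.add(peer)
                    queue.append(peer)
        return len(seen) == len(remaining)

    # A - B - C - D - E: deleting C splits A-B from D-E
    chain = {'A': ['B'], 'B': ['A', 'C'], 'C': ['B', 'D'],
             'D': ['C', 'E'], 'E': ['D']}
    still_connected(chain, 'C')   # False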

Everything looks good now, so ACK. We just need to push it along with
the CLEANALLRUV patch.

Martin


Not to look a gift ACK in the mouth, but here is a revised patch. I've added a cleanup routine to remove an orphaned master properly. If you had tried the mechanism I outlined in the man page it would have worked, but it was less than complete. This way is better; just don't try it on a live master.

I also added a cleanruv abort command, in case you want to kill an existing task.
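
Usage, for reference (the replica ID is the one shown by list-ruv):

    # ipa-replica-manage abort-clean-ruv 7
    # ipa-replica-manage del --force --cleanup master.example.com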

rob

From a8718439ac1f51d857a03ff27776850324c43cda Mon Sep 17 00:00:00 2001
From: Rob Crittenden <rcrit...@redhat.com>
Date: Fri, 31 Aug 2012 11:56:58 -0400
Subject: [PATCH] When deleting a master, try to prevent orphaning other
 servers.

If you have a replication topology like A <-> B <-> C and you try
to delete server B that will leave A and C orphaned. It may also
prevent re-installation of a new master on B because the cn=masters
entry for it probably still exists on at least one of the other masters.

Check each master that the deleted server connects to, and ensure that
the deleted server isn't its last link; fail if it is. If any of the
masters are not up then warn that this could be a bad thing, but let
the user continue if they want.

Add a new option to the del command, --cleanup, which runs the
replica_cleanup() routine to completely clean up references to a master.

Add a command to run the abort-clean-ruv task.

https://fedorahosted.org/freeipa/ticket/2797
---
 install/tools/ipa-replica-manage       | 115 ++++++++++++++++++++++++++++++++-
 install/tools/man/ipa-replica-manage.1 |  17 +++++
 ipaserver/install/replication.py       |  22 +++++++
 3 files changed, 153 insertions(+), 1 deletion(-)

diff --git a/install/tools/ipa-replica-manage b/install/tools/ipa-replica-manage
index c6ef51b7215164c9538afae942e3d42285ca860b..60d8ecbc22bfde66042b9ff4ea1068ad79566ed1 100755
--- a/install/tools/ipa-replica-manage
+++ b/install/tools/ipa-replica-manage
@@ -49,6 +49,7 @@ commands = {
     "re-initialize":(0, 0, "", ""),
     "force-sync":(0, 0, "", ""),
     "clean-ruv":(1, 1, "Replica ID of to clean", ""),
+    "abort-clean-ruv":(1, 1, "Replica ID to abort cleaning", ""),
 }
 
 def convert_error(exc):
@@ -70,6 +71,8 @@ def parse_options():
                       help="provide additional information")
     parser.add_option("-f", "--force", dest="force", action="store_true", default=False,
                       help="ignore some types of errors")
+    parser.add_option("-c", "--cleanup", dest="cleanup", action="store_true", default=False,
+                      help="DANGER: clean up references to a ghost master")
     parser.add_option("--binddn", dest="binddn", default=None, type="dn",
                       help="Bind DN to use with remote server")
     parser.add_option("--bindpw", dest="bindpw", default=None,
@@ -398,9 +401,80 @@ def clean_ruv(realm, ruv, options):
     thisrepl.cleanallruv(ruv)
     print "Cleanup task created"
 
+def abort_clean_ruv(realm, ruv, options):
+    """
+    Given an RID abort a CLEANALLRUV task.
+    """
+    try:
+        ruv = int(ruv)
+    except ValueError:
+        sys.exit("Replica ID must be an integer: %s" % ruv)
+
+    servers = get_ruv(realm, options.host, options.dirman_passwd)
+    found = False
+    for (netloc, rid) in servers:
+        if ruv == int(rid):
+            found = True
+            hostname = netloc
+            break
+
+    if not found:
+        sys.exit("Replica ID %s not found" % ruv)
+
+    print "Aborting the clean Replication Update Vector task for %s" % hostname
+    print
+    thisrepl = replication.ReplicationManager(realm, options.host,
+                                              options.dirman_passwd)
+    thisrepl.abortcleanallruv(ruv)
+
+    print "Cleanup task stopped"
+
+def check_last_link(delrepl, realm, dirman_passwd, force):
+    """
+    We don't want to orphan a server when deleting another one. If you have
+    a topology that looks like this:
+
+             A     B
+             |     |
+             |     |
+             |     |
+             C---- D
+
+    If we try to delete host D it will orphan host B.
+
+    What we need to do is connect to each master that the master being
+    deleted has an agreement with, and make sure each of those has
+    agreements with more than just this master.
+
+    @delrepl: a ReplicationManager object of the master being deleted
+
+    returns: comma-separated string of orphaned server hostnames, or None
+    """
+    replica_names = delrepl.find_ipa_replication_agreements()
+
+    orphaned = []
+    # Connect to each remote server and see what agreements it has
+    for replica in replica_names:
+        try:
+            repl = replication.ReplicationManager(realm, replica, dirman_passwd)
+        except ldap.SERVER_DOWN, e:
+            print "Unable to validate that '%s' will not be orphaned." % replica
+            if not force and not ipautil.user_input("Continue to delete?", False):
+                sys.exit("Aborted")
+            continue
+        names = repl.find_ipa_replication_agreements()
+        if len(names) == 1 and names[0] == delrepl.hostname:
+            orphaned.append(replica)
+
+    if len(orphaned):
+        return ', '.join(orphaned)
+    else:
+        return None
+
 def del_master(realm, hostname, options):
 
     force_del = False
+    delrepl = None
 
     # 1. Connect to the local server
     try:
@@ -413,7 +487,21 @@ def del_master(realm, hostname, options):
     # 2. Ensure we have an agreement with the master
     agreement = thisrepl.get_replication_agreement(hostname)
     if agreement is None:
-        sys.exit("'%s' has no replication agreement for '%s'" % (options.host, hostname))
+        if options.cleanup:
+            """
+            We have no agreement with the current master, so this is a
+            candidate for cleanup. This is VERY dangerous to do because it
+            removes that master from the list of masters. If the master
+            were to try to come back online it wouldn't work at all.
+            """
+            print "Cleaning a master is irreversible."
+            print "This should not normally be required, so use cautiously."
+            if not ipautil.user_input("Continue to clean master?", False):
+                sys.exit("Cleanup aborted")
+            thisrepl.replica_cleanup(hostname, realm, force=True)
+            sys.exit(0)
+        else:
+            sys.exit("'%s' has no replication agreement for '%s'" % (options.host, hostname))
 
     # 3. If an IPA agreement connect to the master to be removed.
     repltype = thisrepl.get_agreement_type(hostname)
@@ -451,6 +539,29 @@ def del_master(realm, hostname, options):
         if not ipautil.user_input("Continue to delete?", False):
             sys.exit("Deletion aborted")
 
+    # Check for orphans if the remote server is up.
+    if delrepl and not winsync:
+        masters_dn = DN(('cn', 'masters'), ('cn', 'ipa'), ('cn', 'etc'), ipautil.realm_to_suffix(realm))
+        try:
+            masters = delrepl.conn.getList(masters_dn, ldap.SCOPE_ONELEVEL)
+        except Exception, e:
+            masters = []
+            print "Failed to read masters data from '%s': %s" % (delrepl.hostname, convert_error(e))
+            print "Skipping calculation to determine if one or more masters would be orphaned."
+            if not options.force:
+                sys.exit(1)
+
+        # This only applies if we have more than 2 IPA servers, otherwise
+        # there is no chance of an orphan.
+        if len(masters) > 2:
+            orphaned_server = check_last_link(delrepl, realm, options.dirman_passwd, options.force)
+            if orphaned_server is not None:
+                print "Deleting this server will orphan '%s'. " % orphaned_server
+                print "You will need to reconfigure your replication topology to delete this server."
+                sys.exit(1)
+    else:
+        print "Skipping calculation to determine if one or more masters would be orphaned."
+
     # 4. Remove each agreement
     for r in replica_names:
         try:
@@ -670,6 +781,8 @@ def main():
         del_link(realm, replica1, replica2, dirman_passwd, clean_ruv=False)
     elif args[0] == "clean-ruv":
         clean_ruv(realm, args[1], options)
+    elif args[0] == "abort-clean-ruv":
+        abort_clean_ruv(realm, args[1], options)
 
 try:
     main()
diff --git a/install/tools/man/ipa-replica-manage.1 b/install/tools/man/ipa-replica-manage.1
index 4a1c489f33591ff6ac98fe7f9a16ebb6a52ee28a..a614ae391c4fef20ffe9c38c5d797b9b2493fd8a 100644
--- a/install/tools/man/ipa-replica-manage.1
+++ b/install/tools/man/ipa-replica-manage.1
@@ -42,12 +42,18 @@ Manages the replication agreements of an IPA server.
 \fBforce\-sync\fR
 \- Immediately flush any data to be replicated from a server specified with the \-\-from option
 .TP
+\fBcleanup\fR
+\- Remove leftover references to a deleted master.
+.TP
 \fBlist\-ruv\fR
 \- List the replication IDs on this server.
 .TP
 \fBclean\-ruv\fR [REPLICATION_ID]
 \- Run the CLEANALLRUV task to remove a replication ID.
 .TP
+\fBabort\-clean\-ruv\fR [REPLICATION_ID]
+\- Abort a running CLEANALLRUV task.
+.TP
 The connect and disconnect options are used to manage the replication topology. When a replica is created it is only connected with the master that created it. The connect option may be used to connect it to other existing replicas.
 .TP
 The disconnect option cannot be used to remove the last link of a replica. To remove a replica from the topology use the del option.
@@ -59,6 +65,14 @@ Each IPA master server has a unique replication ID. This ID is used by 389\-ds\-
 When a master is removed, all other masters need to remove its replication ID from the list of masters. Normally this occurs automatically when a master is deleted with ipa\-replica\-manage. If one or more masters was down or unreachable when ipa\-replica\-manage was executed then this replica ID may still exist. The clean\-ruv command may be used to clean up an unused replication ID.
 .TP
 \fBNOTE\fR: clean\-ruv is \fBVERY DANGEROUS\fR. Execution against the wrong replication ID can result in inconsistent data on that master. The master should be re\-initialized from another if this happens.
+.TP
+The replication topology is examined when a master is deleted, and an attempt is made to prevent a master from being orphaned. For example, if your topology is A <\-> B <\-> C and you attempt to delete master B it will fail because that would leave masters A and C orphaned.
+.TP
+The list of masters is stored in cn=masters,cn=ipa,cn=etc,dc=example,dc=com. This should be cleaned up automatically when a master is deleted. If you have deleted the master and all its agreements but these entries still exist, then you will not be able to re\-install IPA on it; the installation will fail with:
+.TP
+An IPA master host cannot be deleted or disabled using standard commands (host\-del, for example).
+.TP
+An orphaned master may be cleaned up using the del directive with the \-\-cleanup option. This will remove the entries from cn=masters,cn=ipa,cn=etc that otherwise prevent host\-del from working, as well as the master's dna profile, s4u2proxy configuration and service principals, and remove it from the defaultServerList of the default DUA profile.
 .SH "OPTIONS"
 .TP
 \fB\-H\fR \fIHOST\fR, \fB\-\-host\fR=\fIHOST\fR
@@ -129,6 +143,9 @@ List the replication IDs in use:
  # ipa\-replica\-manage list\-ruv
  srv1.example.com:389: 7
  srv2.example.com:389: 4
+.TP
+Remove references to an orphaned and deleted master:
+ # ipa\-replica\-manage del \-\-force \-\-cleanup master.example.com
 .SH "WINSYNC"
Creating a Windows AD Synchronization agreement is similar to creating an IPA replication agreement; there are just a couple of extra steps.
 
diff --git a/ipaserver/install/replication.py b/ipaserver/install/replication.py
index 552d61841ac20d8ee077a0764ec2443e92ba8c57..bd111049560f4943256201d3a10d67f57af6054d 100644
--- a/ipaserver/install/replication.py
+++ b/ipaserver/install/replication.py
@@ -1140,5 +1140,27 @@ class ReplicationManager(object):
             print "Background task created to clean replication data. This may take a while."
 
         print "This may be safely interrupted with Ctrl+C"
+        self.conn.checkTask(dn, dowait=True)
+
+    def abortcleanallruv(self, replicaId):
+        """
+        Create a task to abort a CLEANALLRUV operation.
+        """
+        root_logger.debug("Creating task to abort a CLEANALLRUV operation for replica id %d" % replicaId)
+
+        dn = DN(('cn', 'abort %d' % replicaId), ('cn', 'abort cleanallruv'), ('cn', 'tasks'), ('cn', 'config'))
+        e = ipaldap.Entry(dn)
+        e.setValues('objectclass', ['top', 'extensibleObject'])
+        e.setValue('replica-base-dn', api.env.basedn)
+        e.setValue('replica-id', replicaId)
+        e.setValue('cn', 'abort %d' % replicaId)
+        try:
+            self.conn.addEntry(e)
+        except errors.DuplicateEntry:
+            print "An abort CLEANALLRUV task for replica id %d already exists." % replicaId
+        else:
+            print "Background task created. This may take a while."
+
+        print "This may be safely interrupted with Ctrl+C"
 
         self.conn.checkTask(dn, dowait=True)
-- 
1.7.11.4
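
For reference, the abort task entry that abortcleanallruv() adds looks
like this (reconstructed from the code above; the base DN and replica
ID are illustrative):

    dn: cn=abort 7,cn=abort cleanallruv,cn=tasks,cn=config
    objectclass: top
    objectclass: extensibleObject
    replica-base-dn: dc=example,dc=com
    replica-id: 7
    cn: abort 7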
