The branch, master has been updated via 1541c50 selftest: Add some tests for linked attribute conflicts via 46c1f7b getncchanges.c: max_links calculation didn't work well in some cases via 3c0d80d replmd: Avoid duplicated debug/warnings via 44ca841 replmd: Allow missing targets if GET_TGT has already been set via 278039f getncchanges.c: Support GET_TGT better with large numbers of links via 7ba1084 getncchanges.c: Refactor to track more state using repl_chunk via 10df9f6 getncchanges.py: Add a multi-valued linked attribute test via 693e3ad getncchanges.py: Add a test for dropped cross-partition links via 9697190 getncchanges.py: Add test for replicating reanimated objects via ed2fc52 drs: Add basic GET_TGT support via 821094d getncchanges.py: Add tests for object deletion during replication via 469aed0 getnc_exop.py: Extend EXOP_REPL_OBJ test case to use GET_TGT via 00b20c8 getncchanges.py: Add test for GET_ANC and GET_TGT combined via 6ec9ef2 getncchanges.py: Add test for adding links during replication via af82bde getncchanges.py: Add some GET_TGT test cases via 172eedc getnc_exop.py: Fix GET_TGT behaviour in DRS tests from af38d73 s4/smbd: set the process group.
https://git.samba.org/?p=samba.git;a=shortlog;h=master - Log ----------------------------------------------------------------- commit 1541c50b370e8695bf5c958cc41073d1552a3c52 Author: Tim Beale <timbe...@catalyst.net.nz> Date: Wed Aug 23 12:45:09 2017 +1200 selftest: Add some tests for linked attribute conflicts Currently we have tests that check we can resolve object conflicts, but these don't test anything related to conflicting linked attributes. This patch adds some basic tests that checks that Samba can resolve conflicting linked attributes. This highlights some problems with Samba, as the following tests currently fail: - test_conflict_single_valued_link: Samba currently can't resolve a conflicting targets for a single-valued linked attribute - the replication exits with an error. - test_link_deletion_conflict: If 2 DCs add the same linked attribute, currently when they resolve this conflict the RMD_VERSION for the linked attribute incorrectly gets incremented. This means the version numbers get out of step and subsequent changes to the linked attribute can be dropped/ignored. - test_full_sync_link_conflict: fails for the same reason as above. Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> Reviewed-by: Garming Sam <garm...@catalyst.net.nz> Autobuild-User(master): Garming Sam <garm...@samba.org> Autobuild-Date(master): Mon Sep 18 09:56:41 CEST 2017 on sn-devel-144 commit 46c1f7bdeee7330ffdf43edada032205d594567f Author: Tim Beale <timbe...@catalyst.net.nz> Date: Tue Aug 15 16:15:14 2017 +1200 getncchanges.c: max_links calculation didn't work well in some cases The max_links calculation didn't work particularly well if max_links was set to a value lower than max_objects. As soon as repl_chunk->object_count exceeded repl_chunk->max_links, the chunk would be deemed full, even if there was only one link to send (or even worse, no links to send). For example, if max_objects=100 and max_links=10, then it would send back chunks of 10 objects (or 9 objects and 1 link). I believe the historic reason this logic exists is to avoid overfilling the response message. It's hard to tell what the appropriate limit would be because the total message size would depend on how many attributes each object has. I couldn't think of logic that would be suitable for all cases. I toyed with the idea of working out a percentage of how full the message is. However, adjusting the max_links doesn't really make sense when the settings are small enough, e.g. max_objects=100 and max_links=100 is never going to overfill the message, so there's no reason to alter the values. In the end I went with: - If the user is using non-default values, just use those. - In the default value case, just use the historic calculation Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> BUG: https://bugzilla.samba.org/show_bug.cgi?id=12972 Reviewed-by: Garming Sam <garm...@catalyst.net.nz> commit 3c0d80d0c36f5c65efa3f8e81c880b6d8c26d30f Author: Tim Beale <timbe...@catalyst.net.nz> Date: Fri Aug 11 15:39:35 2017 +1200 replmd: Avoid duplicated debug/warnings We display warnings if a target object is missing but it's still OK to continue the replication. Currently we need to check the target twice - once to verify it when we first receive it, and once when we actually commit it (we can't skip the 2nd check altogether because in the join case, they could occur quite far apart). One annoying side-effect is we get the same warning message coming out twice in these special cases. In the cases where we're checking the dsdb_repl_flags, we can actually just bypass the verification checks for the target object (if it doesn't exist we still continue anyway). This may save us a tiny bit of unnecessary work. For cross-partition links, we can limit logging these warnings to when the objects are actually being committed. This avoids spurious warnings in the join case (i.e. we receive the link before we receive the target object's partition, but we have received all partitions by the time we actually commit the objects). Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> BUG: https://bugzilla.samba.org/show_bug.cgi?id=12972 Reviewed-by: Garming Sam <garm...@catalyst.net.nz> commit 44ca84166e5a909f1acd77323a444ff2e14a8db8 Author: Tim Beale <timbe...@catalyst.net.nz> Date: Fri Aug 11 13:53:31 2017 +1200 replmd: Allow missing targets if GET_TGT has already been set While running the selftests, I noticed a case where DC replication unexpectedly sends a linked attribute for a deleted object (created in the drs.ridalloc_exop tests). The problem is due to the msDS-NC-Replica-Locations attribute, which is a (known) one-way link. Because it is a one-way link, when the test demotes the DC and deletes the link target, there is no backlink to delete the link from the source object. After much debate and head-scratching, we decided that there wasn't an ideal way to resolve this problem. Any automated intervention could potentially do the wrong thing, especially if the link spans partitions. Running dbcheck will find this problem and is able to fix it (providing the deleted object is still a tombstone). So the recommendation is to run dbcheck on your DCs every 6 months (or more frequently if using a lower tombstone lifetime setting). However, it does highlight a problem with the current GET_TGT implementation. If the tombstone object had been expunged and you upgraded to 4.8, then you would be stuck - replication would fail because the target object can't be resolved, even with GET_TGT, and dbcheck would not be able to fix the hanging link. The solution is to not fail the replication for an unknown target if GET_TGT has already been set (i.e. the dsdb_repl_flags contains DSDB_REPL_FLAG_TARGETS_UPTODATE). It's debatable whether we should add a hanging link in this case or ignore/drop the link. Some cases to consider: - If you're talking to a DC that still sends all the links last, you could still get object deletion between processing the source object's links and sending the target (GET_TGT just restarts the replication cycle from scratch). Adding a hanging link in this case would be incorrect and would add spurious information to the DB. - Suppose there's a bug in Samba that incorrectly results in an object disappearing. If other DCs then remove any links that pointed to that object, it makes recovering from the problem harder. However, simply ignoring the link shouldn't result in data loss, i.e. replication won't remove the existing link information from other DCs. Data loss in this case would only occur if a new DC were brought online, or if it were a new link that was affected. Based on this, I think ignoring the link does the least harm. This problem also highlights that we should really be using the same logic in both the unknown target and the deleted target cases. Combining the logic and moving it into a common replmd_allow_missing_target() function fixes the problem. (This also has the side-effect of fixing another logic flaw - in the deleted object case we would unnecessarily retry with GET_TGT if the target object was in another partition. This is pointless work, because GET_TGT won't resolve the target). Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> BUG: https://bugzilla.samba.org/show_bug.cgi?id=12972 Reviewed-by: Garming Sam <garm...@catalyst.net.nz> commit 278039ff78c97bda143b21318a59edbd18fa3825 Author: Tim Beale <timbe...@catalyst.net.nz> Date: Fri Aug 11 16:08:15 2017 +1200 getncchanges.c: Support GET_TGT better with large numbers of links A source object can potentially link to thousands of target objects. We have to be careful not to overfill the GetNCChanges response message with more data than it's possible to send. We also don't want the client to timeout while we're busy checking the linked attributes. The GET_TGT support added so far is fairly dumb - this patch extends it to better handle larger numbers of links. To do so, this extends the repl_chunk usage so that it also works out if the current chunk is full of links. Now as soon as the chunk is full of either links or objects, we stop and send it back. These changes now mean that we need to also check: - that all the links for the last source object in the previous chunk have been sent, before we move on and send the next object. This only takes effect when immediate_link_sync is configured. It also means that a chunk in the middle of the replication cycle can now contain only links, and no objects. - when GET_TGT is used, we only send back the links that we've verified the target object for. i.e. if we stop checking links because we timed out, we only send back the links whose targets were checked. Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> Reviewed-by: Garming Sam <garm...@catalyst.net.nz> commit 7ba10844d1b4c8c1d08c62353150dc03ee166150 Author: Garming Sam <garm...@catalyst.net.nz> Date: Tue Aug 8 16:27:18 2017 +1200 getncchanges.c: Refactor to track more state using repl_chunk To prepare GET_TGT to deal with a large number of links better, there is now a 'repl_chunk' struct to help keep track of all the factors relating to the current chunk of replication data (i.e. how many objects/links we can send and how many we've already processed). This means we can have a consistent way of working out whether the current chunk is full (whether that be due to objects, links, or just too much time taken). This patch should not alter functionality. This is just a refactor to add the basic framework, which will be used in the next patch. Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Signed-off-by: Garming Sam <garm...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> commit 10df9f6bfd04af08d095b6796f902888f1172b9f Author: Tim Beale <timbe...@catalyst.net.nz> Date: Tue Aug 15 12:18:02 2017 +1200 getncchanges.py: Add a multi-valued linked attribute test Add a test where a source object links to multiple different targets. First we do the replication without GET_TGT and check that the server can handle sending a chunk containing only links (in the middle of the replication). Then we repeat the replication forcing GET_TGT to be used. To avoid having to create 1500 objects/links, I've lowered the 'max link sync' setting on the vampire_dc testenv to 250. Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> Reviewed-by: Garming Sam <garm...@catalyst.net.nz> commit 693e3adc1aebdf75d6d9f9866ec8f39ea9d5557b Author: Tim Beale <timbe...@catalyst.net.nz> Date: Mon Jul 24 14:43:54 2017 +1200 getncchanges.py: Add a test for dropped cross-partition links Samba would drop linked attributes that span partitions if it didn't know about the target object. This patch adds a test that exposes the problem. I've re-used the code from the previous re-animation test to do this. I've also added a very basic DcConnection helper class that basically stores the connection state information the drs_base.py uses for replication. This allows us to switch the DC we want to replicate from easily. This approach could potentially be retro-fitted to some of the existing test cases, as it allows us to test both the DRS client code and server code at the same time. Note this test case relates to the code change for commit fae5df891c11f642cb. Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> BUG: https://bugzilla.samba.org/show_bug.cgi?id=12972 Reviewed-by: Garming Sam <garm...@catalyst.net.nz> commit 96971906009cf425b2757725eb627769d1917509 Author: Tim Beale <timbe...@catalyst.net.nz> Date: Thu Jul 20 17:06:14 2017 +1200 getncchanges.py: Add test for replicating reanimated objects Reading between the lines, this scenario seems to be the main reason that Microsoft added the GET_TGT flag. MS AD can handle getting links for unknown targets OK, but if it receives links for a deleted/recycled target then it would tend to drop the received links. Samba client also used to drop the links if talking to a Microsoft DC (or a Samba server with GET_TGT support). The specific scenario is the client side already knows about a deleted object. That object is then re-animated and used as the target for a linked attribute. *Then* the target object gets updated again so it gets sent in a later replication chunk to the linked attribute, i.e. the client receives the link before it learns that the target object has been re-animated. In this test we're interested in particular at how the client behaves when it receives a linked attribute for a deleted object. (It *should* retry with GET_TGT to make sure the target is up-to-date. However, it was just dropping the linked attribute). To exercise the client-side, we disable replication, setup the links/objects on one DC the way we want them, then force a replication to the second DC. We then check that when we query each DC, they both tell us about the links/objects we're expecting (i.e. no links got lost). Note that this wasn't a problem with older versions of Samba-to-Samba because sending the links last guaranteed that the target objects were always up-to-date. Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> Reviewed-by: Garming Sam <garm...@catalyst.net.nz> commit ed2fc522439349832ad2a8a56abbf63e56011fb9 Author: Tim Beale <timbe...@catalyst.net.nz> Date: Wed Aug 23 10:23:10 2017 +1200 drs: Add basic GET_TGT support This adds basic DRS_GET_TGT support. If the GET_TGT flag is specified then the server will use the object cache to store the objects it sends back. If the target object for a linked attribute is not in the cache (i.e. it has not been sent already), then it is added to the response message. Note that large numbers of linked attributes will not be handled well yet - the server could potentially try to send more than will fit in a single repsonse message. Also note that the client can sometimes set the GET_TGT flag even if the server is still sending the links last. In this case, we know the client supports GET_TGT so it's safe to send the links interleaved with the source objects (the alternative of fetching the target objects but not sending the links until last doesn't really make any sense). Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> Reviewed-by: Garming Sam <garm...@catalyst.net.nz> commit 821094d50b0f54700c0ce619e381bbbdb9e21cd0 Author: Tim Beale <timbe...@catalyst.net.nz> Date: Wed Jul 19 11:38:55 2017 +1200 getncchanges.py: Add tests for object deletion during replication Add tests that delete the source and target objects for linked attributes in the middle of a replication cycle. Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> Reviewed-by: Garming Sam <garm...@catalyst.net.nz> commit 469aed088f460cbc1e697b5cd83b6f9739ab7532 Author: Tim Beale <timbe...@catalyst.net.nz> Date: Mon Jul 17 14:04:38 2017 +1200 getnc_exop.py: Extend EXOP_REPL_OBJ test case to use GET_TGT We already check that when we use GET_ANC that we still only receive a single object when EXOP_REPL_OBJ is used. This extends the test to also check that only a single object is returned when GET_TGT is used. Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> Reviewed-by: Garming Sam <garm...@catalyst.net.nz> commit 00b20c825c3961966ba9d77a28dfbc9166790977 Author: Tim Beale <timbe...@catalyst.net.nz> Date: Thu Jul 13 11:47:16 2017 +1200 getncchanges.py: Add test for GET_ANC and GET_TGT combined The code has to handle needing GET_ANC and GET_TGT in combination, i.e. where we fetch the target object for the linked attribute and the target object's parent is unknown as well. This patch adds a test case to exercise this code path. The second part of this test exercises GET_ANC/GET_TGT for an incremental replication, where the objects are getting filtered by an uptodateness-vector/HWM. Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> Reviewed-by: Garming Sam <garm...@catalyst.net.nz> commit 6ec9ef2bebe5f9d587963f29c279b938f098893e Author: Tim Beale <timbe...@catalyst.net.nz> Date: Tue Jun 13 12:14:45 2017 +1200 getncchanges.py: Add test for adding links during replication We have identified a case where the Samba server can send linked attributes but not the target object. In this case, the Samba DRS client would hit the "Failed to re-resolve GUID" case in replmd and silently discard the linked attribute. However, Samba will resend the linked attribute in the next cycle (because its USN is still higher than the committed HWM), so it should recover OK. On older releases, this may have caused problems if the first error resulting in a hanging link (which might mean the second time it's processed it still fails to be added). Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> Reviewed-by: Garming Sam <garm...@catalyst.net.nz> commit af82bdefcc3b7413c9eed81b4aa5937817b2a9ff Author: Tim Beale <timbe...@catalyst.net.nz> Date: Wed Jul 12 14:23:35 2017 +1200 getncchanges.py: Add some GET_TGT test cases test_repl_get_tgt: - Adds 2 sets of objects - Links one set to the other - Changes the order so the target object comes last in the replication (which means the client has to use GET_TGT) - Checks that when GET_TGT is used that we have received all target objects we need to resolve the linked attibutes - Checks that we expect to receive the linked attributes *before* the last chunk is sent (by default, Samba sends all the links at the end, so this fails) - Checks that we eventually receive all expected objects, and all links we receive match what is expected test_repl_get_tgt_chain: This adds the linked attributes in a more complicated chain. We add 300 objects, but the links for 100 objects will point to a linked chain of 200 objects. This was mainly to determine whether or not Windows follows the target object (i.e. whether it sends all the links for the target object as well). It turns out Windows maintains its own linked attribute DB, so it sends the links based on USN. Note that the 2 testenvs fail for different reasons. promoted_dc fails because it is sending all the linked attributes last. vampire_dc fails because it doesn't support GET_TGT yet, so it sends the link before the peer knows about the target object. Note that to test against vampire_dc (rather than the ad_dc_ntvfs DC), we need to send the GetNCChanges requests to DC2 instead of DC1. I've left the DC numbering scheme as is, but I've addeed a test_ldb_dc handle to drs_base.py - it defaults to DC1, but tests can override it easily and still have everything work. While running the new tests through autobuild, I noticed an intermittent LDAP_ENTRY_ALREADY_EXISTS failure in the test setup(). This appears to be due to a timing issue in the background replication between the multiple testenvs. Adding some randomness so that the test base OU is unique seems to avoid the problem. Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> Reviewed-by: Garming Sam <garm...@catalyst.net.nz> commit 172eedc0760b5d471af091c2f08ea70f17b36293 Author: Tim Beale <timbe...@catalyst.net.nz> Date: Tue May 23 14:37:56 2017 +1200 getnc_exop.py: Fix GET_TGT behaviour in DRS tests The existing code never passed the more_flags parameter into the actual getNCChanges request, i.e. _getnc_req10(). This meant the existing GET_TGT tests effectively did nothing. Passing the flag through properly means we have to now change the tests as the DNs returned by Windows now include any target objects in the linked attributes. These tests now fail against Samba (because it doesn't support GET_TGT yet). Also added comments to the tests to help explain what they are actually doing. Note that Samba and Windows can return the objects in different orders, due to significant differences in their underlying DB implementations (Windows stores links in a separate DB, so sends links ordered strictly by USN, whereas Samba sends links based on the USN of the source object). To make the test a fair comparison between Windows and Samba, we need to use dn_ordered=False. Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> Reviewed-by: Garming Sam <garm...@catalyst.net.nz> ----------------------------------------------------------------------- Summary of changes: selftest/knownfail.d/getncchanges | 6 + selftest/knownfail.d/link_conflicts | 9 + selftest/target/Samba4.pm | 3 +- source4/dsdb/common/util.c | 40 ++ source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 197 +++--- source4/rpc_server/drsuapi/getncchanges.c | 513 +++++++++++++--- source4/selftest/tests.py | 5 + source4/torture/drs/python/drs_base.py | 33 +- source4/torture/drs/python/getnc_exop.py | 87 ++- source4/torture/drs/python/getncchanges.py | 775 ++++++++++++++++++++++-- source4/torture/drs/python/link_conflicts.py | 498 +++++++++++++++ 11 files changed, 1931 insertions(+), 235 deletions(-) create mode 100644 selftest/knownfail.d/getncchanges create mode 100644 selftest/knownfail.d/link_conflicts create mode 100644 source4/torture/drs/python/link_conflicts.py Changeset truncated at 500 lines: diff --git a/selftest/knownfail.d/getncchanges b/selftest/knownfail.d/getncchanges new file mode 100644 index 0000000..5ef1bc9 --- /dev/null +++ b/selftest/knownfail.d/getncchanges @@ -0,0 +1,6 @@ +# GET_TGT tests currently only work for testenvs that send the links at the +# same time as the source objects. Currently this is only the vampire_dc +samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt\(promoted_dc\) +samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt_chain\(promoted_dc\) +samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt_and_anc\(promoted_dc\) +samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt_multivalued_links\(promoted_dc\) diff --git a/selftest/knownfail.d/link_conflicts b/selftest/knownfail.d/link_conflicts new file mode 100644 index 0000000..1c41335 --- /dev/null +++ b/selftest/knownfail.d/link_conflicts @@ -0,0 +1,9 @@ +# Currently Samba can't resolve a conflict for a single-valued link attribute +samba4.drs.link_conflicts.python\(vampire_dc\).link_conflicts.DrsReplicaLinkConflictTestCase.test_conflict_single_valued_link\(vampire_dc\) +samba4.drs.link_conflicts.python\(promoted_dc\).link_conflicts.DrsReplicaLinkConflictTestCase.test_conflict_single_valued_link\(promoted_dc\) +# There's a bug where Samba can incorrectly increment the attribute's version number +samba4.drs.link_conflicts.python\(vampire_dc\).link_conflicts.DrsReplicaLinkConflictTestCase.test_link_deletion_conflict\(vampire_dc\) +samba4.drs.link_conflicts.python\(promoted_dc\).link_conflicts.DrsReplicaLinkConflictTestCase.test_link_deletion_conflict\(promoted_dc\) +samba4.drs.link_conflicts.python\(vampire_dc\).link_conflicts.DrsReplicaLinkConflictTestCase.test_full_sync_link_conflict\(vampire_dc\) +samba4.drs.link_conflicts.python\(promoted_dc\).link_conflicts.DrsReplicaLinkConflictTestCase.test_full_sync_link_conflict\(promoted_dc\) + diff --git a/selftest/target/Samba4.pm b/selftest/target/Samba4.pm index 3d14885..7930a4e 100755 --- a/selftest/target/Samba4.pm +++ b/selftest/target/Samba4.pm @@ -1293,7 +1293,8 @@ sub provision_vampire_dc($$$) if ($fl == "2000") { $name = "vampire2000dc"; } else { - $extra_conf = "drs: immediate link sync = yes"; + $extra_conf = "drs: immediate link sync = yes + drs: max link sync = 250"; } # We do this so that we don't run the provision. That's the job of 'net vampire'. diff --git a/source4/dsdb/common/util.c b/source4/dsdb/common/util.c index 430620e..49345e5 100644 --- a/source4/dsdb/common/util.c +++ b/source4/dsdb/common/util.c @@ -5555,3 +5555,43 @@ int dsdb_user_obj_set_primary_group_id(struct ldb_context *ldb, struct ldb_messa return LDB_SUCCESS; } + +/** + * Returns True if the source and target DNs both have the same naming context, + * i.e. they're both in the same partition. + */ +bool dsdb_objects_have_same_nc(struct ldb_context *ldb, + TALLOC_CTX *mem_ctx, + struct ldb_dn *source_dn, + struct ldb_dn *target_dn) +{ + TALLOC_CTX *tmp_ctx; + struct ldb_dn *source_nc; + struct ldb_dn *target_nc; + int ret; + bool same_nc = true; + + tmp_ctx = talloc_new(mem_ctx); + + ret = dsdb_find_nc_root(ldb, tmp_ctx, source_dn, &source_nc); + if (ret != LDB_SUCCESS) { + DBG_ERR("Failed to find base DN for source %s\n", + ldb_dn_get_linearized(source_dn)); + talloc_free(tmp_ctx); + return true; + } + + ret = dsdb_find_nc_root(ldb, tmp_ctx, target_dn, &target_nc); + if (ret != LDB_SUCCESS) { + DBG_ERR("Failed to find base DN for target %s\n", + ldb_dn_get_linearized(target_dn)); + talloc_free(tmp_ctx); + return true; + } + + same_nc = (ldb_dn_compare(source_nc, target_nc) == 0); + + talloc_free(tmp_ctx); + + return same_nc; +} diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c index d2c2084..909a1be 100644 --- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c +++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c @@ -6726,43 +6726,96 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct } /** - * Returns True if the source and target DNs both have the same naming context, - * i.e. they're both in the same partition. + * Checks how to handle an missing target - either we need to fail the + * replication and retry with GET_TGT, ignore the link and continue, or try to + * add a partial link to an unknown target. */ -static bool replmd_objects_have_same_nc(struct ldb_context *ldb, - TALLOC_CTX *mem_ctx, - struct ldb_dn *source_dn, - struct ldb_dn *target_dn) +static int replmd_allow_missing_target(struct ldb_module *module, + TALLOC_CTX *mem_ctx, + struct ldb_dn *target_dn, + struct ldb_dn *source_dn, + bool is_obj_commit, + struct GUID *guid, + uint32_t dsdb_repl_flags, + bool *ignore_link, + const char * missing_str) { - TALLOC_CTX *tmp_ctx; - struct ldb_dn *source_nc; - struct ldb_dn *target_nc; - int ret; - bool same_nc = true; + struct ldb_context *ldb = ldb_module_get_ctx(module); + bool is_in_same_nc; - tmp_ctx = talloc_new(mem_ctx); + /* + * we may not be able to resolve link targets properly when + * dealing with subsets of objects, e.g. the source is a + * critical object and the target isn't + * + * TODO: + * When we implement Trusted Domains we need to consider + * whether they get treated as an incomplete replica here or not + */ + if (dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET) { - ret = dsdb_find_nc_root(ldb, tmp_ctx, source_dn, &source_nc); - if (ret != LDB_SUCCESS) { - DBG_ERR("Failed to find base DN for source %s\n", - ldb_dn_get_linearized(source_dn)); - talloc_free(tmp_ctx); - return true; + /* + * Ignore the link. We don't increase the highwater-mark in + * the object subset cases, so subsequent replications should + * resolve any missing links + */ + DEBUG(2, ("%s target %s linked from %s\n", missing_str, + ldb_dn_get_linearized(target_dn), + ldb_dn_get_linearized(source_dn))); + *ignore_link = true; + return LDB_SUCCESS; } - ret = dsdb_find_nc_root(ldb, tmp_ctx, target_dn, &target_nc); - if (ret != LDB_SUCCESS) { - DBG_ERR("Failed to find base DN for target %s\n", - ldb_dn_get_linearized(target_dn)); - talloc_free(tmp_ctx); - return true; + if (dsdb_repl_flags & DSDB_REPL_FLAG_TARGETS_UPTODATE) { + + /* + * target should already be up-to-date so there's no point in + * retrying. This could be due to bad timing, or if a target + * on a one-way link was deleted. We ignore the link rather + * than failing the replication cycle completely + */ + *ignore_link = true; + DBG_WARNING("%s is %s but up to date. Ignoring link from %s\n", + ldb_dn_get_linearized(target_dn), missing_str, + ldb_dn_get_linearized(source_dn)); + return LDB_SUCCESS; + } + + is_in_same_nc = dsdb_objects_have_same_nc(ldb, + mem_ctx, + source_dn, + target_dn); + if (is_in_same_nc) { + /* fail the replication and retry with GET_TGT */ + ldb_asprintf_errstring(ldb, "%s target %s GUID %s linked from %s\n", + missing_str, + ldb_dn_get_linearized(target_dn), + GUID_string(mem_ctx, guid), + ldb_dn_get_linearized(source_dn)); + return LDB_ERR_NO_SUCH_OBJECT; } - same_nc = (ldb_dn_compare(source_nc, target_nc) == 0); + /* + * The target of the cross-partition link is missing. Continue + * and try to at least add the forward-link. This isn't great, + * but a partial link can be fixed by dbcheck, so it's better + * than dropping the link completely. + */ + *ignore_link = false; - talloc_free(tmp_ctx); + if (is_obj_commit) { - return same_nc; + /* + * Only log this when we're actually committing the objects. + * This avoids spurious logs, i.e. if we're just verifying the + * received link during a join. + */ + DBG_WARNING("%s cross-partition target %s linked from %s\n", + missing_str, ldb_dn_get_linearized(target_dn), + ldb_dn_get_linearized(source_dn)); + } + + return LDB_SUCCESS; } /** @@ -6775,6 +6828,7 @@ static int replmd_check_target_exists(struct ldb_module *module, struct dsdb_dn *dsdb_dn, struct la_entry *la_entry, struct ldb_dn *source_dn, + bool is_obj_commit, struct GUID *guid, bool *ignore_link) { @@ -6847,46 +6901,14 @@ static int replmd_check_target_exists(struct ldb_module *module, if (target_res->count == 0) { /* - * we may not be able to resolve link targets properly when - * dealing with subsets of objects, e.g. the source is a - * critical object and the target isn't - * - * TODO: - * When we implement Trusted Domains we need to consider - * whether they get treated as an incomplete replica here or not + * target object is unknown. Check whether to ignore the link, + * fail the replication, or add a partial link */ - if (la_entry->dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET) { + ret = replmd_allow_missing_target(module, tmp_ctx, dsdb_dn->dn, + source_dn, is_obj_commit, guid, + la_entry->dsdb_repl_flags, + ignore_link, "Unknown"); - /* - * We don't increase the highwater-mark in the object - * subset cases, so subsequent replications should - * resolve any missing links - */ - DEBUG(2,(__location__ - ": Failed to find target %s linked from %s\n", - ldb_dn_get_linearized(dsdb_dn->dn), - ldb_dn_get_linearized(source_dn))); - *ignore_link = true; - - } else if (replmd_objects_have_same_nc(ldb, tmp_ctx, source_dn, - dsdb_dn->dn)) { - ldb_asprintf_errstring(ldb, "Unknown target %s GUID %s linked from %s\n", - ldb_dn_get_linearized(dsdb_dn->dn), - GUID_string(tmp_ctx, guid), - ldb_dn_get_linearized(source_dn)); - ret = LDB_ERR_NO_SUCH_OBJECT; - } else { - - /* - * The target of the cross-partition link is missing. - * Continue and try to at least add the forward-link. - * This isn't great, but if we can add a partial link - * then it's better than nothing. - */ - DEBUG(2,("Failed to resolve cross-partition link between %s and %s\n", - ldb_dn_get_linearized(source_dn), - ldb_dn_get_linearized(dsdb_dn->dn))); - } } else if (target_res->count != 1) { ldb_asprintf_errstring(ldb, "More than one object found matching objectGUID %s\n", GUID_string(tmp_ctx, guid)); @@ -6909,26 +6931,15 @@ static int replmd_check_target_exists(struct ldb_module *module, */ if (target_deletion_state >= OBJECT_RECYCLED) { - if (la_entry->dsdb_repl_flags & DSDB_REPL_FLAG_TARGETS_UPTODATE) { - - /* - * target should already be uptodate so there's no - * point retrying - it's probably just bad timing - */ - *ignore_link = true; - DEBUG(0, ("%s is deleted but up to date. " - "Ignoring link from %s\n", - ldb_dn_get_linearized(dsdb_dn->dn), - ldb_dn_get_linearized(source_dn))); - - } else { - ldb_asprintf_errstring(ldb, - "Deleted target %s GUID %s linked from %s", - ldb_dn_get_linearized(dsdb_dn->dn), - GUID_string(tmp_ctx, guid), - ldb_dn_get_linearized(source_dn)); - ret = LDB_ERR_NO_SUCH_OBJECT; - } + /* + * target object is deleted. Check whether to ignore the + * link, fail the replication, or add a partial link + */ + ret = replmd_allow_missing_target(module, tmp_ctx, + dsdb_dn->dn, source_dn, + is_obj_commit, guid, + la_entry->dsdb_repl_flags, + ignore_link, "Deleted"); } } @@ -7084,8 +7095,18 @@ static int replmd_verify_linked_attribute(struct replmd_replicated_request *ar, return ret; } - ret = replmd_check_target_exists(module, tgt_dsdb_dn, la, - src_msg->dn, &guid, &dummy); + /* + * We can skip the target object checks if we're only syncing critical + * objects, or we know the target is up-to-date. If either case, we + * still continue even if the target doesn't exist + */ + if ((la->dsdb_repl_flags & (DSDB_REPL_FLAG_OBJECT_SUBSET | + DSDB_REPL_FLAG_TARGETS_UPTODATE)) == 0) { + + ret = replmd_check_target_exists(module, tgt_dsdb_dn, la, + src_msg->dn, false, &guid, + &dummy); + } /* * When we fail to find the target object, the error code we pass @@ -7173,7 +7194,7 @@ static int replmd_process_linked_attribute(struct ldb_module *module, } ret = replmd_check_target_exists(module, dsdb_dn, la_entry, msg->dn, - &guid, &ignore_link); + true, &guid, &ignore_link); if (ret != LDB_SUCCESS) { talloc_free(tmp_ctx); diff --git a/source4/rpc_server/drsuapi/getncchanges.c b/source4/rpc_server/drsuapi/getncchanges.c index cd3f51f..b48e9c7 100644 --- a/source4/rpc_server/drsuapi/getncchanges.c +++ b/source4/rpc_server/drsuapi/getncchanges.c @@ -46,6 +46,8 @@ #define DBGC_CLASS DBGC_DRS_REPL #define DRS_GUID_SIZE 16 +#define DEFAULT_MAX_OBJECTS 1000 +#define DEFAULT_MAX_LINKS 1500 /* * state of a partially-completed replication cycle. This state persists @@ -60,6 +62,7 @@ struct drsuapi_getncchanges_state { struct GUID ncRoot_guid; bool is_schema_nc; bool is_get_anc; + bool is_get_tgt; uint64_t min_usn; uint64_t max_usn; struct drsuapi_DsReplicaHighWaterMark last_hwm; @@ -87,9 +90,18 @@ struct la_for_sorting { * only exists for a single call to dcesrv_drsuapi_DsGetNCChanges() */ struct getncchanges_repl_chunk { - struct drsuapi_DsGetNCChangesCtr6 *ctr6; + uint32_t max_objects; + uint32_t max_links; + uint32_t tgt_la_count; + bool immediate_link_sync; + time_t max_wait; + time_t start; - /* the last object written to the response */ + /* stores the objects to be sent in this chunk */ + uint32_t object_count; + struct drsuapi_DsReplicaObjectListItemEx *object_list; + + /* the last object added to this replication chunk */ struct drsuapi_DsReplicaObjectListItemEx *last_object; }; @@ -2245,25 +2257,25 @@ static WERROR getncchanges_add_ancestors(struct drsuapi_DsReplicaObjectListItemE } /** - * Adds a list of new objects into the getNCChanges response message + * Adds a list of new objects into the current chunk of replication data to send */ -static void getncchanges_add_objs_to_resp(struct getncchanges_repl_chunk *repl_chunk, - struct drsuapi_DsReplicaObjectListItemEx *obj_list) +static void getncchanges_chunk_add_objects(struct getncchanges_repl_chunk *repl_chunk, + struct drsuapi_DsReplicaObjectListItemEx *obj_list) { struct drsuapi_DsReplicaObjectListItemEx *obj; /* - * We track the last object added to the response message, so just add + * We track the last object added to the replication chunk, so just add * the new object-list onto the end */ - if (repl_chunk->last_object == NULL) { - repl_chunk->ctr6->first_object = obj_list; + if (repl_chunk->object_list == NULL) { + repl_chunk->object_list = obj_list; } else { repl_chunk->last_object->next_object = obj_list; } for (obj = obj_list; obj != NULL; obj = obj->next_object) { - repl_chunk->ctr6->object_count += 1; + repl_chunk->object_count += 1; /* * Remember the last object in the response - we'll use this to @@ -2347,6 +2359,337 @@ static WERROR getncchanges_get_obj_to_send(const struct ldb_message *msg, return werr; } +/** + * Returns the number of links that are waiting to be sent + */ +static uint32_t getncchanges_chunk_links_pending(struct getncchanges_repl_chunk *repl_chunk, + struct drsuapi_getncchanges_state *getnc_state) +{ + uint32_t links_to_send = 0; + + if (getnc_state->is_get_tgt) { + + /* + * when the GET_TGT flag is set, only include the linked + * attributes whose target object has already been checked + * (i.e. they're ready to send). + */ + if (repl_chunk->tgt_la_count > getnc_state->la_idx) { + links_to_send = (repl_chunk->tgt_la_count - + getnc_state->la_idx); + } + } else { + links_to_send = getnc_state->la_count - getnc_state->la_idx; + } + + return links_to_send; +} + +/** + * Returns the max number of links that will fit in the current replication chunk + */ +static uint32_t getncchanges_chunk_max_links(struct getncchanges_repl_chunk *repl_chunk) +{ + uint32_t max_links = 0; + + if (repl_chunk->max_links != DEFAULT_MAX_LINKS || + repl_chunk->max_objects != DEFAULT_MAX_OBJECTS) { + + /* + * We're using non-default settings, so don't try to adjust + * them, just trust the user has configured decent values + */ + max_links = repl_chunk->max_links; + + } else if (repl_chunk->max_links > repl_chunk->object_count) { + + /* + * This is just an approximate guess to avoid overfilling the + * replication chunk. It's the logic we've used historically. + * E.g. if we've already sent 1000 objects, then send 1000 fewer + * links. For comparison, the max that Windows seems to send is + * ~2700 links and ~250 objects (although this may vary based + * on timeouts) + */ + max_links = repl_chunk->max_links - repl_chunk->object_count; + } + + return max_links; +} + +/** + * Returns true if the current GetNCChanges() call has taken longer than its + * allotted time. This prevents the client from timing out. + */ +static bool getncchanges_chunk_timed_out(struct getncchanges_repl_chunk *repl_chunk) +{ + return (time(NULL) - repl_chunk->start > repl_chunk->max_wait); +} + +/** + * Returns true if the current chunk of replication data has reached the + * max_objects and/or max_links thresholds. + */ +static bool getncchanges_chunk_is_full(struct getncchanges_repl_chunk *repl_chunk, + struct drsuapi_getncchanges_state *getnc_state) +{ + bool chunk_full = false; + uint32_t links_to_send; + uint32_t chunk_limit; -- Samba Shared Repository