The branch, master has been updated
       via  d6581d213d5 ldb: move struct ldb_debug_ops to ldb_private.h
       via  6dd68d89786 ldb: move struct ldb_utf8_fns to ldb_private.h
       via  a00c0ebd090 s4:dsdb:strcasecmp_with_ldb_val() avoids overflow
       via  b6974030e6a lib/fuzzing: add fuzz_strncasecmp_ldb
       via  b22e1d3207d ldb: don't cast to unsigned for ldb_ascii_toupper()
       via  e33a0dd70f0 ldb: ldb_set_utf8_functions follows README.Coding
       via  4a6a1d1f0af ldb: deprecate ldb_set_utf8_fns
       via  42ae85d70af ldb: remove old ldb_comparison_fold_utf8_broken()
       via  960724a06e4 ldb: ldb_comparison_fold always uses the casecmp function
       via  edabb9f4cb9 ldb-samba: use ldb_comparison_fold_utf8()
       via  0becc8a90cb ldb-samba: add ldb_comparison_fold_utf8, wrapping strncasecmp_ldb
       via  f9797950fd6 util:charset: strncasecmp_ldb avoids iconv for ASCII
       via  55397514db5 util:charset: strncasecmp_ldb degrades to ASCII strncasecmp
       via  eb91e3437b4 util:charset: add strncasecmp_ldb()
       via  7cc3c56293d ldb: ldb_set_utf8_default() sets comparison function
       via  6c27284f7e9 ldb: ldb_comparison_fold_ascii sorts unsigned
       via  92275e27947 ldb: add ldb_comparison_fold_ascii() for default comparisons
       via  947f977acb7 ldb: ldb_comparison_fold uses the utf-8 casecmp function
       via  ae7ca36830b ldb: add ldb_set_utf8_functions() for setting casefold functions
       via  1624ac7a987 ldb: move ldb_comparison_fold guts into a separate function
       via  278a3c7f7c6 ldb: add a utf-8 comparison fold callback
       via  f9fbc7a5067 lib/util/charset: be explicit about INVALID_CODEPOINT value
       via  023a7ce7d5a ldb: add test_ldb_comparison_fold
      from  589a9ea6767 s4:kdc: Add comment about possible interaction between the krbtgt account and Group Managed Service Accounts

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit d6581d213d5f625da493f14620e1a12e79a8e195
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Thu May 23 09:40:00 2024 +1200

    ldb: move struct ldb_debug_ops to ldb_private.h
    
    Only accessed through struct ldb_context -> debug_ops, which is already private.
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>
    
    Autobuild-User(master): Andrew Bartlett <abart...@samba.org>
    Autobuild-Date(master): Thu May 23 00:19:30 UTC 2024 on atb-devel-224

commit 6dd68d897865bd2518a6a71753ca0bc76d51b37e
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Thu May 23 09:36:57 2024 +1200

    ldb: move struct ldb_utf8_fns to ldb_private.h
    
    It is only accessed via ldb functions that find it on the already-private
    struct ldb_context.
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit a00c0ebd090f69f94ce6ba7774a9fc126d7de504
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Mon May 13 11:08:35 2024 +1200

    s4:dsdb:strcasecmp_with_ldb_val() avoids overflow
    
    In the unlikely event that strlen(str) > INT_MAX, the result could
    have overflowed.
    
    This is not a sort transitivity issue, as this is not a symmetric sort
    comparison, but it would affect binary search reliability.
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>
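
The commit message does not show the old code, so this is only a sketch of the general hazard, assuming the previous result was derived by subtracting `size_t` lengths: narrowing a difference larger than INT_MAX to `int` can truncate it or flip its sign, while an explicit three-way comparison cannot. `cmp_lengths_sketch()` is a hypothetical helper in the same spirit as the `NUMERIC_CMP()` used in the removed `ldb_comparison_fold()` code, not a Samba function.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return -1, 0 or 1 for a < b, a == b, a > b.
 * Unlike (int)(a - b), this never wraps or truncates, so the sign is
 * right even when the lengths differ by more than INT_MAX.
 */
static int cmp_lengths_sketch(size_t a, size_t b)
{
	return (a > b) - (a < b);
}
```

A binary search only needs the sign of the result, so returning -1/0/1 loses nothing.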

commit b6974030e6a7ddb330894f46631c8da4359b2d18
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Mon May 13 10:39:44 2024 +1200

    lib/fuzzing: add fuzz_strncasecmp_ldb
    
    As well as checking for the usual overflows, this asserts that
    strncasecmp_ldb is always transitive, by splitting the input into 3
    pieces and comparing all pairs.
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit b22e1d3207d90f102247d690bfe31db55d7b681e
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Fri May 17 11:38:10 2024 +1200

    ldb: don't cast to unsigned for ldb_ascii_toupper()
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>
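
The commit has no body. A plausible reading, sketched under the assumption that `ldb_ascii_toupper()` only maps `'a'`..`'z'` (the ASCII-only behaviour described for the default comparison elsewhere in this series): such a function never changes bytes at or above 0x80, so a plain `char` can be passed without a cast to `unsigned char`. That differs from `toupper()` in `<ctype.h>`, whose argument must be representable as `unsigned char` or equal `EOF`. `ascii_toupper_sketch()` is hypothetical.

```c
/*
 * Hypothetical ASCII-only toupper. Only 'a'..'z' change, and they are
 * all below 0x80, so bytes >= 0x80 fail the range test whether char is
 * signed or unsigned and are returned unchanged.
 */
static char ascii_toupper_sketch(char c)
{
	if (c >= 'a' && c <= 'z') {
		return (char)(c - 'a' + 'A');
	}
	return c;
}
```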

commit e33a0dd70f00481d1c3d9e2fdd227e26431402ef
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Tue May 21 10:55:53 2024 +1200

    ldb: ldb_set_utf8_functions follows README.Coding
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 4a6a1d1f0afa830a679781a522d724bd861a3601
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Fri May 17 11:35:01 2024 +1200

    ldb: deprecate ldb_set_utf8_fns
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 42ae85d70af8da1aecbf45f5fb6e7d7ee1c379fb
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Fri May 10 15:43:36 2024 +1200

    ldb: remove old ldb_comparison_fold_utf8_broken()
    
    There are no callers.
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 960724a06e4dcb793d606c71d6e79387761b3d42
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Thu May 16 17:01:10 2024 +1200

    ldb: ldb_comparison_fold always uses the casecmp function
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit edabb9f4cb9460f382a621a1f494cfdac615232a
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Thu May 16 14:09:46 2024 +1200

    ldb-samba: use ldb_comparison_fold_utf8()
    
    This means ldb-samba/dsdb comparisons will be case-insensitive for
    non-ASCII UTF-8 characters (within the bounds of the 16-bit casefold
    table). And they will remain transitive.
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 0becc8a90cbeac7022a72061debe2edc5b67680a
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Fri May 10 15:42:46 2024 +1200

    ldb-samba: add ldb_comparison_fold_utf8, wrapping strncasecmp_ldb
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit f9797950fd69c16dfab39804dc53172977a345ee
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Tue May 14 21:33:16 2024 +1200

    util:charset: strncasecmp_ldb avoids iconv for ASCII
    
    This is a common case, and we can save a bit of work.
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 55397514db568ca7b75acf139afd527ece137bc1
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Mon May 13 11:32:26 2024 +1200

    util:charset: strncasecmp_ldb degrades to ASCII strncasecmp
    
    If strncasecmp_ldb() encounters invalid utf-8 bytes, it compares those
    as greater than any valid bytes (that is, it sorts them to the end of
    the list).
    
    If an invalid sequence is encountered in both strings at once, the
    rest of the strings are now compared using the default ldb_comparison_fold
    rules, as implemented in ldb_comparison_fold_ascii(). That is, each
    byte is compared individually, [a-z] are translated to [A-Z], and runs of
    spaces are collapsed into single spaces.
    
    There is no perfect answer in this case, but this solution is stable,
    fine-grained, and probably close to what is expected. This
    byte-by-byte comparison is equivalent to a utf-8 comparison without
    case-folding of multibyte codes.
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>
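
A simplified sketch of the ordering this commit describes, written with hypothetical helpers rather than Samba's real `strncasecmp_ldb()`: invalid UTF-8 decodes to a sentinel larger than any valid code point, so it sorts last, and once both strings hit an invalid sequence at the same point the remainder is compared byte by byte with ASCII-only folding. To keep it short, the sketch folds only `[a-z]` (Samba uses its upcase table), does not collapse spaces, and its `SKETCH_INVALID` value is an assumption, not Samba's `INVALID_CODEPOINT`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel, above every valid code point (max U+10FFFF). */
#define SKETCH_INVALID 0xFFFFFFFFu

/*
 * Decode one UTF-8 character from s (n > 0 bytes available) and set
 * *used to the bytes consumed (1 for an invalid byte). Overlong forms,
 * surrogates and values above U+10FFFF count as invalid.
 */
static uint32_t utf8_next_sketch(const uint8_t *s, size_t n, size_t *used)
{
	uint32_t cp, min;
	size_t len, i;

	*used = 1;
	if (s[0] < 0x80) {
		return s[0];
	} else if ((s[0] & 0xE0) == 0xC0) {
		len = 2; cp = s[0] & 0x1F; min = 0x80;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3; cp = s[0] & 0x0F; min = 0x800;
	} else if ((s[0] & 0xF8) == 0xF0) {
		len = 4; cp = s[0] & 0x07; min = 0x10000;
	} else {
		return SKETCH_INVALID;
	}
	if (n < len) {
		return SKETCH_INVALID;
	}
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80) {
			return SKETCH_INVALID;
		}
		cp = (cp << 6) | (s[i] & 0x3F);
	}
	if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
		return SKETCH_INVALID;
	}
	*used = len;
	return cp;
}

/* ASCII-only stand-in for a real Unicode upcase table. */
static uint32_t fold_sketch(uint32_t cp)
{
	return (cp >= 'a' && cp <= 'z') ? cp - 'a' + 'A' : cp;
}

/* Byte-by-byte fallback: fold [a-z], compare unsigned, shorter first. */
static int bytes_fold_cmp_sketch(const uint8_t *a, size_t na,
				 const uint8_t *b, size_t nb)
{
	for (; na > 0 && nb > 0; a++, b++, na--, nb--) {
		uint32_t ca = fold_sketch(*a), cb = fold_sketch(*b);
		if (ca != cb) {
			return ca < cb ? -1 : 1;
		}
	}
	return (na > 0) - (nb > 0);
}

static int utf8_casecmp_sketch(const uint8_t *a, size_t na,
			       const uint8_t *b, size_t nb)
{
	while (na > 0 && nb > 0) {
		size_t ua, ub;
		uint32_t ca = utf8_next_sketch(a, na, &ua);
		uint32_t cb = utf8_next_sketch(b, nb, &ub);

		if (ca == SKETCH_INVALID && cb == SKETCH_INVALID) {
			/* Both broken here: compare the rest byte by byte. */
			return bytes_fold_cmp_sketch(a, na, b, nb);
		}
		/* SKETCH_INVALID stays larger than any folded code point. */
		ca = fold_sketch(ca);
		cb = fold_sketch(cb);
		if (ca != cb) {
			return ca < cb ? -1 : 1;
		}
		a += ua; na -= ua;
		b += ub; nb -= ub;
	}
	return (na > 0) - (nb > 0);
}
```

Because overlong forms are rejected, characters that fold equal also have the same encoded length, so the two strings stay aligned and an invalid sequence is always met at the same offset in both.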

commit eb91e3437b44c7ad653aac86d481ceaaddb06b01
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Tue Apr 30 12:41:43 2024 +1200

    util:charset: add strncasecmp_ldb()
    
    This is a function for comparing strings in a way that suits
    case-insensitive syntaxes in LDB.
    
    We have it here, rather than in LDB itself, because it needs the
    upcase table. By default LDB uses ASCII-only comparisons. SSSD and
    OpenChange use it in that configuration, but Samba replaces the
    comparison and casefold functions with Unicode-aware versions.
    
    Until now Samba has done that in a bad way; this will allow it to do
    better.
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 7cc3c56293d9c93d9c88fba8df0e998db3f7eaf7
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Fri May 17 11:37:18 2024 +1200

    ldb: ldb_set_utf8_default() sets comparison function
    
    The default is ASCII only, which is used by SSSD and OpenChange.
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 6c27284f7e9feae7e37072449e0c752034f6b672
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Thu May 9 17:21:29 2024 +1200

    ldb: ldb_comparison_fold_ascii sorts unsigned
    
    Typically in 8-bit character sets, those with the 0x80 bit set are
    seen as 128-255, not negative numbers. This will sort them after 'Z',
    not before 'A'.
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>
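
A small illustration of the signedness point, using hypothetical helpers. Whether plain `char` is signed is implementation-defined (it is signed on x86 with GCC and Clang), and so is converting 0xC3 to `signed char`; on the usual two's-complement targets it becomes -61. 0xC3 is the lead byte of "é" in UTF-8.

```c
/* Three-way compare of two bytes, treating them as signed char. */
static int cmp_signed_sketch(char a, char b)
{
	signed char sa = (signed char)a, sb = (signed char)b;
	return (sa > sb) - (sa < sb);
}

/* Three-way compare of two bytes, treating them as unsigned char. */
static int cmp_unsigned_sketch(char a, char b)
{
	unsigned char ua = (unsigned char)a, ub = (unsigned char)b;
	return (ua > ub) - (ua < ub);
}
```

As signed, 0xC3 (-61) sorts before 'A' (65); as unsigned, 0xC3 (195) sorts after 'Z' (90).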

commit 92275e27947989706561292f47789a8d715a11d1
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Wed May 15 20:51:08 2024 +1200

    ldb: add ldb_comparison_fold_ascii() for default comparisons
    
    This function is made from the ASCII-only bits of the old
    ldb_comparison_fold() -- that is, what you get if you never follow a
    `goto utf8str` jump. It compares the bytes, but collapses spaces and
    maps [a-z] to [A-Z].
    
    This does exactly what ldb_comparison_fold_utf8_broken() would do in
    situations where ldb_casefold() calls ldb_casefold_default(). That
    means SSSD.
    
    The comparison probably uses signed char, so bytes with the high bit
    set actually sort as low values (before 'A').
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>
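
A sketch of the comparison described here, assuming these rules: leading and trailing spaces are ignored and runs of spaces compare as one (as the removed ldb_comparison_fold() did), [a-z] compare as [A-Z], comparison stops at a NUL byte (again following the removed code), and bytes compare as unsigned, which is the behaviour after the "sorts unsigned" commit in this series. `ascii_fold_cmp_sketch()` and its helpers are hypothetical, not the real `ldb_comparison_fold_ascii()`.

```c
#include <stddef.h>

/* Map [a-z] to [A-Z]; every other byte is returned unchanged. */
static unsigned char fold_byte_sketch(unsigned char c)
{
	return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 'a' + 'A') : c;
}

/* Bound the value to [*start, *end), dropping leading spaces, anything
 * from the first NUL on, and trailing spaces. */
static void trim_sketch(const unsigned char *s, size_t n,
			size_t *start, size_t *end)
{
	size_t b = 0, e = 0;

	while (e < n && s[e] != '\0') {
		e++;
	}
	while (b < e && s[b] == ' ') {
		b++;
	}
	while (e > b && s[e - 1] == ' ') {
		e--;
	}
	*start = b;
	*end = e;
}

/* Hypothetical ASCII comparison fold; returns -1, 0 or 1. */
static int ascii_fold_cmp_sketch(const char *p1, size_t n1,
				 const char *p2, size_t n2)
{
	const unsigned char *s1 = (const unsigned char *)p1;
	const unsigned char *s2 = (const unsigned char *)p2;
	size_t i, e1, j, e2;

	trim_sketch(s1, n1, &i, &e1);
	trim_sketch(s2, n2, &j, &e2);

	while (i < e1 && j < e2) {
		unsigned char c1 = fold_byte_sketch(s1[i]);
		unsigned char c2 = fold_byte_sketch(s2[j]);
		if (c1 != c2) {
			return c1 < c2 ? -1 : 1;
		}
		if (s1[i] == ' ') {
			/* Both are spaces here: skip the rest of each run. */
			while (i + 1 < e1 && s1[i + 1] == ' ') {
				i++;
			}
			while (j + 1 < e2 && s2[j + 1] == ' ') {
				j++;
			}
		}
		i++;
		j++;
	}
	return (i < e1) - (j < e2);
}
```

This is equivalent to comparing the normalised forms lexicographically, so it is transitive. The "domain users" vs "domainUpdates" case from the removed comment returns negative, because a space (0x20) sorts before 'U'.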

commit 947f977acb7946a4521cc8be2e7c0a61bd0e3f1e
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Sun May 19 15:09:26 2024 +1200

    ldb: ldb_comparison_fold uses the utf-8 casecmp function
    
    But only if it is set, which it never is (so far).
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit ae7ca36830be7823dde17bcaeae74b5f46b1aa3d
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Fri May 17 11:34:35 2024 +1200

    ldb: add ldb_set_utf8_functions() for setting casefold functions
    
    This replaces ldb_set_utf8_fns(), which will be deprecated really soon.
    
    The reason for this, as shown in surrounding commits, is that without
    an explicit case-insensitive comparison we need to rely on the casefold,
    and if the casefold can fail (e.g. because of bad utf-8) the comparison
    ends up being a bit chaotic. The strings being compared are generally
    user-controlled, and a malicious user might find ways of hiding values
    or perhaps fooling a binary search.
    
    A case-insensitive comparison that works gradually through the string
    without an all-at-once casefold is better placed to deal with problems
    where they happen, and we are able to separately specialise for the
    ASCII case (used by SSSD) and the UTF-8 case (Samba).
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>
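
A self-contained sketch of the pattern, with hypothetical stand-ins for struct ldb_val and the private utf8_fns table. It installs a default comparison at initialisation (as ldb_set_utf8_default() now does) and uses a setter that leaves a field unchanged when passed NULL, matching the ldb_set_utf8_functions() body in the diff. The compare function always dispatches through the pointer. The byte-wise default is a placeholder, not ldb's ASCII fold.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct ldb_val and ldb's utf8_fns. */
struct val_sketch {
	const unsigned char *data;
	size_t length;
};

typedef int (*casecmp_fn_sketch)(void *ctx, const struct val_sketch *v1,
				 const struct val_sketch *v2);

struct ctx_sketch {
	void *context;
	casecmp_fn_sketch casecmp;
};

/* Placeholder default: byte-wise compare, shorter value first on a tie. */
static int bytes_cmp_sketch(void *ignored, const struct val_sketch *v1,
			    const struct val_sketch *v2)
{
	size_t n = v1->length < v2->length ? v1->length : v2->length;
	int r;

	(void)ignored;
	r = n ? memcmp(v1->data, v2->data, n) : 0;
	if (r != 0) {
		return r < 0 ? -1 : 1;
	}
	return (v1->length > v2->length) - (v1->length < v2->length);
}

static void ctx_init_sketch(struct ctx_sketch *c)
{
	c->context = NULL;
	c->casecmp = bytes_cmp_sketch;
}

/* NULL arguments leave the current value in place. */
static void ctx_set_casecmp_sketch(struct ctx_sketch *c, void *context,
				   casecmp_fn_sketch casecmp)
{
	if (context != NULL) {
		c->context = context;
	}
	if (casecmp != NULL) {
		c->casecmp = casecmp;
	}
}

/* Always dispatch through the configured pointer. */
static int compare_sketch(struct ctx_sketch *c, const struct val_sketch *v1,
			  const struct val_sketch *v2)
{
	return c->casecmp(c->context, v1, v2);
}
```

Because a comparison function is always installed, the dispatch needs no NULL check and no fallback path.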

commit 1624ac7a9876b4b8779364542747f66f5832a709
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Thu May 16 14:10:06 2024 +1200

    ldb: move ldb_comparison_fold guts into a separate function
    
    We're going to make this use a configurable pointer.
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 278a3c7f7c6506134e0e1d15126f55b444f37fbc
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Thu May 9 16:52:53 2024 +1200

    ldb: add a utf-8 comparison fold callback
    
    This isn't used yet, but it will allow library users to select a
    case-insensitive comparison function that matches their chosen casefold.
    
    This will allow the comparisons to be consistent when the strings are bad,
    whereas currently we kind of guess.
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>
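
The inconsistency that motivates a dedicated comparison callback can be reproduced with the byte sequences from the FIXME in the removed ldb_comparison_fold() code. Casefolding treats "ʊ" (CA 8A, U+028A) and "Ʊ" (C6 B1, U+01B1) as equal. A byte-wise fallback against the invalid sequence C8 FE puts them on opposite sides. The helper name is hypothetical; memcmp() compares bytes as unsigned char.

```c
#include <string.h>

/* Byte sequences from the FIXME comment in the removed code. */
static const unsigned char lower_upsilon[] = {0xCA, 0x8A}; /* U+028A */
static const unsigned char upper_upsilon[] = {0xC6, 0xB1}; /* U+01B1 */
static const unsigned char bad_utf8[] = {0xC8, 0xFE};      /* invalid */

/* Sign of a byte-wise (memcmp) comparison of two 2-byte values. */
static int byte_order_sketch(const unsigned char *x, const unsigned char *y)
{
	int r = memcmp(x, y, 2);
	return (r > 0) - (r < 0);
}
```

Two values the casefold path calls equal land on opposite sides of a third value, so no consistent sort order exists for this set.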

commit f9fbc7a5067b78b9fe03e3bcde5e46f82a5704ba
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Wed May 1 15:32:03 2024 +1200

    lib/util/charset: be explicit about INVALID_CODEPOINT value
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 023a7ce7d5ae50ff4f0563c68cb84f9f4ad235f2
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Mon May 20 11:15:47 2024 +1200

    ldb: add test_ldb_comparison_fold
    
    Currently this fails like this:
    
    test_ldb_comparison_fold_default_common: 118 errors out of 256
    test_ldb_comparison_fold_default_ascii:  32 errors out of 100
    test_ldb_comparison_fold_utf8_common:    40 errors out of 256
    test_ldb_comparison_fold_utf8:           28 errors out of 100
    
    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

-----------------------------------------------------------------------

Summary of changes:
 lib/fuzzing/fuzz_strncasecmp_ldb.c       | 161 +++++++++++++++++++++++
 lib/fuzzing/wscript_build                |   5 +
 lib/ldb-samba/ldb_wrap.c                 |  10 +-
 lib/ldb-samba/ldb_wrap.h                 |   5 +
 lib/ldb-samba/pyldb.c                    |   2 +-
 lib/ldb-samba/samba_extensions.c         |   2 +-
 lib/ldb/ABI/ldb-2.10.0.sigs              |   2 +
 lib/ldb/common/attrib_handlers.c         | 148 ++-------------------
 lib/ldb/common/ldb_utf8.c                |  91 ++++++++++++-
 lib/ldb/include/ldb.h                    |  54 ++++----
 lib/ldb/include/ldb_private.h            |  24 ++++
 lib/ldb/tests/test_ldb_comparison_fold.c | 213 +++++++++++++++++++++++++++++++
 lib/ldb/wscript                          |   5 +
 lib/util/charset/charset.h               |   7 +-
 lib/util/charset/util_unistr.c           | 199 +++++++++++++++++++++++++++++
 selftest/tests.py                        |   1 +
 source4/dsdb/common/tests/dsdb_dn.c      |   6 +-
 source4/dsdb/schema/schema_query.c       |   4 +-
 source4/torture/ldb/ldb.c                |  10 +-
 19 files changed, 766 insertions(+), 183 deletions(-)
 create mode 100644 lib/fuzzing/fuzz_strncasecmp_ldb.c
 create mode 100644 lib/ldb/tests/test_ldb_comparison_fold.c


Changeset truncated at 500 lines:

diff --git a/lib/fuzzing/fuzz_strncasecmp_ldb.c b/lib/fuzzing/fuzz_strncasecmp_ldb.c
new file mode 100644
index 00000000000..0f785b5bee7
--- /dev/null
+++ b/lib/fuzzing/fuzz_strncasecmp_ldb.c
@@ -0,0 +1,161 @@
+/*
+   Fuzzing ldb_comparison_fold()
+   Copyright (C) Catalyst IT 2020
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+*/
+#include "includes.h"
+#include "fuzzing/fuzzing.h"
+#include "charset.h"
+
+
+int LLVMFuzzerInitialize(int *argc, char ***argv)
+{
+       return 0;
+}
+
+
+int LLVMFuzzerTestOneInput(const uint8_t *input, size_t len)
+{
+       struct ldb_val v[3] = {{},{},{}};
+       size_t i, j, k;
+       int results[9], ab, ac, bc;
+
+       if (len < 3) {
+               return 0;
+       }
+
+       j = 0;
+       k = 0;
+       v[j].data = discard_const(input);
+
+       /*
+        * We split the input into 3 ldb_vals, on the byte '*' (42), chosen
+        * because it is *not* special with regard to termination, utf-8, or
+        * casefolding.
+        *
+        * if there are not 2 '*' bytes, the last value[s] will be empty, with
+        * a NULL pointer and zero length.
+        */
+
+       for (i = 0; i < len; i++) {
+               if (input[i] != '*') {
+                       continue;
+               }
+               v[j].length = i - k;
+               i++;
+               j++;
+               if (j > 2 || i == len) {
+                       break;
+               }
+               k = i;
+               v[j].data = discard_const(input + k);
+       }
+
+       for (i = 0; i < 3; i++) {
+               char *s1 = (char*)v[i].data;
+               size_t len1 = v[i].length;
+               for (j = 0; j < 3; j++) {
+                       char *s2 = (char*)v[j].data;
+                       size_t len2 = v[j].length;
+                       int r = strncasecmp_ldb(s1, len1, s2, len2);
+                       if (abs(r) > 1) {
+                               abort();
+                       }
+                       results[i * 3 + j] = r;
+               }
+       }
+
+       /*
+        * There are nine comparisons we make.
+        *
+        *    A B C
+        *  A = x x
+        *  B - = x
+        *  C - - =
+        *
+        * The diagonal should be all zeros (A == A, etc)
+        * The upper and lower triangles should complement each other
+        * (A > B implies B < A; A == B implies B == A).
+        *
+        * So we check for those identities first.
+        */
+
+       if ((results[0] != 0) ||
+           (results[4] != 0) ||
+           (results[8] != 0)) {
+               abort();
+       }
+
+       ab = results[3];
+       ac = results[6];
+       bc = results[7];
+
+       if (ab != -results[1] ||
+           ac != -results[2] ||
+           bc != -results[5]) {
+               abort();
+       }
+
+        /*
+        * Then there are 27 states within the three comparisons of one
+        * triangle, because each of AB, AC, and BC can be in 3 states.
+        *
+        *  0    (A < B) (A < C) (B < C)   A < B < C
+        *  1    (A < B) (A < C) (B = C)   A < (B|C)
+        *  2    (A < B) (A < C) (B > C)   A < C < B
+        *  3    (A < B) (A = C) (B < C)    invalid
+        *  4    (A < B) (A = C) (B = C)    invalid
+        *  5    (A < B) (A = C) (B > C)   (A|C) < B
+        *  6    (A < B) (A > C) (B < C)    invalid
+        *  7    (A < B) (A > C) (B = C)    invalid
+        *  8    (A < B) (A > C) (B > C)   C < A < B
+        *  9    (A = B) (A < C) (B < C)   (A|B) < C
+        * 10    (A = B) (A < C) (B = C)    invalid
+        * 11    (A = B) (A < C) (B > C)    invalid
+        * 12    (A = B) (A = C) (B < C)    invalid
+        * 13    (A = B) (A = C) (B = C)   A = B = C
+        * 14    (A = B) (A = C) (B > C)    invalid
+        * 15    (A = B) (A > C) (B < C)    invalid
+        * 16    (A = B) (A > C) (B = C)    invalid
+        * 17    (A = B) (A > C) (B > C)   C < (A|B)
+        * 18    (A > B) (A < C) (B < C)   B < A < C
+        * 19    (A > B) (A < C) (B = C)    invalid
+        * 20    (A > B) (A < C) (B > C)    invalid
+        * 21    (A > B) (A = C) (B < C)   B < (A|C)
+        * 22    (A > B) (A = C) (B = C)    invalid
+        * 23    (A > B) (A = C) (B > C)    invalid
+        * 24    (A > B) (A > C) (B < C)   B < C < A
+        * 25    (A > B) (A > C) (B = C)   (B|C) < A
+        * 26    (A > B) (A > C) (B > C)   C < B < A
+        *
+        * It actually turns out to be quite simple:
+        */
+
+       if (ab == 0) {
+               if (ac != bc) {
+                       abort();
+               }
+       } else if (ab < 0) {
+               if (ac >= 0 && bc <= 0) {
+                       abort();
+               }
+       } else {
+               if (ac <= 0 && bc >= 0) {
+                       abort();
+               }
+       }
+
+       return 0;
+}
diff --git a/lib/fuzzing/wscript_build b/lib/fuzzing/wscript_build
index 897a114ca7e..ce2684580ce 100644
--- a/lib/fuzzing/wscript_build
+++ b/lib/fuzzing/wscript_build
@@ -169,6 +169,11 @@ bld.SAMBA_BINARY('fuzz_security_token_vs_descriptor_ds',
                  deps='fuzzing samba-security afl-fuzz-main',
                  fuzzer=True)
 
+bld.SAMBA_BINARY('fuzz_strncasecmp_ldb',
+                 source='fuzz_strncasecmp_ldb.c',
+                 deps='fuzzing samba-util afl-fuzz-main',
+                 fuzzer=True)
+
 
 # The fuzz_type and fuzz_function parameters make the built
 # fuzzer take the same input as ndrdump and so the same that
diff --git a/lib/ldb-samba/ldb_wrap.c b/lib/ldb-samba/ldb_wrap.c
index 437aaee101a..e5876c80a9c 100644
--- a/lib/ldb-samba/ldb_wrap.c
+++ b/lib/ldb-samba/ldb_wrap.c
@@ -125,6 +125,14 @@ char *wrap_casefold(void *context, void *mem_ctx, const char *s, size_t n)
        return strupper_talloc_n(mem_ctx, s, n);
 }
 
+int ldb_comparison_fold_utf8(void *ignored,
+                            const struct ldb_val *v1,
+                            const struct ldb_val *v2)
+{
+       return strncasecmp_ldb((const char *)v1->data, v1->length,
+                              (const char *)v2->data, v2->length);
+}
+
 
  struct ldb_context *samba_ldb_init(TALLOC_CTX *mem_ctx,
                                    struct tevent_context *ev,
@@ -144,7 +152,7 @@ char *wrap_casefold(void *context, void *mem_ctx, const char *s, size_t n)
 
        ldb_set_debug(ldb, ldb_wrap_debug, NULL);
 
-       ldb_set_utf8_fns(ldb, NULL, wrap_casefold);
+       ldb_set_utf8_functions(ldb, NULL, wrap_casefold, ldb_comparison_fold_utf8);
 
        if (session_info) {
                if (ldb_set_opaque(ldb, DSDB_SESSION_INFO, session_info)) {
diff --git a/lib/ldb-samba/ldb_wrap.h b/lib/ldb-samba/ldb_wrap.h
index aa7ccb3a234..274d1e6fddf 100644
--- a/lib/ldb-samba/ldb_wrap.h
+++ b/lib/ldb-samba/ldb_wrap.h
@@ -30,9 +30,14 @@ struct ldb_dn;
 struct cli_credentials;
 struct loadparm_context;
 struct tevent_context;
+struct ldb_val;
 
 char *wrap_casefold(void *context, void *mem_ctx, const char *s, size_t n);
 
+int ldb_comparison_fold_utf8(void *ignored,
+                            const struct ldb_val *v1,
+                            const struct ldb_val *v2);
+
 struct ldb_context *ldb_wrap_connect(TALLOC_CTX *mem_ctx,
                                     struct tevent_context *ev,
                                     struct loadparm_context *lp_ctx,
diff --git a/lib/ldb-samba/pyldb.c b/lib/ldb-samba/pyldb.c
index 958b3ad4b16..b2a485aaefa 100644
--- a/lib/ldb-samba/pyldb.c
+++ b/lib/ldb-samba/pyldb.c
@@ -88,7 +88,7 @@ static PyObject *py_ldb_set_utf8_casefold(PyObject *self,
 
        ldb = pyldb_Ldb_AS_LDBCONTEXT(self);
 
-       ldb_set_utf8_fns(ldb, NULL, wrap_casefold);
+       ldb_set_utf8_functions(ldb, NULL, wrap_casefold, ldb_comparison_fold_utf8);
 
        Py_RETURN_NONE;
 }
diff --git a/lib/ldb-samba/samba_extensions.c b/lib/ldb-samba/samba_extensions.c
index be92d982dde..aecc2d70dea 100644
--- a/lib/ldb-samba/samba_extensions.c
+++ b/lib/ldb-samba/samba_extensions.c
@@ -144,7 +144,7 @@ static int extensions_hook(struct ldb_context *ldb, enum ldb_module_hook_type t)
                        return ldb_operr(ldb);
                }
 
-               ldb_set_utf8_fns(ldb, NULL, wrap_casefold);
+               ldb_set_utf8_functions(ldb, NULL, wrap_casefold, ldb_comparison_fold_utf8);
                break;
        }
 
diff --git a/lib/ldb/ABI/ldb-2.10.0.sigs b/lib/ldb/ABI/ldb-2.10.0.sigs
index 2266387cd60..f23014ffaaa 100644
--- a/lib/ldb/ABI/ldb-2.10.0.sigs
+++ b/lib/ldb/ABI/ldb-2.10.0.sigs
@@ -23,6 +23,7 @@ ldb_casefold_default: char *(void *, TALLOC_CTX *, const char *, size_t)
 ldb_check_critical_controls: int (struct ldb_control **)
 ldb_comparison_binary: int (struct ldb_context *, void *, const struct ldb_val *, const struct ldb_val *)
 ldb_comparison_fold: int (struct ldb_context *, void *, const struct ldb_val *, const struct ldb_val *)
+ldb_comparison_fold_ascii: int (void *, const struct ldb_val *, const struct ldb_val *)
 ldb_connect: int (struct ldb_context *, const char *, unsigned int, const char **)
 ldb_control_to_string: char *(TALLOC_CTX *, const struct ldb_control *)
 ldb_controls_except_specified: struct ldb_control **(struct ldb_control **, TALLOC_CTX *, struct ldb_control *)
@@ -275,6 +276,7 @@ ldb_set_timeout: int (struct ldb_context *, struct ldb_request *, int)
 ldb_set_timeout_from_prev_req: int (struct ldb_context *, struct ldb_request *, struct ldb_request *)
 ldb_set_utf8_default: void (struct ldb_context *)
 ldb_set_utf8_fns: void (struct ldb_context *, void *, char *(*)(void *, void *, const char *, size_t))
+ldb_set_utf8_functions: void (struct ldb_context *, void *, char *(*)(void *, void *, const char *, size_t), int (*)(void *, const struct ldb_val *, const struct ldb_val *))
 ldb_setup_wellknown_attributes: int (struct ldb_context *)
 ldb_should_b64_encode: int (struct ldb_context *, const struct ldb_val *)
 ldb_standard_syntax_by_name: const struct ldb_schema_syntax *(struct ldb_context *, const char *)
diff --git a/lib/ldb/common/attrib_handlers.c b/lib/ldb/common/attrib_handlers.c
index e6d412bd3cf..145ff487310 100644
--- a/lib/ldb/common/attrib_handlers.c
+++ b/lib/ldb/common/attrib_handlers.c
@@ -327,146 +327,18 @@ int ldb_comparison_binary(struct ldb_context *ldb, void *mem_ctx,
 }
 
 /*
-  compare two case insensitive strings, ignoring multiple whitespaces
-  and leading and trailing whitespaces
-  see rfc2252 section 8.1
-
-  try to optimize for the ascii case,
-  but if we find out an utf8 codepoint revert to slower but correct function
-*/
+ * ldb_comparison_fold is a schema syntax comparison_fn for utf-8 strings that
+ * collapse multiple spaces into one (e.g. "Directory String" syntax).
+ *
+ * The default comparison function only performs ASCII case-folding, and only
+ * collapses multiple spaces, not tabs and other whitespace (contrary to
+ * RFC4518). To change the comparison function (as Samba does), use
+ * ldb_set_utf8_functions().
+ */
 int ldb_comparison_fold(struct ldb_context *ldb, void *mem_ctx,
-                              const struct ldb_val *v1, const struct ldb_val *v2)
+                       const struct ldb_val *v1, const struct ldb_val *v2)
 {
-       const char *s1=(const char *)v1->data, *s2=(const char *)v2->data;
-       size_t n1 = v1->length, n2 = v2->length;
-       char *b1, *b2;
-       const char *u1, *u2;
-       int ret;
-
-       while (n1 && *s1 == ' ') { s1++; n1--; };
-       while (n2 && *s2 == ' ') { s2++; n2--; };
-
-       while (n1 && n2 && *s1 && *s2) {
-               /* the first 127 (0x7F) chars are ascii and utf8 guarantees they
-                * never appear in multibyte sequences */
-               if (((unsigned char)s1[0]) & 0x80) goto utf8str;
-               if (((unsigned char)s2[0]) & 0x80) goto utf8str;
-               if (ldb_ascii_toupper(*s1) != ldb_ascii_toupper(*s2)) {
-                       break;
-               }
-               if (*s1 == ' ') {
-                       while (n1 > 1 && s1[0] == s1[1]) { s1++; n1--; }
-                       while (n2 > 1 && s2[0] == s2[1]) { s2++; n2--; }
-               }
-               s1++; s2++;
-               n1--; n2--;
-       }
-
-       /* check for trailing spaces only if the other pointers has
-        * reached the end of the strings otherwise we can
-        * mistakenly match.  ex. "domain users" <->
-        * "domainUpdates"
-        */
-       if (n1 && *s1 == ' ' && (!n2 || !*s2)) {
-               while (n1 && *s1 == ' ') { s1++; n1--; }
-       }
-       if (n2 && *s2 == ' ' && (!n1 || !*s1)) {
-               while (n2 && *s2 == ' ') { s2++; n2--; }
-       }
-       if (n1 == 0 && n2 != 0) {
-               return -(int)ldb_ascii_toupper(*s2);
-       }
-       if (n2 == 0 && n1 != 0) {
-               return (int)ldb_ascii_toupper(*s1);
-       }
-       if (n1 == 0 && n2 == 0) {
-               return 0;
-       }
-       return (int)ldb_ascii_toupper(*s1) - (int)ldb_ascii_toupper(*s2);
-
-utf8str:
-       /*
-        * No need to recheck from the start, just from the first utf8 charu
-        * found. Note that the callback of ldb_casefold() needs to be ascii
-        * compatible.
-        *
-        * Probably ldb_casefold() is wrap_casefold() which wraps
-        * strupper_talloc_n().
-        */
-       b1 = ldb_casefold(ldb, mem_ctx, s1, n1);
-       b2 = ldb_casefold(ldb, mem_ctx, s2, n2);
-
-       if (!b1 || !b2) {
-               /*
-                * One of the strings was not UTF8, so we have no
-                * options but to do a binary compare.
-                *
-                * FIXME: this can be non-transitive.
-                *
-                * consider {
-                *           CA 8A  "ʊ"
-                *           C6 B1  "Ʊ"
-                *           C8 FE  invalid utf-8
-                *          }
-                *
-                * The byte "0xfe" is always invalid in utf-8, so the
-                * comparisons against that string end up coming this way,
-                * while the "Ʊ" vs "ʊ" comparison goes via the ldb_casefold
-                * branch. Then:
-                *
-                *  "ʊ" == "Ʊ"     by casefold.
-                *  "ʊ" > {c8 fe}  by byte comparison.
-                *  "Ʊ" < {c8 fe}  by byte comparison.
-                *
-                * In many cases there are no invalid encodings between the
-                * upper and lower case letters, but the string as a whole
-                * might also compare differently due to the space-eating in
-                * the other branch.
-                */
-               talloc_free(b1);
-               talloc_free(b2);
-               ret = memcmp(s1, s2, MIN(n1, n2));
-               if (ret == 0) {
-                       if (n1 == n2) {
-                               return 0;
-                       }
-                       if (n1 > n2) {
-                               if (s1[n2] == '\0') {
-                                       return 0;
-                               }
-                               return 1;
-                       } else {
-                               if (s2[n1] == '\0') {
-                                       return 0;
-                               }
-                               return -1;
-                       }
-               }
-               return ret;
-       }
-
-       u1 = b1;
-       u2 = b2;
-
-       while (*u1 & *u2) {
-               if (*u1 != *u2)
-                       break;
-               if (*u1 == ' ') {
-                       while (u1[0] == u1[1]) u1++;
-                       while (u2[0] == u2[1]) u2++;
-               }
-               u1++; u2++;
-       }
-       if (! (*u1 && *u2)) {
-               while (*u1 == ' ') u1++;
-               while (*u2 == ' ') u2++;
-       }
-       ret = NUMERIC_CMP(*u1, *u2);
-
-       talloc_free(b1);
-       talloc_free(b2);
-
-       return ret;
+       return ldb->utf8_fns.casecmp(ldb->utf8_fns.context, v1, v2);
 }
 
 
diff --git a/lib/ldb/common/ldb_utf8.c b/lib/ldb/common/ldb_utf8.c
index 178bdd86de1..6891de84101 100644
--- a/lib/ldb/common/ldb_utf8.c
+++ b/lib/ldb/common/ldb_utf8.c
@@ -34,6 +34,27 @@
 #include "ldb_private.h"
 #include "system/locale.h"
 
+/*
+ * Set functions for comparing and case-folding case-insensitive ldb val
+ * strings.
+ */
+void ldb_set_utf8_functions(struct ldb_context *ldb,
+                           void *context,
+                           char *(*casefold)(void *, void *, const char *, size_t),
+                           int (*casecmp)(void *ctx,
+                                          const struct ldb_val *v1,
+                                          const struct ldb_val *v2))
+{
+       if (context) {
+               ldb->utf8_fns.context = context;
+       }
+       if (casefold) {
+               ldb->utf8_fns.casefold = casefold;
+       }
+       if (casecmp) {
+               ldb->utf8_fns.casecmp = casecmp;
+       }
+}
 
 /*
   this allow the user to pass in a caseless comparison
@@ -43,12 +64,10 @@ void ldb_set_utf8_fns(struct ldb_context *ldb,
                      void *context,
                      char *(*casefold)(void *, void *, const char *, size_t))
 {
-       if (context)
-               ldb->utf8_fns.context = context;
-       if (casefold)
-               ldb->utf8_fns.casefold = casefold;
+       ldb_set_utf8_functions(ldb, context, casefold, NULL);
 }
 
+
 /*
   a simple case folding function
   NOTE: does not handle UTF8
@@ -62,14 +81,72 @@ char *ldb_casefold_default(void *context, TALLOC_CTX *mem_ctx, const char *s, si
                return NULL;
        }
        for (i=0;ret[i];i++) {
-               ret[i] = ldb_ascii_toupper((unsigned char)ret[i]);
+               ret[i] = ldb_ascii_toupper(ret[i]);
        }
        return ret;
 }
 
+
+/*
+ * The default comparison fold function only knows ASCII. Multiple
+ * spaces (0x20) are collapsed into one, and [a-z] map to [A-Z]. All
+ * other bytes are compared without casefolding.
+ *


-- 
Samba Shared Repository
