The branch, master has been updated
       via  d6581d213d5 ldb: move struct ldb_debug_ops to ldb_private.h
       via  6dd68d89786 ldb: move struct ldb_utf8_fns to ldb_private.h
       via  a00c0ebd090 s4:dsdb:strcasecmp_with_ldb_val() avoids overflow
       via  b6974030e6a lib/fuzzing: add fuzz_strncasecmp_ldb
       via  b22e1d3207d ldb: don't cast to unsigned for ldb_ascii_toupper()
       via  e33a0dd70f0 ldb: ldb_set_utf8_functions follows README.Coding
       via  4a6a1d1f0af ldb: deprecate ldb_set_utf8_fns
       via  42ae85d70af ldb: remove old ldb_comparison_fold_utf8_broken()
       via  960724a06e4 ldb: ldb_comparison_fold always uses the casecmp function
       via  edabb9f4cb9 ldb-samba: use ldb_comparison_fold_utf8()
       via  0becc8a90cb ldb-samba: add ldb_comparison_fold_utf8, wrapping strncasecmp_ldb
       via  f9797950fd6 util:charset: strncasecmp_ldb avoids iconv for ASCII
       via  55397514db5 util:charset: strncasecmp_ldb degrades to ASCII strncasecmp
       via  eb91e3437b4 util:charset: add strncasecmp_ldb()
       via  7cc3c56293d ldb: ldb_set_utf8_default() sets comparison function
       via  6c27284f7e9 ldb: ldb_comparison_fold_ascii sorts unsigned
       via  92275e27947 ldb: add ldb_comparison_fold_ascii() for default comparisons
       via  947f977acb7 ldb: ldb_comparison_fold uses the utf-8 casecmp function
       via  ae7ca36830b ldb: add ldb_set_utf8_functions() for setting casefold functions
       via  1624ac7a987 ldb: move ldb_comparison_fold guts into a separate function
       via  278a3c7f7c6 ldb: add a utf-8 comparison fold callback
       via  f9fbc7a5067 lib/util/charset: be explicit about INVALID_CODEPOINT value
       via  023a7ce7d5a ldb: add test_ldb_comparison_fold
      from  589a9ea6767 s4:kdc: Add comment about possible interaction between the krbtgt account and Group Managed Service Accounts
https://git.samba.org/?p=samba.git;a=shortlog;h=master

- Log -----------------------------------------------------------------
commit d6581d213d5f625da493f14620e1a12e79a8e195
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Thu May 23 09:40:00 2024 +1200

    ldb: move struct ldb_debug_ops to ldb_private.h

    Only accessed through struct ldb_context -> debug_ops, which is already private.

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

    Autobuild-User(master): Andrew Bartlett <abart...@samba.org>
    Autobuild-Date(master): Thu May 23 00:19:30 UTC 2024 on atb-devel-224

commit 6dd68d897865bd2518a6a71753ca0bc76d51b37e
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Thu May 23 09:36:57 2024 +1200

    ldb: move struct ldb_utf8_fns to ldb_private.h

    It is only accessed via ldb functions that find it on the already-private struct ldb_context.

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit a00c0ebd090f69f94ce6ba7774a9fc126d7de504
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Mon May 13 11:08:35 2024 +1200

    s4:dsdb:strcasecmp_with_ldb_val() avoids overflow

    In the unlikely event that strlen(str) > INT_MAX, the result could have overflowed. This is not a sort transitivity issue, as this is not a symmetric sort comparison, but it would affect binary search reliability.

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit b6974030e6a7ddb330894f46631c8da4359b2d18
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Mon May 13 10:39:44 2024 +1200

    lib/fuzzing: add fuzz_strncasecmp_ldb

    As well as checking for the usual overflows, this asserts that strncasecmp_ldb is always transitive, by splitting the input into 3 pieces and comparing all pairs.
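The fuzzer's final transitivity check (the end of fuzz_strncasecmp_ldb.c in the diff) reduces 27 possible sign combinations of the three pairwise results to three if-branches. One way to convince yourself that the rule is right is to compare it with brute force: a triple (ab, ac, bc) of -1/0/+1 results is consistent exactly when some ranking of A, B and C produces it. This is a hypothetical stand-alone check, not part of the fuzzer; the function names are invented here, and ab is taken to mean the sign of A - B.

```c
#include <assert.h>

/* -1, 0 or +1 according to the sign of x. */
static int sign(int x)
{
	return (x > 0) - (x < 0);
}

/* The fuzzer's rule, restated: 1 if (ab, ac, bc) is accepted. */
static int rule_accepts(int ab, int ac, int bc)
{
	if (ab == 0) {
		return ac == bc;
	}
	if (ab < 0) {
		return !(ac >= 0 && bc <= 0);
	}
	return !(ac <= 0 && bc >= 0);
}

/*
 * Brute force: (ab, ac, bc) is consistent if some ranks a, b, c in
 * 0..2 give sign(a - b) == ab, sign(a - c) == ac, sign(b - c) == bc.
 * Three ranks are enough to express every ordering of three values.
 */
static int consistent(int ab, int ac, int bc)
{
	int a, b, c;

	for (a = 0; a < 3; a++) {
		for (b = 0; b < 3; b++) {
			for (c = 0; c < 3; c++) {
				if (sign(a - b) == ab && sign(a - c) == ac &&
				    sign(b - c) == bc) {
					return 1;
				}
			}
		}
	}
	return 0;
}

/*
 * Walk all 27 triples; return how many are consistent, and store the
 * number of disagreements with rule_accepts() in *mismatches.
 */
static int count_consistent_triples(int *mismatches)
{
	int ab, ac, bc, n = 0;

	assert(mismatches != 0);
	*mismatches = 0;
	for (ab = -1; ab <= 1; ab++) {
		for (ac = -1; ac <= 1; ac++) {
			for (bc = -1; bc <= 1; bc++) {
				int ok = consistent(ab, ac, bc);

				n += ok;
				if (ok != rule_accepts(ab, ac, bc)) {
					(*mismatches)++;
				}
			}
		}
	}
	return n;
}
```

count_consistent_triples() reports 13 consistent triples (the 13 ways of ordering three items when ties are allowed, matching the 13 non-"invalid" rows of the table in the fuzzer comment) and no disagreements with the three-branch rule.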

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit b22e1d3207d90f102247d690bfe31db55d7b681e
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Fri May 17 11:38:10 2024 +1200

    ldb: don't cast to unsigned for ldb_ascii_toupper()

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit e33a0dd70f00481d1c3d9e2fdd227e26431402ef
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Tue May 21 10:55:53 2024 +1200

    ldb: ldb_set_utf8_functions follows README.Coding

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 4a6a1d1f0afa830a679781a522d724bd861a3601
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Fri May 17 11:35:01 2024 +1200

    ldb: deprecate ldb_set_utf8_fns

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 42ae85d70af8da1aecbf45f5fb6e7d7ee1c379fb
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Fri May 10 15:43:36 2024 +1200

    ldb: remove old ldb_comparison_fold_utf8_broken()

    There are no callers.

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 960724a06e4dcb793d606c71d6e79387761b3d42
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Thu May 16 17:01:10 2024 +1200

    ldb: ldb_comparison_fold always uses the casecmp function

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit edabb9f4cb9460f382a621a1f494cfdac615232a
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Thu May 16 14:09:46 2024 +1200

    ldb-samba: use ldb_comparison_fold_utf8()

    This means ldb-samba/dsdb comparisons will be case-insensitive for non-ASCII UTF-8 characters (within the bounds of the 16-bit casefold table).
    And they will remain transitive.

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 0becc8a90cbeac7022a72061debe2edc5b67680a
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Fri May 10 15:42:46 2024 +1200

    ldb-samba: add ldb_comparison_fold_utf8, wrapping strncasecmp_ldb

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit f9797950fd69c16dfab39804dc53172977a345ee
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Tue May 14 21:33:16 2024 +1200

    util:charset: strncasecmp_ldb avoids iconv for ASCII

    This is a common case, and we can save a bit of work.

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 55397514db568ca7b75acf139afd527ece137bc1
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Mon May 13 11:32:26 2024 +1200

    util:charset: strncasecmp_ldb degrades to ASCII strncasecmp

    If strncasecmp_ldb() encounters invalid utf-8 bytes, it compares those as greater than any valid bytes (that is, it sorts them to the end of the list).

    If an invalid sequence is encountered in both strings at once, the rest of the strings are now compared using the default ldb_comparison_fold rules, as implemented in ldb_comparison_fold_ascii(). That is, each byte is compared individually, [a-z] are translated to [A-Z], and runs of spaces are collapsed into single spaces.

    There is no perfect answer in this case, but this solution is stable, fine-grained, and probably close to what is expected. This byte-by-byte comparison is equivalent to a utf-8 comparison without case-folding of multibyte codes.
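The default rules named above (compare byte by byte, fold [a-z] to [A-Z], treat a run of spaces as one space) can be sketched as a standalone comparison. This is a hypothetical, simplified illustration, not the ldb_comparison_fold_ascii() source: the function names are invented, it compares bytes as unsigned values, and unlike ldb it does not strip leading or trailing spaces.

```c
#include <assert.h>
#include <stddef.h>

/* Map [a-z] to [A-Z]; every other byte is left alone (no locale). */
static unsigned char ascii_upper(unsigned char c)
{
	return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 'a' + 'A') : c;
}

/*
 * Hypothetical ASCII-only case-insensitive comparison of two
 * length-delimited byte strings. Returns -1, 0 or 1.
 */
static int fold_cmp_ascii(const unsigned char *s1, size_t n1,
			  const unsigned char *s2, size_t n2)
{
	size_t i = 0, j = 0;

	/* an empty value may have a NULL pointer, as in the fuzzer */
	assert((s1 != NULL || n1 == 0) && (s2 != NULL || n2 == 0));

	while (i < n1 && j < n2) {
		unsigned char c1 = ascii_upper(s1[i]);
		unsigned char c2 = ascii_upper(s2[j]);

		if (c1 != c2) {
			return (c1 > c2) - (c1 < c2);
		}
		if (c1 == ' ') {
			/* move to the last space of each run */
			while (i + 1 < n1 && s1[i + 1] == ' ') {
				i++;
			}
			while (j + 1 < n2 && s2[j + 1] == ' ') {
				j++;
			}
		}
		i++;
		j++;
	}
	/* a string with bytes left over sorts after the other */
	return (i < n1) - (j < n2);
}
```

With these rules "Domain  Users" (two spaces) compares equal to "DOMAIN USERS", and a high byte such as 0xC3 sorts after 'Z' because the comparison is unsigned.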

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit eb91e3437b44c7ad653aac86d481ceaaddb06b01
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Tue Apr 30 12:41:43 2024 +1200

    util:charset: add strncasecmp_ldb()

    This is a function for comparing strings in a way that suits case-insensitive syntaxes in LDB. We have it here, rather than in LDB itself, because it needs the upcase table.

    By default, LDB uses ASCII-only comparisons. SSSD and OpenChange use it in that configuration, but Samba replaces the comparison and casefold functions with Unicode-aware versions. Until now Samba has done that in a bad way; this will allow it to do better.

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 7cc3c56293d9c93d9c88fba8df0e998db3f7eaf7
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Fri May 17 11:37:18 2024 +1200

    ldb: ldb_set_utf8_default() sets comparison function

    The default is ASCII only, which is used by SSSD and OpenChange.

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 6c27284f7e9feae7e37072449e0c752034f6b672
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Thu May 9 17:21:29 2024 +1200

    ldb: ldb_comparison_fold_ascii sorts unsigned

    Typically in 8-bit character sets, those with the 0x80 bit set are seen as 128-255, not negative numbers. This will sort them after 'Z', not before 'A'.
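A small illustration of why the signedness matters, using hypothetical helper names: a byte with the 0x80 bit set, such as 0xC3 (the lead byte of many two-byte UTF-8 letters), is -61 when read through a signed char, so it sorts before 'A'; read as an unsigned char it is 195 and sorts after 'Z'. The byte is passed as signed char to mimic a plain char on platforms where char is signed (x86-64, for example).

```c
#include <assert.h>
#include <limits.h>

/* Three-way comparison of two ints: -1, 0 or 1. */
static int cmp_int(int a, int b)
{
	return (a > b) - (a < b);
}

/* Compare byte c with 'Z' using its signed value. */
static int cmp_z_signed(signed char c)
{
	return cmp_int(c, 'Z');
}

/* Compare byte c with 'Z' using its unsigned value. */
static int cmp_z_unsigned(signed char c)
{
	/* conversion to unsigned char is modulo 2^CHAR_BIT: -61 -> 195 */
	unsigned char u = (unsigned char)c;

	assert(CHAR_BIT == 8);
	return cmp_int(u, 'Z');
}
```

cmp_z_signed(-61) is -1 (0xC3 sorts before 'Z', and before 'A' too), while cmp_z_unsigned(-61) is 1, which is the order the commit above adopts.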

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 92275e27947989706561292f47789a8d715a11d1
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Wed May 15 20:51:08 2024 +1200

    ldb: add ldb_comparison_fold_ascii() for default comparisons

    This function is made from the ASCII-only bits of the old ldb_comparison_fold() -- that is, what you get if you never follow a `goto utf8str` jump. It compares the bytes, but collapses spaces and maps [a-z] to [A-Z]. This does exactly what ldb_comparison_fold_utf8_broken() would do in situations where ldb_casefold() calls ldb_casefold_default(). That means SSSD.

    The comparison is probably using signed char, so high bytes are actually low bytes.

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 947f977acb7946a4521cc8be2e7c0a61bd0e3f1e
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Sun May 19 15:09:26 2024 +1200

    ldb: ldb_comparison_fold uses the utf-8 casecmp function

    But only if it is set, which it never is (so far).

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit ae7ca36830be7823dde17bcaeae74b5f46b1aa3d
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Fri May 17 11:34:35 2024 +1200

    ldb: add ldb_set_utf8_functions() for setting casefold functions

    This replaces ldb_set_utf8_fns(), which will be deprecated really soon.

    The reason for this, as shown in surrounding commits, is that without an explicit case-insensitive comparison we need to rely on the casefold, and if the casefold can fail (because of, e.g., bad utf-8) the comparison ends up being a bit chaotic. The strings being compared are generally user-controlled, and a malicious user might find ways of hiding values or perhaps fooling a binary search.
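The "chaotic" comparison can be reproduced in miniature with the example from the FIXME comment that this push removes from ldb_comparison_fold(): casefold both strings when possible, otherwise fall back to memcmp(). The sketch below is hypothetical (invented names, 2-byte strings only) and its toy casefold knows a single case pair, U+028A "ʊ" (CA 8A) to U+01B1 "Ʊ" (C6 B1), and fails on the byte 0xFE, which never occurs in valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy casefold: returns 0 on failure (0xFE present), 1 on success. */
static int toy_casefold(const unsigned char in[2], unsigned char out[2])
{
	if (in[0] == 0xFE || in[1] == 0xFE) {
		return 0;
	}
	if (in[0] == 0xCA && in[1] == 0x8A) {
		out[0] = 0xC6;
		out[1] = 0xB1;
	} else {
		memcpy(out, in, 2);
	}
	return 1;
}

/*
 * "Casefold, else fall back to a byte comparison", reduced to
 * 2-byte strings. memcmp() compares bytes as unsigned char.
 * Returns -1, 0 or 1.
 */
static int fold_or_memcmp(const unsigned char a[2], const unsigned char b[2])
{
	unsigned char fa[2], fb[2];
	int r;

	assert(a != NULL && b != NULL);
	if (toy_casefold(a, fa) && toy_casefold(b, fb)) {
		r = memcmp(fa, fb, 2);
	} else {
		r = memcmp(a, b, 2);
	}
	return (r > 0) - (r < 0);
}
```

With these rules "ʊ" == "Ʊ" by casefold, but "ʊ" > {C8 FE} and "Ʊ" < {C8 FE} by byte comparison, so no consistent sort order exists for the three values. A comparison that walks both strings and handles bad bytes where they occur avoids mixing the two regimes.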

    A case-insensitive comparison that works gradually through the string without an all-at-once casefold is better placed to deal with problems where they happen, and we are able to separately specialise for the ASCII case (used by SSSD) and the UTF-8 case (Samba).

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 1624ac7a9876b4b8779364542747f66f5832a709
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Thu May 16 14:10:06 2024 +1200

    ldb: move ldb_comparison_fold guts into a separate function

    We're going to make this use a configurable pointer.

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 278a3c7f7c6506134e0e1d15126f55b444f37fbc
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Thu May 9 16:52:53 2024 +1200

    ldb: add a utf-8 comparison fold callback

    This isn't used yet, but it will allow library users to select a case-insensitive comparison function that matches their chosen casefold.

    This will allow the comparisons to be consistent when the strings are bad, whereas currently we kind of guess.
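The design these commits converge on is a comparison callback stored next to the casefold, a setter that leaves a slot unchanged when passed NULL (as ldb_set_utf8_functions() does in the ldb_utf8.c diff), and a comparison entry point that simply dispatches to the callback. Below is a minimal, self-contained sketch of that pattern; struct val, struct fold_fns, set_fold_fns(), compare_fold() and both callbacks are hypothetical stand-ins, not ldb API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct ldb_val. */
struct val {
	const unsigned char *data;
	size_t length;
};

typedef int (*casecmp_fn)(void *ctx, const struct val *v1,
			  const struct val *v2);

/* Hypothetical stand-in for the private utf8_fns table. */
struct fold_fns {
	void *context;
	casecmp_fn casecmp;
};

/* Only non-NULL arguments replace the current settings. */
static void set_fold_fns(struct fold_fns *fns, void *context,
			 casecmp_fn casecmp)
{
	if (context != NULL) {
		fns->context = context;
	}
	if (casecmp != NULL) {
		fns->casecmp = casecmp;
	}
}

/* Every comparison goes through the configured callback. */
static int compare_fold(const struct fold_fns *fns,
			const struct val *v1, const struct val *v2)
{
	assert(fns->casecmp != NULL);
	return fns->casecmp(fns->context, v1, v2);
}

/* Byte-exact callback: -1, 0 or 1, shorter prefix sorts first. */
static int binary_cmp(void *ctx, const struct val *v1, const struct val *v2)
{
	size_t n = v1->length < v2->length ? v1->length : v2->length;
	int r = n ? memcmp(v1->data, v2->data, n) : 0;

	(void)ctx;
	if (r == 0) {
		return (v1->length > v2->length) - (v1->length < v2->length);
	}
	return (r > 0) - (r < 0);
}

static unsigned char upper(unsigned char c)
{
	return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 'a' + 'A') : c;
}

/* ASCII case-insensitive callback with the same result convention. */
static int nocase_cmp(void *ctx, const struct val *v1, const struct val *v2)
{
	size_t i;

	(void)ctx;
	for (i = 0; i < v1->length && i < v2->length; i++) {
		unsigned char c1 = upper(v1->data[i]);
		unsigned char c2 = upper(v2->data[i]);

		if (c1 != c2) {
			return (c1 > c2) - (c1 < c2);
		}
	}
	return (v1->length > v2->length) - (v1->length < v2->length);
}
```

Swapping the callback changes how "abc" and "ABC" compare without touching any caller of compare_fold(), and passing NULL for a slot keeps whatever was set before.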

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit f9fbc7a5067b78b9fe03e3bcde5e46f82a5704ba
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Wed May 1 15:32:03 2024 +1200

    lib/util/charset: be explicit about INVALID_CODEPOINT value

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

commit 023a7ce7d5ae50ff4f0563c68cb84f9f4ad235f2
Author: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
Date:   Mon May 20 11:15:47 2024 +1200

    ldb: add test_ldb_comparison_fold

    Currently this fails like this:

        test_ldb_comparison_fold_default_common: 118 errors out of 256
        test_ldb_comparison_fold_default_ascii: 32 errors out of 100
        test_ldb_comparison_fold_utf8_common: 40 errors out of 256
        test_ldb_comparison_fold_utf8: 28 errors out of 100

    Signed-off-by: Douglas Bagnall <douglas.bagn...@catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abart...@samba.org>

-----------------------------------------------------------------------

Summary of changes:
 lib/fuzzing/fuzz_strncasecmp_ldb.c       | 161 +++++++++++++++++++++++
 lib/fuzzing/wscript_build                |   5 +
 lib/ldb-samba/ldb_wrap.c                 |  10 +-
 lib/ldb-samba/ldb_wrap.h                 |   5 +
 lib/ldb-samba/pyldb.c                    |   2 +-
 lib/ldb-samba/samba_extensions.c         |   2 +-
 lib/ldb/ABI/ldb-2.10.0.sigs              |   2 +
 lib/ldb/common/attrib_handlers.c         | 148 ++------------------
 lib/ldb/common/ldb_utf8.c                |  91 ++++++++++++-
 lib/ldb/include/ldb.h                    |  54 ++++----
 lib/ldb/include/ldb_private.h            |  24 ++++
 lib/ldb/tests/test_ldb_comparison_fold.c | 213 +++++++++++++++++++++++++++++++
 lib/ldb/wscript                          |   5 +
 lib/util/charset/charset.h               |   7 +-
 lib/util/charset/util_unistr.c           | 199 +++++++++++++++++++++++++++++
 selftest/tests.py                        |   1 +
 source4/dsdb/common/tests/dsdb_dn.c      |   6 +-
 source4/dsdb/schema/schema_query.c       |   4 +-
 source4/torture/ldb/ldb.c                |  10 +-
 19 files changed, 766 insertions(+), 183 deletions(-)
 create mode 100644 lib/fuzzing/fuzz_strncasecmp_ldb.c
 create mode 100644 lib/ldb/tests/test_ldb_comparison_fold.c


Changeset truncated at 500 lines:

diff --git a/lib/fuzzing/fuzz_strncasecmp_ldb.c b/lib/fuzzing/fuzz_strncasecmp_ldb.c
new file mode 100644
index 00000000000..0f785b5bee7
--- /dev/null
+++ b/lib/fuzzing/fuzz_strncasecmp_ldb.c
@@ -0,0 +1,161 @@
+/*
+   Fuzzing ldb_comparison_fold()
+   Copyright (C) Catalyst IT 2020
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+#include "includes.h"
+#include "fuzzing/fuzzing.h"
+#include "charset.h"
+
+
+int LLVMFuzzerInitialize(int *argc, char ***argv)
+{
+	return 0;
+}
+
+
+int LLVMFuzzerTestOneInput(const uint8_t *input, size_t len)
+{
+	struct ldb_val v[3] = {{},{},{}};
+	size_t i, j, k;
+	int results[9], ab, ac, bc;
+
+	if (len < 3) {
+		return 0;
+	}
+
+	j = 0;
+	k = 0;
+	v[j].data = discard_const(input);
+
+	/*
+	 * We split the input into 3 ldb_vals, on the byte '*' (42), chosen
+	 * because it is *not* special with regard to termination, utf-8, or
+	 * casefolding.
+	 *
+	 * if there are not 2 '*' bytes, the last value[s] will be empty, with
+	 * a NULL pointer and zero length.
+	 */
+
+	for (i = 0; i < len; i++) {
+		if (input[i] != '*') {
+			continue;
+		}
+		v[j].length = i - k;
+		i++;
+		j++;
+		if (j > 2 || i == len) {
+			break;
+		}
+		k = i;
+		v[j].data = discard_const(input + k);
+	}
+
+	for (i = 0; i < 3; i++) {
+		char *s1 = (char*)v[i].data;
+		size_t len1 = v[i].length;
+		for (j = 0; j < 3; j++) {
+			char *s2 = (char*)v[j].data;
+			size_t len2 = v[j].length;
+			int r = strncasecmp_ldb(s1, len1, s2, len2);
+			if (abs(r) > 1) {
+				abort();
+			}
+			results[i * 3 + j] = r;
+		}
+	}
+
+	/*
+	 * There are nine comparisons we make.
+	 *
+	 *      A  B  C
+	 *   A  =  x  x
+	 *   B  -  =  x
+	 *   C  -  -  =
+	 *
+	 * The diagonal should be all zeros (A == A, etc)
+	 * The upper and lower triangles should complement each other
+	 * (A > B implies B < A; A == B implies B == A).
+	 *
+	 * So we check for those identities first.
+	 */
+
+	if ((results[0] != 0) ||
+	    (results[4] != 0) ||
+	    (results[8] != 0)) {
+		abort();
+	}
+
+	ab = results[3];
+	ac = results[6];
+	bc = results[7];
+
+	if (ab != -results[1] ||
+	    ac != -results[2] ||
+	    bc != -results[5]) {
+		abort();
+	}
+
+	/*
+	 * Then there are 27 states within the three comparisons of one
+	 * triangle, because each of AB, AC, and BC can be in 3 states.
+	 *
+	 *  0  (A < B)  (A < C)  (B < C)   A < B < C
+	 *  1  (A < B)  (A < C)  (B = C)   A < (B|C)
+	 *  2  (A < B)  (A < C)  (B > C)   A < C < B
+	 *  3  (A < B)  (A = C)  (B < C)   invalid
+	 *  4  (A < B)  (A = C)  (B = C)   invalid
+	 *  5  (A < B)  (A = C)  (B > C)   (A|C) < B
+	 *  6  (A < B)  (A > C)  (B < C)   invalid
+	 *  7  (A < B)  (A > C)  (B = C)   invalid
+	 *  8  (A < B)  (A > C)  (B > C)   C < A < B
+	 *  9  (A = B)  (A < C)  (B < C)   (A|B) < C
+	 * 10  (A = B)  (A < C)  (B = C)   invalid
+	 * 11  (A = B)  (A < C)  (B > C)   invalid
+	 * 12  (A = B)  (A = C)  (B < C)   invalid
+	 * 13  (A = B)  (A = C)  (B = C)   A = B = C
+	 * 14  (A = B)  (A = C)  (B > C)   invalid
+	 * 15  (A = B)  (A > C)  (B < C)   invalid
+	 * 16  (A = B)  (A > C)  (B = C)   invalid
+	 * 17  (A = B)  (A > C)  (B > C)   C < (A|B)
+	 * 18  (A > B)  (A < C)  (B < C)   B < C < A
+	 * 19  (A > B)  (A < C)  (B = C)   invalid
+	 * 20  (A > B)  (A < C)  (B > C)   invalid
+	 * 21  (A > B)  (A = C)  (B < C)   B < (A|C)
+	 * 22  (A > B)  (A = C)  (B = C)   invalid
+	 * 23  (A > B)  (A = C)  (B > C)   invalid
+	 * 24  (A > B)  (A > C)  (B < C)   B < C < A
+	 * 25  (A > B)  (A > C)  (B = C)   (B|C) < A
+	 * 26  (A > B)  (A > C)  (B > C)   C < B < A
+	 *
+	 * It actually turns out to be quite simple:
+	 */
+
+	if (ab == 0) {
+		if (ac != bc) {
+			abort();
+		}
+	} else if (ab < 0) {
+		if (ac >= 0 && bc <= 0) {
+			abort();
+		}
+	} else {
+		if (ac <= 0 && bc >= 0) {
+			abort();
+		}
+	}
+
+	return 0;
+}
diff --git a/lib/fuzzing/wscript_build b/lib/fuzzing/wscript_build
index 897a114ca7e..ce2684580ce 100644
--- a/lib/fuzzing/wscript_build
+++ b/lib/fuzzing/wscript_build
@@ -169,6 +169,11 @@ bld.SAMBA_BINARY('fuzz_security_token_vs_descriptor_ds',
                  deps='fuzzing samba-security afl-fuzz-main',
                  fuzzer=True)
 
+bld.SAMBA_BINARY('fuzz_strncasecmp_ldb',
+                 source='fuzz_strncasecmp_ldb.c',
+                 deps='fuzzing samba-util afl-fuzz-main',
+                 fuzzer=True)
+
 
 # The fuzz_type and fuzz_function parameters make the built
 # fuzzer take the same input as ndrdump and so the same that
diff --git a/lib/ldb-samba/ldb_wrap.c b/lib/ldb-samba/ldb_wrap.c
index 437aaee101a..e5876c80a9c 100644
--- a/lib/ldb-samba/ldb_wrap.c
+++ b/lib/ldb-samba/ldb_wrap.c
@@ -125,6 +125,14 @@ char *wrap_casefold(void *context, void *mem_ctx, const char *s, size_t n)
 	return strupper_talloc_n(mem_ctx, s, n);
 }
 
+int ldb_comparison_fold_utf8(void *ignored,
+			     const struct ldb_val *v1,
+			     const struct ldb_val *v2)
+{
+	return strncasecmp_ldb((const char *)v1->data, v1->length,
+			       (const char *)v2->data, v2->length);
+}
+
 
 struct ldb_context *samba_ldb_init(TALLOC_CTX *mem_ctx,
 				   struct tevent_context *ev,
@@ -144,7 +152,7 @@ char *wrap_casefold(void *context, void *mem_ctx, const char *s, size_t n)
 
 	ldb_set_debug(ldb, ldb_wrap_debug, NULL);
 
-	ldb_set_utf8_fns(ldb, NULL, wrap_casefold);
+	ldb_set_utf8_functions(ldb, NULL, wrap_casefold, ldb_comparison_fold_utf8);
 
 	if (session_info) {
 		if (ldb_set_opaque(ldb, DSDB_SESSION_INFO, session_info)) {
diff --git a/lib/ldb-samba/ldb_wrap.h b/lib/ldb-samba/ldb_wrap.h
index aa7ccb3a234..274d1e6fddf 100644
--- a/lib/ldb-samba/ldb_wrap.h
+++ b/lib/ldb-samba/ldb_wrap.h
@@ -30,9 +30,14 @@ struct ldb_dn;
 struct cli_credentials;
 struct loadparm_context;
 struct tevent_context;
+struct ldb_val;
 
 char *wrap_casefold(void *context, void *mem_ctx, const char *s, size_t n);
 
+int ldb_comparison_fold_utf8(void *ignored,
+			     const struct ldb_val *v1,
+			     const struct ldb_val *v2);
+
 struct ldb_context *ldb_wrap_connect(TALLOC_CTX *mem_ctx,
 				     struct tevent_context *ev,
 				     struct loadparm_context *lp_ctx,
diff --git a/lib/ldb-samba/pyldb.c b/lib/ldb-samba/pyldb.c
index 958b3ad4b16..b2a485aaefa 100644
--- a/lib/ldb-samba/pyldb.c
+++ b/lib/ldb-samba/pyldb.c
@@ -88,7 +88,7 @@ static PyObject *py_ldb_set_utf8_casefold(PyObject *self,
 
 	ldb = pyldb_Ldb_AS_LDBCONTEXT(self);
 
-	ldb_set_utf8_fns(ldb, NULL, wrap_casefold);
+	ldb_set_utf8_functions(ldb, NULL, wrap_casefold, ldb_comparison_fold_utf8);
 
 	Py_RETURN_NONE;
 }
diff --git a/lib/ldb-samba/samba_extensions.c b/lib/ldb-samba/samba_extensions.c
index be92d982dde..aecc2d70dea 100644
--- a/lib/ldb-samba/samba_extensions.c
+++ b/lib/ldb-samba/samba_extensions.c
@@ -144,7 +144,7 @@ static int extensions_hook(struct ldb_context *ldb, enum ldb_module_hook_type t)
 			return ldb_operr(ldb);
 		}
 
-		ldb_set_utf8_fns(ldb, NULL, wrap_casefold);
+		ldb_set_utf8_functions(ldb, NULL, wrap_casefold, ldb_comparison_fold_utf8);
 		break;
 	}
 
diff --git a/lib/ldb/ABI/ldb-2.10.0.sigs b/lib/ldb/ABI/ldb-2.10.0.sigs
index 2266387cd60..f23014ffaaa 100644
--- a/lib/ldb/ABI/ldb-2.10.0.sigs
+++ b/lib/ldb/ABI/ldb-2.10.0.sigs
@@ -23,6 +23,7 @@ ldb_casefold_default: char *(void *, TALLOC_CTX *, const char *, size_t)
 ldb_check_critical_controls: int (struct ldb_control **)
 ldb_comparison_binary: int (struct ldb_context *, void *, const struct ldb_val *, const struct ldb_val *)
 ldb_comparison_fold: int (struct ldb_context *, void *, const struct ldb_val *, const struct ldb_val *)
+ldb_comparison_fold_ascii: int (void *, const struct ldb_val *, const struct ldb_val *)
 ldb_connect: int (struct ldb_context *, const char *, unsigned int, const char **)
 ldb_control_to_string: char *(TALLOC_CTX *, const struct ldb_control *)
 ldb_controls_except_specified: struct ldb_control **(struct ldb_control **, TALLOC_CTX *, struct ldb_control *)
@@ -275,6 +276,7 @@ ldb_set_timeout: int (struct ldb_context *, struct ldb_request *, int)
 ldb_set_timeout_from_prev_req: int (struct ldb_context *, struct ldb_request *, struct ldb_request *)
 ldb_set_utf8_default: void (struct ldb_context *)
 ldb_set_utf8_fns: void (struct ldb_context *, void *, char *(*)(void *, void *, const char *, size_t))
+ldb_set_utf8_functions: void (struct ldb_context *, void *, char *(*)(void *, void *, const char *, size_t), int (*)(void *, const struct ldb_val *, const struct ldb_val *))
 ldb_setup_wellknown_attributes: int (struct ldb_context *)
 ldb_should_b64_encode: int (struct ldb_context *, const struct ldb_val *)
 ldb_standard_syntax_by_name: const struct ldb_schema_syntax *(struct ldb_context *, const char *)
diff --git a/lib/ldb/common/attrib_handlers.c b/lib/ldb/common/attrib_handlers.c
index e6d412bd3cf..145ff487310 100644
--- a/lib/ldb/common/attrib_handlers.c
+++ b/lib/ldb/common/attrib_handlers.c
@@ -327,146 +327,18 @@ int ldb_comparison_binary(struct ldb_context *ldb, void *mem_ctx,
 }
 
 /*
-  compare two case insensitive strings, ignoring multiple whitespaces
-  and leading and trailing whitespaces
-  see rfc2252 section 8.1
-
-  try to optimize for the ascii case,
-  but if we find out an utf8 codepoint revert to slower but correct function
-*/
+ * ldb_comparison_fold is a schema syntax comparison_fn for utf-8 strings that
+ * collapse multiple spaces into one (e.g. "Directory String" syntax).
+ *
+ * The default comparison function only performs ASCII case-folding, and only
+ * collapses multiple spaces, not tabs and other whitespace (contrary to
+ * RFC4518). To change the comparison function (as Samba does), use
+ * ldb_set_utf8_functions().
+ */
 int ldb_comparison_fold(struct ldb_context *ldb, void *mem_ctx,
-			const struct ldb_val *v1, const struct ldb_val *v2)
+			const struct ldb_val *v1, const struct ldb_val *v2)
 {
-	const char *s1=(const char *)v1->data, *s2=(const char *)v2->data;
-	size_t n1 = v1->length, n2 = v2->length;
-	char *b1, *b2;
-	const char *u1, *u2;
-	int ret;
-
-	while (n1 && *s1 == ' ') { s1++; n1--; };
-	while (n2 && *s2 == ' ') { s2++; n2--; };
-
-	while (n1 && n2 && *s1 && *s2) {
-		/* the first 127 (0x7F) chars are ascii and utf8 guarantees they
-		 * never appear in multibyte sequences */
-		if (((unsigned char)s1[0]) & 0x80) goto utf8str;
-		if (((unsigned char)s2[0]) & 0x80) goto utf8str;
-		if (ldb_ascii_toupper(*s1) != ldb_ascii_toupper(*s2)) {
-			break;
-		}
-		if (*s1 == ' ') {
-			while (n1 > 1 && s1[0] == s1[1]) { s1++; n1--; }
-			while (n2 > 1 && s2[0] == s2[1]) { s2++; n2--; }
-		}
-		s1++; s2++;
-		n1--; n2--;
-	}
-
-	/* check for trailing spaces only if the other pointers has
-	 * reached the end of the strings otherwise we can
-	 * mistakenly match. ex. "domain users" <->
"domain users" <-> - * "domainUpdates" - */ - if (n1 && *s1 == ' ' && (!n2 || !*s2)) { - while (n1 && *s1 == ' ') { s1++; n1--; } - } - if (n2 && *s2 == ' ' && (!n1 || !*s1)) { - while (n2 && *s2 == ' ') { s2++; n2--; } - } - if (n1 == 0 && n2 != 0) { - return -(int)ldb_ascii_toupper(*s2); - } - if (n2 == 0 && n1 != 0) { - return (int)ldb_ascii_toupper(*s1); - } - if (n1 == 0 && n2 == 0) { - return 0; - } - return (int)ldb_ascii_toupper(*s1) - (int)ldb_ascii_toupper(*s2); - -utf8str: - /* - * No need to recheck from the start, just from the first utf8 charu - * found. Note that the callback of ldb_casefold() needs to be ascii - * compatible. - * - * Probably ldb_casefold() is wrap_casefold() which wraps - * strupper_talloc_n(). - */ - b1 = ldb_casefold(ldb, mem_ctx, s1, n1); - b2 = ldb_casefold(ldb, mem_ctx, s2, n2); - - if (!b1 || !b2) { - /* - * One of the strings was not UTF8, so we have no - * options but to do a binary compare. - * - * FIXME: this can be non-transitive. - * - * consider { - * CA 8A "ʊ" - * C6 B1 "Ʊ" - * C8 FE invalid utf-8 - * } - * - * The byte "0xfe" is always invalid in utf-8, so the - * comparisons against that string end up coming this way, - * while the "Ʊ" vs "ʊ" comparison goes via the ldb_casefold - * branch. Then: - * - * "ʊ" == "Ʊ" by casefold. - * "ʊ" > {c8 fe} by byte comparison. - * "Ʊ" < {c8 fe} by byte comparison. - * - * In many cases there are no invalid encodings between the - * upper and lower case letters, but the string as a whole - * might also compare differently due to the space-eating in - * the other branch. 
-		 */
-		talloc_free(b1);
-		talloc_free(b2);
-		ret = memcmp(s1, s2, MIN(n1, n2));
-		if (ret == 0) {
-			if (n1 == n2) {
-				return 0;
-			}
-			if (n1 > n2) {
-				if (s1[n2] == '\0') {
-					return 0;
-				}
-				return 1;
-			} else {
-				if (s2[n1] == '\0') {
-					return 0;
-				}
-				return -1;
-			}
-		}
-		return ret;
-	}
-
-	u1 = b1;
-	u2 = b2;
-
-	while (*u1 & *u2) {
-		if (*u1 != *u2)
-			break;
-		if (*u1 == ' ') {
-			while (u1[0] == u1[1]) u1++;
-			while (u2[0] == u2[1]) u2++;
-		}
-		u1++; u2++;
-	}
-	if (! (*u1 && *u2)) {
-		while (*u1 == ' ') u1++;
-		while (*u2 == ' ') u2++;
-	}
-	ret = NUMERIC_CMP(*u1, *u2);
-
-	talloc_free(b1);
-	talloc_free(b2);
-
-	return ret;
+	return ldb->utf8_fns.casecmp(ldb->utf8_fns.context, v1, v2);
 }
diff --git a/lib/ldb/common/ldb_utf8.c b/lib/ldb/common/ldb_utf8.c
index 178bdd86de1..6891de84101 100644
--- a/lib/ldb/common/ldb_utf8.c
+++ b/lib/ldb/common/ldb_utf8.c
@@ -34,6 +34,27 @@
 #include "ldb_private.h"
 #include "system/locale.h"
 
+/*
+ * Set functions for comparing and case-folding case-insensitive ldb val
+ * strings.
+ */
+void ldb_set_utf8_functions(struct ldb_context *ldb,
+			    void *context,
+			    char *(*casefold)(void *, void *, const char *, size_t),
+			    int (*casecmp)(void *ctx,
+					   const struct ldb_val *v1,
+					   const struct ldb_val *v2))
+{
+	if (context) {
+		ldb->utf8_fns.context = context;
+	}
+	if (casefold) {
+		ldb->utf8_fns.casefold = casefold;
+	}
+	if (casecmp) {
+		ldb->utf8_fns.casecmp = casecmp;
+	}
+}
 
 /*
   this allow the user to pass in a caseless comparison
@@ -43,12 +64,10 @@ void ldb_set_utf8_fns(struct ldb_context *ldb,
 			void *context,
 			char *(*casefold)(void *, void *, const char *, size_t))
 {
-	if (context)
-		ldb->utf8_fns.context = context;
-	if (casefold)
-		ldb->utf8_fns.casefold = casefold;
+	ldb_set_utf8_functions(ldb, context, casefold, NULL);
 }
 
+
 /*
   a simple case folding function
   NOTE: does not handle UTF8
@@ -62,14 +81,72 @@ char *ldb_casefold_default(void *context, TALLOC_CTX *mem_ctx, const char *s, si
 		return NULL;
 	}
 	for (i=0;ret[i];i++) {
-		ret[i] = ldb_ascii_toupper((unsigned char)ret[i]);
+		ret[i] = ldb_ascii_toupper(ret[i]);
 	}
 	return ret;
 }
 
+
+/*
+ * The default comparison fold function only knows ASCII. Multiple
+ * spaces (0x20) are collapsed into one, and [a-z] map to [A-Z]. All
+ * other bytes are compared without casefolding.
+ *

-- 
Samba Shared Repository