Bug#902051: [xml/sgml-pkgs] Bug#902051: libxslt: generate-id() not returning unique IDs
Hi MaJiang, On Tue, Dec 03, 2019 at 11:21:27AM +0800, ma.ji...@zte.com.cn wrote: > We have managed to get a unique IDs (without multi-thread). Hope this could > help to get a reproducible build. Thank you for chiming in, and sorry for not getting back to you much much sooner!! > Now the ID is generated by a ptr diff. > val = (long)((char *)cur - (char *)_address); > cur is the address of a xmlNsPtr node stored in a hash table(of course, > eventually it's in the heap), base_address is a static variable(in a data > section); Right. > After some debug, we found there are two major disturbances that prevent a > reproducible build. > First, hash functions use a random seed get from time(). So the address of > nodes in hash tables(related to cur) is not stable across multi-builds. > Second, ASLR (Address Space Layout Randomization) changes the base addresses > of data section and heap every time we start a new process. > > To fix the first problem, we could fake a fixed time. We currently use > libfaketime, and of course eventually something like > https://gitlab.gnome.org/GNOME/libxslt/commit/e57df303eca25a2a3f9e0625c29f4b20177858cc >should be applied. > > To fix the second problem, we could change the ptr diff algorithm to > val = (long)((char *)cur - heapStartAddr); > After this change, ALSR could not disturb ID generation anymore, because we > have eliminated the base address of heap. Unfortunately, that doesn't seem to be enough in this case I tried. I did the thing with the current debian package where the SOURCE_DATE_EPOCH commit you linked is already applied, I removed our (broken as this bug report reports) patch, then added yours instead. As a test case I used the debian-faq package, and that produces non-deterministic IDs. Which makes me curious, in which circumstances would your patch produce deterministic IDs? -- regards, Mattia Rizzolo GPG Key: 66AE 2B4A FCCF 3F52 DA18 4D18 4B04 3FCD B944 4540 .''`. More about me: https://mapreri.org : :' : Launchpad user: https://launchpad.net/~mapreri `. `'` Debian QA page: https://qa.debian.org/developer.php?login=mattia `- signature.asc Description: PGP signature
Bug#902051: libxslt: generate-id() not returning unique IDs
Dear Maintainer, We have managed to get a unique IDs (without multi-thread). Hope this could help to get a reproducible build. Now the ID is generated by a ptr diff. val = (long)((char *)cur - (char *)_address); cur is the address of a xmlNsPtr node stored in a hash table(of course, eventually it's in the heap), base_address is a static variable(in a data section); After some debug, we found there are two major disturbances that prevent a reproducible build. First, hash functions use a random seed get from time(). So the address of nodes in hash tables(related to cur) is not stable across multi-builds. Second, ASLR (Address Space Layout Randomization) changes the base addresses of data section and heap every time we start a new process. To fix the first problem, we could fake a fixed time. We currently use libfaketime, and of course eventually something like https://gitlab.gnome.org/GNOME/libxslt/commit/e57df303eca25a2a3f9e0625c29f4b20177858cc should be applied. To fix the second problem, we could change the ptr diff algorithm to val = (long)((char *)cur - heapStartAddr); After this change, ALSR could not disturb ID generation anymore, because we have eliminated the base address of heap. == We have use the following patch to make the xlstproc produce stable IDs. --- libxslt-master/libxslt/functions.c 2010-12-24 20:30:00.0 +0800 +++ libxslt-master-new/libxslt/functions.c 2010-12-24 20:30:00.0 +0800 @@ -14,7 +14,7 @@ #include "libxslt.h" #include - +#include #ifdef HAVE_SYS_TYPES_H #include #endif @@ -671,6 +671,13 @@ xsltFormatNumberFunction(xmlXPathParserC xmlXPathFreeObject(decimalObj); } + +static char *__heapStartAddr; +static __attribute__((constructor)) void initHeapStart() +{ + __heapStartAddr = sbrk(0); +} + /** * xsltGenerateIdFunction: * @ctxt: the XPath Parser context @@ -681,7 +688,6 @@ xsltFormatNumberFunction(xmlXPathParserC */ void xsltGenerateIdFunction(xmlXPathParserContextPtr ctxt, int nargs){ -static char base_address; xmlNodePtr cur = NULL; xmlXPathObjectPtr obj = NULL; long val; @@ -722,7 +728,8 @@ xsltGenerateIdFunction(xmlXPathParserCon if (obj) xmlXPathFreeObject(obj); -val = (long)((char *)cur - (char *)_address); +val = (long)((char *)cur - __heapStartAddr); + if (val >= 0) { snprintf((char *)str, sizeof(str), "idp%ld", val); } else {
Bug#902051: libxslt: generate-id() not returning unique IDs
And the current patch has the additional problem, that *any* software that uses libxslt on debian systems will spuriously outputs warnings/errors(?) like (without any way to turn them off, it seems): "Attributed 3 IDs for element, cleaned up 0" which can mess up e.g. scripts that expects data on stderr only when some important problem has occured... Tobias
Bug#902051: libxslt: generate-id() not returning unique IDs
Package: libxslt Version: 1.1.29-2.1 Severity: important X-Debbugs-CC: reproducible-bui...@lists.alioth.debian.org Dear Maintainer, Nick Bowler has pointed out on the libxslt bug tracker that debian/patches/0004-Make-generate-id-deterministic.patch has issues, most notably that generate-id() is not returning distinct IDs for distinct nodes: https://bugzilla.gnome.org/show_bug.cgi?id=751621#c15 https://bugzilla.gnome.org/show_bug.cgi?id=751621#c19 This bug could cause stylesheets to break in hard-to-detect ways. For example, it could cause the stylesheet to believe that two nodes are the same when they aren't (if the stylesheet is using the common `generate-id(foo) = generate-id(bar)` idiom), or to generate elements with duplicate id attributes (if generate-id() is being used to generate the id attribute). It's not immediately clear how the patch can be fixed, since it relies on an assumption that is no longer valid. Thus, I think the patch should be backed out until we can figure out the correct way to make generate-id() deterministic. Unfortunately, this will cause a regression in reproducibility, but I think that's outweighed by the breakage. Regards, Andrew