Bug#902051: [xml/sgml-pkgs] Bug#902051: libxslt: generate-id() not returning unique IDs

2021-01-18 Thread Mattia Rizzolo
Hi MaJiang,

On Tue, Dec 03, 2019 at 11:21:27AM +0800, ma.ji...@zte.com.cn wrote:
> We have managed to get a unique IDs (without multi-thread). Hope this could 
> help to get a reproducible build.

Thank you for chiming in, and sorry for not getting back to you much
much sooner!!

> Now the ID is generated by a ptr diff. 
> val = (long)((char *)cur - (char *)_address);
> cur is the address of a xmlNsPtr node stored in a hash table(of course, 
> eventually it's in the heap), base_address is a static variable(in a data 
> section);

Right.

> After some debug, we found there are two major disturbances that prevent a 
> reproducible build.
> First, hash functions use a random seed get from time(). So the address of 
> nodes in hash tables(related to cur) is not stable across multi-builds.
> Second, ASLR (Address Space Layout Randomization) changes the base addresses 
> of data section and heap  every time we start a new process.
> 
> To fix the first problem, we could fake a fixed time. We currently use 
> libfaketime, and of course eventually  something like 
> https://gitlab.gnome.org/GNOME/libxslt/commit/e57df303eca25a2a3f9e0625c29f4b20177858cc
>should be applied.
> 
> To fix the second problem, we could change the ptr diff algorithm to 
> val = (long)((char *)cur -  heapStartAddr);
> After this change, ALSR could not disturb ID generation anymore, because we 
> have eliminated the base address of heap.

Unfortunately, that doesn't seem to be enough in this case I tried.
I did the thing with the current debian package where the
SOURCE_DATE_EPOCH commit you linked is already applied, I removed our
(broken as this bug report reports) patch, then added yours instead.

As a test case I used the debian-faq package, and that produces
non-deterministic IDs.


Which makes me curious, in which circumstances would your patch produce
deterministic IDs?

-- 
regards,
Mattia Rizzolo

GPG Key: 66AE 2B4A FCCF 3F52 DA18  4D18 4B04 3FCD B944 4540  .''`.
More about me:  https://mapreri.org : :'  :
Launchpad user: https://launchpad.net/~mapreri  `. `'`
Debian QA page: https://qa.debian.org/developer.php?login=mattia  `-


signature.asc
Description: PGP signature


Bug#902051: libxslt: generate-id() not returning unique IDs

2019-12-02 Thread ma.jiang
Dear Maintainer,
We have managed to get a unique IDs (without multi-thread). Hope this could 
help to get a reproducible build.

Now the ID is generated by a ptr diff. 
val = (long)((char *)cur - (char *)_address);
cur is the address of a xmlNsPtr node stored in a hash table(of course, 
eventually it's in the heap), base_address is a static variable(in a data 
section);

After some debug, we found there are two major disturbances that prevent a 
reproducible build.
First, hash functions use a random seed get from time(). So the address of 
nodes in hash tables(related to cur) is not stable across multi-builds.
Second, ASLR (Address Space Layout Randomization) changes the base addresses of 
data section and heap  every time we start a new process.

To fix the first problem, we could fake a fixed time. We currently use 
libfaketime, and of course eventually  something like 
https://gitlab.gnome.org/GNOME/libxslt/commit/e57df303eca25a2a3f9e0625c29f4b20177858cc
   should be applied.

To fix the second problem, we could change the ptr diff algorithm to 
val = (long)((char *)cur -  heapStartAddr);
After this change, ALSR could not disturb ID generation anymore, because we 
have eliminated the base address of heap.
==
We have use the following patch to make the xlstproc produce stable IDs.
--- libxslt-master/libxslt/functions.c  2010-12-24 20:30:00.0 +0800
+++ libxslt-master-new/libxslt/functions.c  2010-12-24 20:30:00.0 
+0800
@@ -14,7 +14,7 @@
 #include "libxslt.h"

 #include 
-
+#include 
 #ifdef HAVE_SYS_TYPES_H
 #include 
 #endif
@@ -671,6 +671,13 @@ xsltFormatNumberFunction(xmlXPathParserC
 xmlXPathFreeObject(decimalObj);
 }

+
+static char *__heapStartAddr;
+static __attribute__((constructor)) void initHeapStart()
+{
+   __heapStartAddr = sbrk(0);
+}
+
 /**
  * xsltGenerateIdFunction:
  * @ctxt:  the XPath Parser context
@@ -681,7 +688,6 @@ xsltFormatNumberFunction(xmlXPathParserC
  */
 void
 xsltGenerateIdFunction(xmlXPathParserContextPtr ctxt, int nargs){
-static char base_address;
 xmlNodePtr cur = NULL;
 xmlXPathObjectPtr obj = NULL;
 long val;
@@ -722,7 +728,8 @@ xsltGenerateIdFunction(xmlXPathParserCon
 if (obj)
 xmlXPathFreeObject(obj);

-val = (long)((char *)cur - (char *)_address);
+val = (long)((char *)cur - __heapStartAddr);
+
 if (val >= 0) {
   snprintf((char *)str, sizeof(str), "idp%ld", val);
 } else {

Bug#902051: libxslt: generate-id() not returning unique IDs

2019-04-24 Thread Tobias Hoffmann
And the current patch has the additional problem, that *any* software 
that uses libxslt on debian systems will spuriously outputs 
warnings/errors(?) like (without any way to turn them off, it seems):


"Attributed 3 IDs for element, cleaned up 0"

which can mess up e.g. scripts that expects data on stderr only when 
some important problem has occured...


  Tobias



Bug#902051: libxslt: generate-id() not returning unique IDs

2018-06-21 Thread Andrew Ayer
Package: libxslt
Version: 1.1.29-2.1
Severity: important
X-Debbugs-CC: reproducible-bui...@lists.alioth.debian.org

Dear Maintainer,

Nick Bowler has pointed out on the libxslt bug tracker that
debian/patches/0004-Make-generate-id-deterministic.patch has issues,
most notably that generate-id() is not returning distinct IDs for
distinct nodes:

https://bugzilla.gnome.org/show_bug.cgi?id=751621#c15

https://bugzilla.gnome.org/show_bug.cgi?id=751621#c19

This bug could cause stylesheets to break in hard-to-detect ways.  For
example, it could cause the stylesheet to believe that two nodes are
the same when they aren't (if the stylesheet is using the common 
`generate-id(foo) = generate-id(bar)` idiom), or to generate elements
with duplicate id attributes (if generate-id() is being used to generate
the id attribute).

It's not immediately clear how the patch can be fixed, since it relies
on an assumption that is no longer valid.  Thus, I think the patch
should be backed out until we can figure out the correct way to make
generate-id() deterministic.  Unfortunately, this will cause a
regression in reproducibility, but I think that's outweighed by the
breakage.

Regards,
Andrew