Bug#710439: pixman fails testsuite on powerpc due to glibc

2013-05-30 Thread Lennart Sorensen
Package: eglibc
Version: 2.13-38

When I try to build pixman (0.26.0-4) in my powerpc wheezy chroot,
the testsuite fails.

The failure usually looks like:

*** glibc detected *** /tmp/pixman-0.26.0/build/test/.libs/lt-scaling-test: corrupted double-linked list: 0x10068e80 ***
=== Backtrace: =
/lib/powerpc-linux-gnu/libc.so.6(+0x86ef4)[0xfd4fef4]
/lib/powerpc-linux-gnu/libc.so.6(+0x88b24)[0xfd51b24]
/lib/powerpc-linux-gnu/libc.so.6(cfree+0x8c)[0xfd554cc]
/tmp/pixman-0.26.0/build/test/.libs/lt-scaling-test[0x10001ba0]
/tmp/pixman-0.26.0/build/test/.libs/lt-scaling-test[0x100024dc]
/usr/lib/powerpc-linux-gnu/libgomp.so.1(+0xb5e0)[0xfe975e0]
/lib/powerpc-linux-gnu/libpthread.so.0(+0x67b0)[0xfe577b0]
/lib/powerpc-linux-gnu/libc.so.6(clone+0x84)[0xfdbd960]
=== Memory map: 
0010-00103000 r-xp  00:00 0  [vdso]
0fca-0fca8000 r-xp  fd:01 4211711 /lib/powerpc-linux-gnu/librt-2.13.so
0fca8000-0fcb7000 ---p 8000 fd:01 4211711 /lib/powerpc-linux-gnu/librt-2.13.so
0fcb7000-0fcb8000 r--p 7000 fd:01 4211711 /lib/powerpc-linux-gnu/librt-2.13.so
0fcb8000-0fcb9000 rw-p 8000 fd:01 4211711 /lib/powerpc-linux-gnu/librt-2.13.so
0fcc9000-0fe39000 r-xp  fd:01 4211720 /lib/powerpc-linux-gnu/libc-2.13.so
0fe39000-0fe3d000 r--p 0017 fd:01 4211720 /lib/powerpc-linux-gnu/libc-2.13.so
0fe3d000-0fe3e000 rw-p 00174000 fd:01 4211720 /lib/powerpc-linux-gnu/libc-2.13.so
0fe3e000-0fe41000 rw-p  00:00 0 
0fe51000-0fe69000 r-xp  fd:01 4211733 /lib/powerpc-linux-gnu/libpthread-2.13.so
...

Also:
*** glibc detected *** /tmp/pixman-0.26.0/build/test/.libs/lt-affine-test: corrupted double-linked list: 0x45f00760 ***
=== Backtrace: =
/lib/powerpc-linux-gnu/libc.so.6(+0x86ef4)[0xfd4fef4]
/lib/powerpc-linux-gnu/libc.so.6(+0x88b24)[0xfd51b24]
/lib/powerpc-linux-gnu/libc.so.6(cfree+0x8c)[0xfd554cc]
/tmp/pixman-0.26.0/build/test/.libs/lt-affine-test[0x10001a0c]
/tmp/pixman-0.26.0/build/test/.libs/lt-affine-test[0x10002188]
/usr/lib/powerpc-linux-gnu/libgomp.so.1(+0xb5e0)[0xfe975e0]
/lib/powerpc-linux-gnu/libpthread.so.0(+0x67b0)[0xfe577b0]
/lib/powerpc-linux-gnu/libc.so.6(clone+0x84)[0xfdbd960]
=== Memory map: 
...


If I install libc6 and libc6-dev 2.17-3, the problem goes away and all
tests pass.  Hence I am assigning blame to glibc for this.

Of course this makes me rather concerned about the glibc in wheezy for
use on powerpc.

Any idea what could be broken in 2.13 and how to fix it?  I would
prefer to stick with stable, and a patch to 2.13 seems preferable for
wheezy to having to mix in other bits, especially glibc.

I am building on an IBM p710 with 6 cores, so lots of parallelism is
certainly possible and likely, which may help to trigger bugs like this.

I would think this qualifies as a severity for FTBFS, even though it
isn't glibc that is failing to build, but it is causing other things to
fail to build.

-- 
Len Sorensen


-- 
To UNSUBSCRIBE, email to debian-glibc-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20130530193458.go11...@csclub.uwaterloo.ca



Bug#710439: More info

2013-05-31 Thread Lennart Sorensen
If I disable use of altivec (VMX) in pixman, then the problem goes away.
Makes sense I suppose given it was in the VMX code that it was crashing.

Trying to use valgrind on pixman's failing tests shows lots of cases
of reading or writing beyond the end of malloc'd blocks, but it shows
the same when running under 2.17, where it doesn't crash.  Of course not
crashing doesn't mean the code is correct.

I can't figure out why libc6 2.13 has it fail and 2.17 makes it work
(without even recompiling pixman).

Of course it is still possible that pixman's altivec code is doing
something wrong.  But the buildd logs show it passing pixman building on
one of the power5+ build machines, although using 2.13-37 (not -38) libc6.
Using -37 does not make it work for me though.

-- 
Len Sorensen


Archive: http://lists.debian.org/20130531190233.gq11...@csclub.uwaterloo.ca



Bug#710439: Testing glibc commit 7e2fca8dd22e3bd932581d6479b0c552deff00b6 to see if it is the solution

2013-08-28 Thread Lennart Sorensen
I am doing a test build of eglibc with commit
7e2fca8dd22e3bd932581d6479b0c552deff00b6 applied to see if it helps.

The commit is:

commit 7e2fca8dd22e3bd932581d6479b0c552deff00b6
Author: Alan Modra 
Date:   Tue Sep 25 16:30:06 2012 -0500

Fix bugs in powerpc pthread_once.

Ref gcc.gnu.org/bugzilla/show_bug.cgi?id=52839#c10

Release barriers are needed to ensure that any memory written by
init_routine is seen by other threads before *once_control changes.
In the case of clear_once_control we need to flush any partially
written state.

Reading the gcc bug report makes it sound like this only really triggers
on a power7 system, which happens to be what I am using.

I will let you know if this commit (which applies cleanly except for
the changelog, with a one line offset) solves the problem.
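
For readers unfamiliar with the barrier the commit message talks about,
here is a minimal sketch of the release/acquire pairing in portable C11
atomics.  It is not the glibc powerpc code; the names demo_once,
init_routine, shared_config and the three state values are made up for
illustration, and the waiting path simply spins where a real
implementation would block.

```c
#include <stdatomic.h>

/* Hypothetical states for a simplified "once" control word. */
enum { ONCE_NOT_STARTED, ONCE_IN_PROGRESS, ONCE_DONE };

static atomic_int once_state = ONCE_NOT_STARTED;
int shared_config;   /* plain memory written by the init routine */
int init_calls;      /* counts how often the init routine ran */

void init_routine(void)
{
    shared_config = 42;
    init_calls++;
}

void demo_once(void (*init)(void))
{
    int expected = ONCE_NOT_STARTED;

    /* Fast path: the acquire load pairs with the release store below, so a
       thread that sees ONCE_DONE also sees everything init wrote. */
    if (atomic_load_explicit(&once_state, memory_order_acquire) == ONCE_DONE)
        return;

    /* Exactly one caller wins the right to run init. */
    if (atomic_compare_exchange_strong_explicit(&once_state, &expected,
                                                ONCE_IN_PROGRESS,
                                                memory_order_acquire,
                                                memory_order_acquire)) {
        init();
        /* Release: publish init's writes before the state changes. */
        atomic_store_explicit(&once_state, ONCE_DONE, memory_order_release);
        return;
    }

    /* Losers wait until the winner has published its writes. */
    while (atomic_load_explicit(&once_state, memory_order_acquire) != ONCE_DONE)
        ;  /* busy wait, for the sketch only */
}
```

Without the release store, a weakly ordered CPU may let another thread
observe ONCE_DONE before it observes shared_config, which is the class
of bug the commit describes.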

-- 
Len Sorensen


Archive: http://lists.debian.org/20130828155242.gk12...@csclub.uwaterloo.ca



Bug#710439: That didn't help

2013-08-28 Thread Lennart Sorensen
So that glibc patch didn't help.

So far all that works is turning off vmx support, or upgrading to a newer
glibc, but that may just be a timing change.  Running with efence takes a
long time, but makes the problem go away and hence detects nothing, making
me again think that it is a timing issue.  It seems somehow the different
threads are managing to clobber each other when the vmx code is in use.

-- 
Len Sorensen


Archive: http://lists.debian.org/20130828211146.gm12...@csclub.uwaterloo.ca



Bug#710439: More info

2013-09-16 Thread Lennart Sorensen
So I have made some more attempts.

If I disable vmx use in pixman, the crashes go away.
If I disable use of openMP in pixman, the crashes go away.
If I run on a power6 rather than power7, the crashes go away.
If I use strace, the crashes go away.
If I use gdb, the crashes go away.
If I install libc 2.17 instead of 2.13 (without recompiling anything),
the crashes go away.

So somehow it seems to be an interaction between vmx instructions and
openMP on power7, probably to do with memory barriers (or lack of
them somewhere between threads), with very specific timing (which gdb,
strace both affect), which has either been fixed in libc 2.17, or the
timing is somehow different again masking the problem.

I have no idea how to debug this any further.

-- 
Len Sorensen


Archive: http://lists.debian.org/20130916195711.gd13...@csclub.uwaterloo.ca



Bug#759530: Patch to fix segfault in ldconfig

2015-02-25 Thread Lennart Sorensen
I looked at ways the aux-cache could cause a segfault, and given the
file is mmap'd and has data offsets in it that are used as pointers
without being checked it is not hard to see how a corrupt file could
cause a segfault.  The following patch makes the segfaults I was able
to think of and create go away.

I also have included an example corrupted aux-cache file
(aux-cache-corrupt-soname-offset) which has a bad offset that causes
a segfault.
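
As a general illustration of the failure mode (separate from the patch
further down), this is roughly what checking an offset read from a
mapped file looks like before it is turned into a pointer.  The helper
name checked_string_at and its parameters are hypothetical and do not
exist in glibc.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return a pointer to the NUL-terminated string that starts `offset`
   bytes into a mapped region of `data_size` bytes, or NULL when the
   offset or the string would fall outside the region.
   Hypothetical helper for illustration only. */
const char *checked_string_at(const char *data, size_t data_size,
                              uint32_t offset)
{
    if (data == NULL || offset >= data_size)
        return NULL;

    /* Require a terminator inside the mapping, so later strlen() or
       strcmp() calls cannot run off the end of the file. */
    if (memchr(data + offset, '\0', data_size - offset) == NULL)
        return NULL;

    return data + offset;
}
```

A caller would treat a NULL result as "entry is corrupt" and skip the
entry or regenerate the whole cache.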

There is another problem which I haven't solved but which is not a
segfault.  If you somehow truncate the aux-cache file by a bit and run
the previous ldconfig without my patch, then you end up with a corrupted
aux-cache where some entries do not have soname's even though they should,
and that causes you to get messages like:

/sbin/ldconfig.real: /lib/i386-linux-gnu/ is not a symbolic link

/sbin/ldconfig.real: /usr/lib/i386-linux-gnu/ is not a symbolic link

/sbin/ldconfig.real: /lib64/ is not a symbolic link

/sbin/ldconfig.real: /usr/lib64/ is not a symbolic link

/sbin/ldconfig.real: /libx32/ is not a symbolic link

/sbin/ldconfig.real: /usr/libx32/ is not a symbolic link

/sbin/ldconfig.real: /usr/lib/ is not a symbolic link

/sbin/ldconfig.real: /usr/lib/i386-linux-gnu/i586/ is not a symbolic link

/sbin/ldconfig.real: /usr/lib/i386-linux-gnu/i686/cmov/ is not a symbolic link

Using ldconfig -i (and hence ignoring the corrupted aux-cache) makes
that problem go away.  To solve it would of course mean you have to
not trust the cache which rather defeats the point of having the cache,
so I don't know if that is worth trying to solve.  It does not cause a
segfault however.

Using ldconfig -p to show the cache at that point has entries that are
clearly wrong such as:

...
day (libc6, OS ABI: Linux 2.6.32) => /lib/i386-linux-gnu/day
__kernel_sigreturn (libc6,x32, OS ABI: Linux 3.4.0) => /libx32/__kernel_sigreturn
X_2.6 (libc6) => /usr/lib/i386-linux-gnu/X_2.6
LF (libc6) => /usr/lib/i386-linux-gnu/LF
 (libc6) => /usr/lib/ 
 (libc6, OS ABI: Linux 2.6.32) => /lib/i386-linux-gnu/
 (libc6, OS ABI: Linux 2.6.32) => /usr/lib/i386-linux-gnu/
 (libc6) => /lib/i386-linux-gnu/
 (libc6) => /usr/lib/i386-linux-gnu/
 (libc6) => /usr/lib/i386-linux-gnu/
 (libc6) => /usr/lib/i386-linux-gnu/
 (libc6) => /usr/lib/i386-linux-gnu/
 (libc6, OS ABI: Linux 2.6.32) => /lib/i386-linux-gnu/
 (libc6, OS ABI: Linux 2.6.32) => /usr/lib/i386-linux-gnu/
 (libc6,x32, OS ABI: Linux 3.4.0) => /libx32/
 (libc6,x32, OS ABI: Linux 3.4.0) => /usr/libx32/
 (libc6,x86-64, OS ABI: Linux 2.6.32) => /lib64/
 (libc6,x86-64, OS ABI: Linux 2.6.32) => /usr/lib64/
 (libc6, hwcap: 0x00088000) => /usr/lib/i386-linux-gnu/i686/cmov/
 (libc6, hwcap: 0x0004) => /usr/lib/i386-linux-gnu/i586/
 (libc6) => /lib/i386-linux-gnu/
 (libc6) => /usr/lib/i386-linux-gnu/
 (libc6) => /usr/lib/
� (libc6) => /lib/i386-linux-gnu/�

The file aux-cache-missing-sonames shows this corrupted state.

I hope the patch at least helps solve the worst part of the problem.

-- 
Len Sorensen
diff -ur --exclude debian --exclude build-tree glibc-2.19.ori/elf/cache.c glibc-2.19/elf/cache.c
--- glibc-2.19.ori/elf/cache.c	2015-02-25 16:24:59.0 +
+++ glibc-2.19/elf/cache.c	2015-02-25 17:42:18.0 +
@@ -699,7 +699,8 @@
   if (aux_cache == MAP_FAILED
   || aux_cache_size < sizeof (struct aux_cache_file)
   || memcmp (aux_cache->magic, AUX_CACHEMAGIC, sizeof AUX_CACHEMAGIC - 1)
-  || aux_cache->nlibs >= aux_cache_size)
+  || sizeof(struct aux_cache_file) + (aux_cache->nlibs - 1) * sizeof(struct aux_cache_file_entry) >= aux_cache_size
+  || aux_cache->nlibs * sizeof(struct aux_cache_file_entry) + aux_cache->libs[aux_cache->nlibs - 1].soname >= aux_cache_size)
 {
   close (fd);
   init_aux_cache ();
@@ -712,12 +713,14 @@
   const char *aux_cache_data
 = (const char *) &aux_cache->libs[aux_cache->nlibs];
   for (unsigned int i = 0; i < aux_cache->nlibs; ++i)
-insert_to_aux_cache (&aux_cache->libs[i].id,
-			 aux_cache->libs[i].flags,
-			 aux_cache->libs[i].osversion,
-			 aux_cache->libs[i].soname == 0
-			 ? NULL : aux_cache_data + aux_cache->libs[i].soname,
-			 0);
+/* Only use entries with sane offsets */
+if (aux_cache->libs[i].soname < aux_cache_size)
+  insert_to_aux_cache (&aux_cache->libs[i].id,
+			   aux_cache->libs[i].flags,
+			   aux_cache->libs[i].osversion,
+			   aux_cache->libs[i].soname == 0
+			   ? NULL : aux_cache_data + aux_cache->libs[i].soname,
+			   0);
 
   munmap (aux_cache, aux_cache_size);
   close (fd);


aux-cache-corrupt-soname-offset
Description: Binary data


aux-cache-missing-sonames
Description: Binary data


Bug#759530: Patch to fix segfault in ldconfig

2015-03-02 Thread Lennart Sorensen
On Sun, Mar 01, 2015 at 10:22:05PM +0100, Niels Thykier wrote:
> Excellent, thanks.
> 
> I am taking the liberty of adding the patch tag for this one.  If
> nothing else, I would greatly appreciate having ldconfig not seg. fault. :)

That makes sense to me.

> Sounds like the aux-cache could do with a checksum or something to catch
> obvious corruptions.

I very much agree, although that would of course be a format change and
then you would have to throw away the old file on upgrades.  Perfectly
reasonable and probably the best way to deal with all the corner cases
that I noticed for having it do the wrong thing when corrupted.

I wonder what upstream thinks of it.

> Or ldconfig needs some method to (fairly) reliably detect corruptions.
> If it spits out errors about directories not being links, then something
> notices a problem.  Perhaps this could be extended to discarding the
> cache (even if "just" a """if (!ignore_cache) execve(argv, "-i", null);""").

I thought that could work too.  It sounded rather ugly to do that though.

I rather like the checksum idea instead since it sounds more reliable
and simpler.
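
To make the checksum idea concrete, here is a minimal sketch of a
self-checking cache blob.  The layout (a 4-byte payload length and a
4-byte Adler-32 checksum in host byte order, followed by the payload)
is purely an assumption for illustration; it is not the existing
aux-cache format, and the function names are invented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HEADER_SIZE 8u  /* hypothetical: 4-byte length + 4-byte checksum */

/* Plain Adler-32 over a byte buffer. */
uint32_t adler32_sum(const unsigned char *buf, size_t len)
{
    uint32_t a = 1, b = 0;

    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Return 1 when the blob's recorded length and checksum both match its
   actual contents, 0 otherwise (truncated, extended, or modified). */
int blob_is_intact(const unsigned char *blob, size_t blob_size)
{
    uint32_t len, sum;

    if (blob == NULL || blob_size < DEMO_HEADER_SIZE)
        return 0;

    /* memcpy avoids alignment assumptions about the mapped buffer. */
    memcpy(&len, blob, sizeof len);
    memcpy(&sum, blob + sizeof len, sizeof sum);

    if (len != blob_size - DEMO_HEADER_SIZE)
        return 0;

    return adler32_sum(blob + DEMO_HEADER_SIZE, len) == sum;
}
```

A reader that gets 0 back would simply discard the file and rebuild it,
which also covers the truncation case described earlier.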

-- 
Len Sorensen


Archive: https://lists.debian.org/20150302185456.gm25...@csclub.uwaterloo.ca



Bug#759530: Patch to fix segfault in ldconfig

2015-03-08 Thread Lennart Sorensen
On Sun, Mar 08, 2015 at 09:48:50PM +0100, Aurelien Jarno wrote:
> Thanks for your work, it's nice we have been able to understand the real
> issue.

Well I think the best summary of the problem is that the aux-cache
handling code is awful and doesn't check anything before using the
incoming data as a source for pointers.  Corrupt aux-cache files were
not even remotely considered.  Safest option right now would be to disable
the use of the aux-cache if you want to avoid segfaults in ldconfig.

> Unfortunately they doesn't seem to work here. That said I have been able
> to reproduce the problem by truncating/changing the aux cache manually.

Well it segfaulted on my amd64 system when I tried that file.  Of course
it may need to have the same set of libraries installed as I have.

> I think the idea behind the patch is correct, but it is not fully
> correct (or at least it only fixes corner cases).

I don't doubt I missed some of the problem cases.  Seems the best answer
would be to redesign it with a checksum and a size indicator or something
else that would catch corruption in general.

> > diff -ur --exclude debian --exclude build-tree glibc-2.19.ori/elf/cache.c 
> > glibc-2.19/elf/cache.c
> > --- glibc-2.19.ori/elf/cache.c  2015-02-25 16:24:59.0 +
> > +++ glibc-2.19/elf/cache.c  2015-02-25 17:42:18.0 +
> > @@ -699,7 +699,8 @@
> >if (aux_cache == MAP_FAILED
> >|| aux_cache_size < sizeof (struct aux_cache_file)
> >|| memcmp (aux_cache->magic, AUX_CACHEMAGIC, sizeof AUX_CACHEMAGIC - 
> > 1)
> > -  || aux_cache->nlibs >= aux_cache_size)
> > +  || sizeof(struct aux_cache_file) + (aux_cache->nlibs - 1) * 
> > sizeof(struct aux_cache_file_entry) >= aux_cache_size
> 
> Why using (aux_cache->nlibs - 1) and not directly aux_cache->nlibs here?

Hmm, I seem to have misread the way the structure is defined, and thought
it always had one aux_cache_file_entry in the aux_cache_file struct,
but no it has 0 of them.

> > +  || aux_cache->nlibs * sizeof(struct aux_cache_file_entry) + 
> > aux_cache->libs[aux_cache->nlibs - 1].soname >= aux_cache_size)
> 
> The best to catch all cases here is to compute the theoretical size of
> the file using the headers and comparing it to the real one.

Well the filenames seem to be just stored at the end of the other data
with pointers to it (which is where most of the chances for segfaults
occur).  The format doesn't seem to actually have anything to cover the
size of that pile of strings.  There is really no way to know what the
filesize should be given the end is just a heap of c strings.  Did I
miss something?  You seem to be using another header thing that mentions
the string lengths.  Maybe I missed that existing.  That would be useful.
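
If the header does record both the number of entries and the total
length of the string area, the "theoretical size" comparison could look
something like the sketch below.  The struct layouts and field names
(demo_header, demo_entry, nlibs, len_strings) are assumptions made for
illustration and should be checked against the real definitions in
elf/cache.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk layout: header, then nlibs entries, then strings. */
struct demo_entry {
    uint64_t id;
    int32_t  flags;
    uint32_t osversion;
    uint32_t soname;        /* offset into the string area */
};

struct demo_header {
    char     magic[8];
    uint32_t nlibs;         /* number of demo_entry records */
    uint32_t len_strings;   /* size of the string area in bytes */
};

/* Return 1 when file_size is exactly what the header implies, else 0. */
int demo_size_matches(const struct demo_header *h, size_t file_size)
{
    size_t remaining;

    if (h == NULL || file_size < sizeof *h)
        return 0;
    remaining = file_size - sizeof *h;

    /* Divide instead of multiplying first, so a corrupt, huge nlibs
       cannot overflow the size computation. */
    if (h->nlibs > remaining / sizeof (struct demo_entry))
        return 0;
    remaining -= (size_t) h->nlibs * sizeof (struct demo_entry);

    return remaining == h->len_strings;
}
```

This only proves the file has the expected size; each soname offset
still has to be compared against the string area length before use.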

> >  {
> >close (fd);
> >init_aux_cache ();
> > @@ -712,12 +713,14 @@
> >const char *aux_cache_data
> >  = (const char *) &aux_cache->libs[aux_cache->nlibs];
> >for (unsigned int i = 0; i < aux_cache->nlibs; ++i)
> > -insert_to_aux_cache (&aux_cache->libs[i].id,
> > -aux_cache->libs[i].flags,
> > -aux_cache->libs[i].osversion,
> > -aux_cache->libs[i].soname == 0
> > -? NULL : aux_cache_data + aux_cache->libs[i].soname,
> > -0);
> > +/* Only use entries with sane offsets */
> > +if (aux_cache->libs[i].soname < aux_cache_size)
> 
> The check is incorrect here, the address in aux_cache->libs[i].soname is
> relative to aux_cache_data, so it probably catches very few cases.

Well it catches any where the offset is corrupt by a larger value
(doesn't have to be that large either).  But yes I did get that check
slightly wrong.  I think I changed it a few times while working out what
the code was trying to do with the data structure.

> > +  insert_to_aux_cache (&aux_cache->libs[i].id,
> > +  aux_cache->libs[i].flags,
> > +  aux_cache->libs[i].osversion,
> > +  aux_cache->libs[i].soname == 0
> > +  ? NULL : aux_cache_data + aux_cache->libs[i].soname,
> > +  0);
> >  
> >munmap (aux_cache, aux_cache_size);
> >close (fd);
> 
> 
> Please find a new patch below. I have submitted it upstream as bz 18093.
> I am planning to wait for upstream answer or comment for other people a
> few days. I'll then prepare an upload fixing this bug.
> 
> diff --git a/ChangeLog b/ChangeLog
> index 4a5cd16..5086267 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,9 @@
> +2015-03-08  Aurelien Jarno  
> +
> + [BZ #18093]
> + * elf/cache.c (load_aux_cache): Regenerate the cache if it has the
> + wrong size. Ignore entries pointing outside of the mmaped memory.
> +
>  2015-03-08  Paul Pluzhnikov  
>  
>   [BZ #16734]
> diff --git a/elf/cache.c b/elf/cache.c
> index 1732268..9131e08 100644
> --- a/elf/cache.c
> +++ b/elf/cache.c
> @@ -698,7 +698,9 @@ lo

Re: Depends problem. Libc6 is 2.28 but I need 2.29

2020-02-18 Thread Lennart Sorensen
On Sat, Feb 15, 2020 at 04:35:20PM +0200, Grisha Grybyniuk wrote:
> Here it is

Anything in /etc/apt/sources.list.d/ ?

Also if you install and run apt-show-versions, you can filter out the
'uptodate' lines and see what packages it thinks are either from unknown
sources or too new.

-- 
Len Sorensen



Re: Arch qualification for buster: call for DSA, Security, toolchain concerns

2018-06-29 Thread Lennart Sorensen
On Fri, Jun 29, 2018 at 10:20:50AM +0100, Luke Kenneth Casson Leighton wrote:
>  in addition, arm64 is usually speculative OoO (Cavium ThunderX V1
> being a notable exception) which means it's vulnerable to spectre and
> meltdown attacks, whereas 32-bit ARM is exclusively in-order.  if you
> want to GUARANTEE that you've got spectre-immune hardware you need
> either any 32-bit system (where even Cortex A7 has virtualisation) or
> if 64-bit is absolutely required use Cortex A53.

The Cortex A8, A7 and A5 are in order.  The A9, A15, A17 etc are out of
order execution.  So any 32 bit arm worth using is pretty much always
out of order execution.

For 64 bit, I think the A35 and A53 are in order while the A57, A72 etc
are out of order.

Of course non Cortex designs vary (I think Marvell's PJ4 core was out of
order execution for example).

After all, in general in order execution equals awful performance.

-- 
Len Sorensen



Re: Arch qualification for buster: call for DSA, Security, toolchain concerns

2018-06-29 Thread Lennart Sorensen
On Fri, Jun 29, 2018 at 06:29:48PM +0200, Geert Uytterhoeven wrote:
> Are you sure you're not interchanging A8 and A9, cfr. Linux kernel commit
> e388b80288aade31 ("ARM: spectre-v2: add Cortex A8 and A15 validation of the
> IBE bit")?

Yes.  That is the main reason the A9 is faster than the A8 at the same
clock speed by quite a bit.

For example http://www.360doc.com/content/12/0806/14/350555_228637047.shtml 
says:
Cortex-A9 has many advanced features for a RISC CPU, such as speculative
data accesses, branch prediction, multi-issuing of instructions,
hardware cache coherency, out-of-order execution and register renaming.
Cortex-A8 does not have these, except for dual-issuing instructions and
branch prediction.

So the A8 still has branch prediction so it can keep the pipeline fed,
even with in order execution.  Spectre isn't really about out-of-order
execution as much as it is about speculative memory accesses which some
in-order execution chips have too.

-- 
Len Sorensen