Hi,
On 7/21/23 11:08, Claudio Fontana wrote:
Hello Cornelia, Richard,
I am debugging some strange behavior in an s390x TCG VM, so I configured
the latest upstream QEMU with --enable-debug --enable-debug-tcg and am
running the qemu binary with -d unimp,guest_errors.
I get:
/usr/bin/qemu-system-s390x -nodefaults -no-reboot -nographic -vga none \
    -cpu qemu -d unimp,guest_errors \
    -object rng-random,filename=/dev/random,id=rng0 \
    -device virtio-rng-ccw,rng=rng0 \
    -runas qemu -net none \
    -kernel /var/tmp/boot/kernel -initrd /var/tmp/boot/initrd \
    -append "root=/dev/disk/by-id/virtio-0 rootfstype=ext3 rootflags=data=writeback,nobarrier,commit=150,noatime elevator=noop nmi_watchdog=0 rw oops=panic panic=1 quiet elevator=noop console=hvc0 init=build" \
    -m 2048 \
    -drive file=/var/tmp/img,format=raw,if=none,id=disk,cache=unsafe \
    -device virtio-blk-ccw,drive=disk,serial=0 \
    -drive file=/var/tmp/swap,format=raw,if=none,id=swap,cache=unsafe \
    -device virtio-blk-ccw,drive=swap,serial=1 \
    -device virtio-serial-ccw \
    -device virtconsole,chardev=virtiocon0 -chardev stdio,id=virtiocon0 \
    -chardev socket,id=monitor,server=on,wait=off,path=/var/tmp/img.qemu/monitor \
    -mon chardev=monitor,mode=readline \
    -smp 8
unimplemented opcode 0xb9ab
unimplemented opcode 0xb2af
...
At runtime I also see strange misbehavior: processes die with segfaults
and the guest kernel complains:
[ 2269s] [ 2243.901667][ T8318] User process fault: interruption code 0011 ilc:2 in libc.so.6[3ff87a80000+1c9000]
[ 2269s] [ 2243.904433][ T8318] Failing address: 000002aa0f73f000 TEID: 000002aa0f73f800
[ 2269s] [ 2243.904952][ T8318] Fault in primary space mode while using user ASCE.
[ 2269s] [ 2243.905405][ T8318] AS:00000000057841c7 R3:0000000001fdc007 S:000000000398c000 P:0000000000000400
I am analyzing this problem further now that the assertion failures have
been resolved.
I seem to have found an issue that manifests as a wrong return value from
glibc's __strstr_arch13, implemented in glibc/sysdeps/s390/strstr-arch13.S
and built into libc.so.
Based on my tests, I cannot trigger this issue on bare metal; I only see it
when running under TCG.
The workload here is the testsuite of the swig package:
git clone https://github.com/swig/swig.git
https://github.com/swig/swig/releases/tag/v4.1.1
https://github.com/swig/swig/commit/77323a0f07562b7d90d36181697a72a909b9519a
The error presents itself as strstr returning a match that lies past the
terminating NUL character of the haystack.
Here is the test I am using to showcase it. I implemented a simple strstr
as follows:
--------
#include <stddef.h> /* NULL, size_t */

static char *strstr_simple(const char *haystack, const char *needle)
{
    /*
     * This function returns a pointer to the beginning of the located
     * substring, or NULL if the substring is not found.
     * If needle is the empty string, the return value is always haystack
     * itself.
     */
    size_t i, j;

    if (needle == NULL || haystack == NULL) {
        return NULL;
    }
    if (needle[0] == 0) {
        return (char *)haystack;
    }
    for (i = 0; haystack[i] != 0; i++) {
        for (j = 0; haystack[i + j] != 0 && needle[j] != 0; j++) {
            if (needle[j] != haystack[i + j]) {
                break;
            }
        }
        if (needle[j] == 0) {
            return (char *)haystack + i;
        }
    }
    return NULL;
}
--------
Then I have a wrapper that compares the results of this simple
implementation with those of the regular strstr, after making sure that
strstr_ifunc resolves to __strstr_arch13:
--------
#include <assert.h>
#include <stdio.h>
#include <string.h>

char *strstr_w(const char *haystack, const char *needle)
{
    char *rv1 = strstr(haystack, needle);        /* glibc (__strstr_arch13) */
    char *rv2 = strstr_simple(haystack, needle); /* reference */

    if (rv1 != rv2) {
        printf("haystack: %p \"%s\"\n"
               "needle: %p \"%s\"\n"
               "rv1: %p\n"
               "rv2: %p\n",
               (void *)haystack, haystack,
               (void *)needle, needle,
               (void *)rv1, (void *)rv2);
        assert(0);
    }
    return rv1;
}
--------
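If it helps to reproduce this without the swig testsuite, a standalone
differential test along the following lines might work. This is only a
sketch under assumptions: strstr_ref, strstr_fuzz and FUZZ_BUF_SIZE are
hypothetical names, and whether glibc's ifunc really selects
__strstr_arch13 depends on the CPU model the guest advertises. The one
deliberate trick is to fill the bytes after each haystack's NUL with
copies of the needle. An implementation that searches past the terminator
then reports a match there, which is the kind of failure described here.

```c
/*
 * Hypothetical standalone reproducer (not part of swig): compares libc
 * strstr() with a naive reference on random short strings. The bytes
 * after each haystack's NUL are filled with copies of the needle, so an
 * implementation that searches past the terminator reports a bogus match.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FUZZ_BUF_SIZE 128

/* Naive reference, equivalent to strstr_simple(). */
static char *strstr_ref(const char *h, const char *n)
{
    size_t i, j;

    if (n[0] == '\0')
        return (char *)h;
    for (i = 0; h[i] != '\0'; i++) {
        for (j = 0; n[j] != '\0' && h[i + j] == n[j]; j++)
            ;
        if (n[j] == '\0')
            return (char *)h + i;
    }
    return NULL;
}

/* Returns the number of mismatches seen in 'iterations' random cases. */
static unsigned long strstr_fuzz(unsigned long iterations, unsigned int seed)
{
    static char buf[FUZZ_BUF_SIZE];
    char needle[8];
    unsigned long it, failures = 0;

    srand(seed);
    for (it = 0; it < iterations; it++) {
        size_t off = (size_t)(rand() % 16);     /* vary the alignment */
        size_t hlen = (size_t)(rand() % 64);    /* haystack length 0..63 */
        size_t nlen = 1 + (size_t)(rand() % 6); /* needle length 1..6 */
        size_t start = off + hlen + 1;          /* first byte after NUL */
        char *h = buf + off;
        char *got, *want;
        size_t k;

        /* A 3-letter alphabet makes partial matches frequent. */
        for (k = 0; k < nlen; k++)
            needle[k] = (char)('a' + rand() % 3);
        needle[nlen] = '\0';
        for (k = 0; k < hlen; k++)
            h[k] = (char)('a' + rand() % 3);
        h[hlen] = '\0';

        /* Poison everything after the terminator with the needle. */
        for (k = start; k < FUZZ_BUF_SIZE; k++)
            buf[k] = needle[(k - start) % nlen];

        got = strstr(h, needle);
        want = strstr_ref(h, needle);
        if (got != want && failures++ < 10)
            printf("mismatch: haystack %p \"%s\" needle \"%s\" "
                   "libc %p reference %p\n",
                   (void *)h, h, needle, (void *)got, (void *)want);
    }
    return failures;
}
```

For example, call strstr_fuzz(1000000, 1) from main() both inside the TCG
guest and on bare metal, then compare the mismatch counts.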
After building swig with the compiler flags -m64 -march=z14 -mtune=z15
and running even a minimal test like:
$ cd Examples/perl5/simple
$ export SWIG_LIB=../../../Lib
$ ../../../swig -perl5 -o example.c.wrap example.i
I get:
haystack: 0x2aa2a2488f0 " "363:operator< ignored" "
needle: 0x2aa2961bc8c " ^A"
rv1: 0x2aa2a24891e
rv2: (nil)
swig: DOH/copy.c:120: strstr_w: Assertion `0' failed.
Aborted
As you can see, strstr returns a match where there is none. Even worse,
the returned pointer 0x2aa2a24891e lies past the end of the string
(0x2aa2a248910).
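There is also a cheaper sanity check that needs no reference
implementation: a non-NULL result must lie inside the haystack, leave room
for the whole needle before the NUL, and actually compare equal to the
needle. Below is a minimal sketch of that check, using
strstr_result_plausible as a hypothetical helper name. It catches bogus
matches like this one but cannot detect a missed match, that is, a wrong
NULL.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns false if rv cannot be a valid strstr()
 * result for (haystack, needle). NULL is always accepted, since this
 * check cannot tell whether a match was missed.
 */
static bool strstr_result_plausible(const char *haystack, const char *needle,
                                    const char *rv)
{
    size_t hlen = strlen(haystack);
    size_t nlen = strlen(needle);
    uintptr_t off;

    if (rv == NULL)
        return true;
    if (nlen == 0)
        return rv == haystack;   /* empty needle must return haystack */
    if (nlen > hlen)
        return false;            /* no match is possible at all */

    /* Wraps to a huge value if rv < haystack, so one test covers both. */
    off = (uintptr_t)rv - (uintptr_t)haystack;
    if (off > hlen - nlen)
        return false;            /* match would extend past the NUL */

    return memcmp(rv, needle, nlen) == 0;
}
```

For the failure above, rv1 is 0x2e (46) bytes past the start of a
haystack whose end is at offset 0x20 (32), so the check rejects it
whatever the needle length.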
The bogus result causes the subsequent code (which relies on valid strstr
results) to call memmove with a negative byte count. Converted to
memmove's size_t parameter, that count becomes huge, so the copy runs off
the end of the process heap and causes the segfault originally
encountered.
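The following illustrates the arithmetic with hypothetical names. The
actual computation in swig's DOH code may look different. Suppose a caller
computes the remaining tail length as the distance from the match to the
end of the string. A match 14 bytes past the end (0x2aa2a24891e versus
0x2aa2a248910) then gives -14, which memmove() receives as SIZE_MAX - 13.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example: length of the tail between pos and the end of
 * the string, as a caller might compute it before memmove(). If strstr()
 * returned a pos beyond end, the difference is negative and the size_t
 * conversion wraps it around to a huge value.
 */
static size_t tail_length(const char *end, const char *pos)
{
    ptrdiff_t diff = end - pos;  /* negative when pos > end */
    return (size_t)diff;         /* the count memmove() would receive */
}
```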
I can make the issue disappear, for example, by forcing strstr_ifunc to
choose __GI_strstr instead of __strstr_arch13.
Maybe something is going wrong in the emulation of the vector string
search instruction (VSTRS), which __strstr_arch13 relies on. Does this
ring a bell?
Let me know if there is something I can provide that could help investigate
further.