Hi,

On 7/21/23 11:08, Claudio Fontana wrote:
> 
> Hello Cornelia, Richard,
> 
> I had some strange behavior in an s390x TCG VM that I am debugging,
> 
> and configured latest upstream QEMU with --enable-debug --enable-debug-tcg
> 
> and I am running the qemu binary with -d unimp,guest_errors .
> 
> I get:
> 
> /usr/bin/qemu-system-s390x -nodefaults -no-reboot -nographic -vga none -cpu 
> qemu -d unimp,guest_errors -object rng-random,filename=/dev/random,id=rng0 
> -device virtio-rng-ccw,rng=rng0 -runas qemu -net none -kernel 
> /var/tmp/boot/kernel -initrd /var/tmp/boot/initrd -append 
> root=/dev/disk/by-id/virtio-0 rootfstype=ext3 
> rootflags=data=writeback,nobarrier,commit=150,noatime elevator=noop 
> nmi_watchdog=0 rw oops=panic panic=1 quiet elevator=noop console=hvc0 
> init=build -m 2048 -drive 
> file=/var/tmp/img,format=raw,if=none,id=disk,cache=unsafe -device 
> virtio-blk-ccw,drive=disk,serial=0 -drive 
> file=/var/tmp/swap,format=raw,if=none,id=swap,cache=unsafe -device 
> virtio-blk-ccw,drive=swap,serial=1 -device virtio-serial-ccw -device 
> virtconsole,chardev=virtiocon0 -chardev stdio,id=virtiocon0 -chardev 
> socket,id=monitor,server=on,wait=off,path=/var/tmp/img.qemu/monitor -mon 
> chardev=monitor,mode=readline -smp 8
> 
> unimplemented opcode 0xb9ab
> unimplemented opcode 0xb2af
> 

...

> Since I have some strange misbehavior at runtime, with processes dying with 
> segfaults and the guest kernel complaining:
> 
>  [ 2269s] [ 2243.901667][ T8318] User process fault: interruption code 0011 ilc:2 in libc.so.6[3ff87a80000+1c9000]
>  [ 2269s] [ 2243.904433][ T8318] Failing address: 000002aa0f73f000 TEID: 000002aa0f73f800
>  [ 2269s] [ 2243.904952][ T8318] Fault in primary space mode while using user ASCE.
>  [ 2269s] [ 2243.905405][ T8318] AS:00000000057841c7 R3:0000000001fdc007 S:000000000398c000 P:0000000000000400
> 

I am analyzing this problem further, now that the assertions have been resolved.

I seem to have found an issue that manifests as a wrong return value from
glibc's __strstr_arch13, implemented in glibc/sysdeps/s390/strstr-arch13.S
and built into libc.so.

In my tests I could not trigger this issue on bare metal; I only see it
when running under TCG.

The workload here is the test suite of the swig package:

git clone https://github.com/swig/swig.git

https://github.com/swig/swig/releases/tag/v4.1.1
https://github.com/swig/swig/commit/77323a0f07562b7d90d36181697a72a909b9519a

The error shows up as strstr returning a match that lies past the
terminating NUL character of the haystack.
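The symptom can also be expressed as an invariant that any non-NULL strstr result must satisfy: it must point into the haystack at an offset no greater than strlen(haystack) - strlen(needle), and the needle must actually match there. A minimal sketch of such a check, cheaper than a full reference comparison; the names strstr_result_in_bounds and checked_strstr are hypothetical (not part of swig or glibc). It only catches impossible matches, not missed ones. With the values reported below (haystack length 32, needle length 2, returned offset 46) it would fail, since 46 > 30.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns true if rv is NULL, or if rv points into
 * haystack at a position where the whole needle fits before the
 * terminating NUL and actually matches. A NULL rv is not verified. */
static bool strstr_result_in_bounds(const char *haystack, const char *needle,
                                    const char *rv)
{
    size_t hlen = strlen(haystack);
    size_t nlen = strlen(needle);

    if (rv == NULL)
        return true;
    if (nlen > hlen)
        return false;           /* no match is possible at all */

    /* Compare addresses as integers: rv may not point into haystack. */
    uintptr_t start = (uintptr_t)haystack;
    uintptr_t pos = (uintptr_t)rv;
    if (pos < start || pos - start > hlen - nlen)
        return false;           /* starts before haystack or runs past its NUL */

    return memcmp(rv, needle, nlen) == 0;
}

/* Hypothetical wrapper: call the real strstr() and abort on a result
 * that violates the invariant. */
static char *checked_strstr(const char *haystack, const char *needle)
{
    char *rv = strstr(haystack, needle);
    assert(strstr_result_in_bounds(haystack, needle, rv));
    return rv;
}
```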

To demonstrate it, I implemented a simple strstr as follows:

--------

#include <assert.h>
#include <stdio.h>
#include <string.h>

static char *strstr_simple(const char *haystack, const char *needle)
{
  /*
   * Return a pointer to the beginning of the located substring,
   * or NULL if the substring is not found.
   * If needle is the empty string, the return value is always haystack itself.
   */
  size_t i, j;

  if (needle == NULL || haystack == NULL) {
    return NULL;
  }
  if (needle[0] == 0) {
    return (char *)haystack;
  }
  /* Try every start position i; stop comparing at the first mismatch
   * or at the end of either string. */
  for (i = 0; haystack[i] != 0; i++) {
    for (j = 0; haystack[i + j] != 0 && needle[j] != 0; j++) {
      if (needle[j] != haystack[i + j]) {
        break;
      }
    }
    /* The whole needle matched at offset i. */
    if (needle[j] == 0) {
      return (char *)haystack + i;
    }
  }
  return NULL;
}


--------

Then I have a wrapper that compares the result of this simple
implementation with the one from the regular strstr, after making sure that
strstr_ifunc resolves to __strstr_arch13:

char *strstr_w(const char *haystack, const char *needle)
{
  char *rv1 = strstr(haystack, needle);
  char *rv2 = strstr_simple(haystack, needle);
  if (rv1 != rv2) {
    printf("haystack: %p \"%s\"\n"
           "needle: %p \"%s\"\n"
           "rv1: %p\n"
           "rv2: %p\n",
           (void*)haystack, haystack,
           (void*)needle, needle,
           (void*)rv1, (void*)rv2);
    assert(0);
  }
  return rv1;
}

--------


After building swig with the compiler flags -m64 -march=z14 -mtune=z15

and running even a minimal test like:

$ cd Examples/perl5/simple
$ export SWIG_LIB=../../../Lib
$ ../../../swig -perl5 -o example.c.wrap example.i

I get:

haystack: 0x2aa2a2488f0 "        "363:operator< ignored" "
needle: 0x2aa2961bc8c " ^A"
rv1: 0x2aa2a24891e
rv2: (nil)
swig: DOH/copy.c:120: strstr_w: Assertion `0' failed.
Aborted
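The failing inputs above can be turned into a small standalone check that does not need swig. This is only a sketch under assumptions: the needle is taken to be a space followed by byte 0x01 (shown as ^A), the haystack is the text between the outer quotes (32 bytes, consistent with its NUL at 0x2aa2a248910), and the bug might depend on alignment, so the haystack is retried at 64 different offsets. The bytes after the NUL are filled with copies of the needle, on the guess that the wrong match came from memory past the end. The names repro_reported_case, reported_haystack and reported_needle are hypothetical, and the check is only meaningful if the ifunc selects __strstr_arch13, as in the swig run.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Inputs copied from the failing run. Assumption: the needle is a space
 * followed by byte 0x01 (which the terminal showed as ^A). */
static const char reported_haystack[] = "        \"363:operator< ignored\" ";
static const char reported_needle[] = " \x01";

/* Hypothetical helper: places the haystack at offsets 0..63 of a buffer,
 * so strstr() sees many different alignments, and counts the offsets at
 * which strstr() returns non-NULL. A correct strstr() always returns NULL
 * here, because byte 0x01 does not occur in the haystack. */
static int repro_reported_case(void)
{
    static char buf[64 + sizeof reported_haystack];
    int failures = 0;

    /* The haystack printed in the report was 32 bytes long. */
    assert(strlen(reported_haystack) == 32);

    for (size_t off = 0; off < 64; off++) {
        /* Fill the buffer with repeated copies of the needle, so that a
         * search that wrongly continues past the NUL can find a "match". */
        for (size_t i = 0; i < sizeof buf; i++)
            buf[i] = (i % 2 == 0) ? ' ' : '\x01';
        buf[sizeof buf - 1] = '\0';
        memcpy(buf + off, reported_haystack, sizeof reported_haystack);

        char *rv = strstr(buf + off, reported_needle);
        if (rv != NULL) {
            printf("offset %zu: haystack %p, strstr returned %p\n",
                   off, (void *)(buf + off), (void *)rv);
            failures++;
        }
    }
    return failures;
}
```

Calling repro_reported_case() from main() in the TCG guest and on bare metal would show whether the wrong result can be reproduced outside swig; with a correct strstr it prints nothing and returns 0.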

As you can see, strstr returns a match where there is none and, even
worse, the returned pointer 0x2aa2a24891e lies past the end of the string,
whose terminating NUL is at 0x2aa2a248910 (haystack + 32).

This causes the subsequent code, which relies on strstr returning a
valid result, to call memmove with a negative byte count. Converted to
size_t, that count becomes huge, so the copy runs off the end of the
process heap and causes the segfault originally encountered.
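To illustrate why a past-the-end match is so destructive: a length computed as the distance from the match to the end of the string goes negative, and memmove() takes a size_t, so the value wraps around to something huge. With the reported addresses, strlen is 32 and the match offset is 46, giving 32 - 46 = -14. The sketch below assumes the length is computed this way; the helpers tail_length and tail_length_checked are hypothetical and are not the actual swig DOH code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: number of bytes from match up to the terminating NUL of
 * str, as a caller might compute it before memmove()-ing the tail. */
static size_t tail_length(const char *str, const char *match)
{
    ptrdiff_t n = (ptrdiff_t)strlen(str) - (match - str);
    return (size_t)n;   /* a negative n wraps around to a huge size_t */
}

/* Defensive variant: catch a match past the NUL before it reaches memmove(). */
static size_t tail_length_checked(const char *str, const char *match)
{
    ptrdiff_t n = (ptrdiff_t)strlen(str) - (match - str);
    assert(n >= 0 && "match lies past the terminating NUL");
    return (size_t)n;
}
```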

I can make the issue disappear, for example, by forcing strstr_ifunc to
choose __GI_strstr instead of __strstr_arch13.

Maybe something is going wrong in the emulation of the vector string
search instruction (VSTRS) used by __strstr_arch13? Does this ring a bell?

Let me know if there is something I can provide that could help investigate 
further.

Thanks,

Claudio
