On 1/19/23 23:41, Thomas Huth wrote:
On 16/01/2023 23.36, Richard Henderson wrote:
The following changes since commit fb7e7990342e59cf67dbd895c1a1e3fb1741df7a:

   tests/qtest/qom-test: Do not print tested properties by default (2023-01-16 15:00:57 +0000)

are available in the Git repository at:

   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230116

for you to fetch changes up to 61710a7e23a63546da0071ea32adb96476fa5d07:

   accel/tcg: Split out cpu_exec_{setjmp,loop} (2023-01-16 10:14:12 -1000)

----------------------------------------------------------------
- Reorg cpu_tb_exec around setjmp.
- Use __attribute__((target)) for buffer_is_zero.
- Add perfmap and jitdump for perf support.

----------------------------------------------------------------
Ilya Leoshkevich (3):
       linux-user: Clean up when exiting due to a signal
       accel/tcg: Add debuginfo support
       tcg: add perfmap and jitdump

Richard Henderson (2):
       util/bufferiszero: Use __attribute__((target)) for avx2/avx512
       accel/tcg: Split out cpu_exec_{setjmp,loop}

  Hi Richard, hi Ilya,

with the recent QEMU master branch (commit 701ed34), I'm now seeing failures in 
Travis:

  https://app.travis-ci.com/github/huth/qemu/jobs/593786529#L14411

Everything was still fine a couple of days ago (commit fb7e799):

  https://app.travis-ci.com/github/huth/qemu/builds/259755664

... so it seems this is likely related to this pull request. Could you please 
have a look?

Thankfully our s390x.ci.qemu.org has the same version of gcc installed, and I was able to reproduce this, but only once: the failure is irregular and very low frequency.

The code generated by gcc is correct and easy to inspect, since cpu_exec_setjmp is now quite small:

00000000000f3250 <cpu_exec_setjmp.isra.0>:
   f3250:       eb 6f f0 30 00 24       stmg    %r6,%r15,48(%r15)
   f3256:       a7 39 00 00             lghi    %r3,0
   f325a:       e3 f0 ff 58 ff 71       lay     %r15,-168(%r15)

                                        // Save cpu to stack+160.
   f3260:       e3 20 f0 a0 00 24       stg     %r2,160(%r15)
   f3266:       41 20 20 f0             la      %r2,240(%r2)
   f326a:       c0 e5 ff fb 10 eb       brasl   %r14,55440 <__sigsetjmp@plt>
   f3270:       ec 26 00 0d 00 7e       cijne   %r2,0,f328a <cpu_exec_setjmp.isra.0+0x3a>

                                        // Reload cpu for cpu_exec_loop().
   f3276:       e3 20 f0 a0 00 04       lg      %r2,160(%r15)
   f327c:       c0 e5 ff ff fb ee       brasl   %r14,f2a58 <cpu_exec_loop.isra.0>
   f3282:       eb 6f f0 d8 00 04       lmg     %r6,%r15,216(%r15)
   f3288:       07 fe                   br      %r14

                                        // Load tls pointer and current_cpu address.
   f328a:       b2 4f 00 10             ear     %r1,%a0
   f328e:       c0 20 00 0a 35 9d       larl    %r2,239dc8 <current_cpu@@Base+0x239dc8>
   f3294:       eb 11 00 20 00 0d       sllg    %r1,%r1,32
   f329a:       e3 20 20 00 00 04       lg      %r2,0(%r2)
   f32a0:       b2 4f 00 11             ear     %r1,%a1

                                        // Reload cpu for comparison
   f32a4:       e3 30 f0 a0 00 04       lg      %r3,160(%r15)
                                        // cpu == current_cpu
   f32aa:       e3 32 10 00 00 20       cg      %r3,0(%r2,%r1)
   f32b0:       a7 84 00 12             je      f32d4 <cpu_exec_setjmp.isra.0+0x84>
   ...

The only way I can imagine that this comparison fails is if we have corrupted the stack in some way. I have not been able to induce failure under any sort of debugging, and I can't imagine where irregular corruption would have come from.


r~
