On 17/03/2017 at 21:43, Alex Bennée wrote:
>
> Laurent Vivier <laur...@vivier.eu> writes:
>
>> On 27/02/2017 at 15:38, Alex Bennée wrote:
>>>
>>> Laurent Vivier <laur...@vivier.eu> writes:
>>>
>>>> On 24/02/2017 at 12:20, Alex Bennée wrote:
>>>>> There are a couple of changes that occur at the same time here:
>>>>>
>>>>>   - introduce a single vCPU qemu_tcg_cpu_thread_fn
>>>>>
>>>>>     One of these is spawned per vCPU with its own Thread and Condition
>>>>>     variables. qemu_tcg_rr_cpu_thread_fn is the new name for the old
>>>>>     single threaded function.
>>>>>
>>>>>   - the TLS current_cpu variable is now live for the lifetime of MTTCG
>>>>>     vCPU threads. This is for future work where async jobs need to know
>>>>>     the vCPU context they are operating in.
>>>>>
>>>>> The user can now switch on multi-thread behaviour and spawn a thread
>>>>> per vCPU. A simple kvm-unit-test like:
>>>>>
>>>>>   ./arm/run ./arm/locking-test.flat -smp 4 -accel tcg,thread=multi
>>>>>
>>>>> will now use 4 vCPU threads and have an expected FAIL (instead of the
>>>>> unexpected PASS) as the default mode of the test has no protection when
>>>>> incrementing a shared variable.
>>>>>
>>>>> We enable the parallel_cpus flag to ensure we generate correct barrier
>>>>> and atomic code if supported by the front and backends. This doesn't
>>>>> automatically enable MTTCG until default_mttcg_enabled() is updated to
>>>>> check the configuration is supported.
>>>>
>>>> This commit breaks linux-user mode:
>>>>
>>>> debian-8 with qemu-ppc on x86_64 with ltp-full-20170116
>>>>
>>>>   cd /opt/ltp
>>>>   ./runltp -p -l "qemu-$(date +%FT%T).log" -f /opt/ltp/runtest/syscalls -s setgroups03
>>>>
>>>> setgroups03  1  TPASS : setgroups(65537) fails, Size is > sysconf(_SC_NGROUPS_MAX), errno=22
>>>> qemu-ppc: /home/laurent/Projects/qemu/include/qemu/rcu.h:89: rcu_read_unlock: Assertion `p_rcu_reader->depth != 0' failed.
>>>> qemu-ppc: /home/laurent/Projects/qemu/include/qemu/rcu.h:89: rcu_read_unlock: Assertion `p_rcu_reader->depth != 0' failed.
>>>> qemu-ppc: /home/laurent/Projects/qemu/include/qemu/rcu.h:89: rcu_read_unlock: Assertion `p_rcu_reader->depth != 0' failed.
>>>> ...
>>>
>>> Interesting. I can only think the current_cpu change has broken it
>>> because most of the changes in this commit affect softmmu targets only
>>> (linux-user has its own run loop).
>>>
>>> Thanks for the report - I'll look into it.
>>
>> After:
>>
>>   95b0eca Merge remote-tracking branch 'remotes/stsquad/tags/pull-mttcg-fixups-090317-1' into staging
>>
>> [Tested with my HEAD on:
>>   b1616fe Merge remote-tracking branch 'remotes/famz/tags/docker-pull-request' into staging]
>>
>> I have now:
>>
>> <<<test_start>>>
>> tag=setgroups03 stime=1489413401
>> cmdline="setgroups03"
>> contacts=""
>> analysis=exit
>> <<<test_output>>>
>> **
>> ERROR:/home/laurent/Projects/qemu/cpu-exec.c:656:cpu_exec: assertion failed: (cpu == current_cpu)
>> **
>
> OK we now understand what's happening:
>
>  - setgroups calls __nptl_setxid_error, triggers abort()
>  - this sends sig_num 6, then 11
>  - host_signal_handler tries to handle 11
>    -> handle_cpu_signal
>
> Pre "tcg: enable thread-per-vCPU" the problem was:
>
>  - current_cpu was reset to NULL on the way out of the loop
>  - therefore handle_cpu_signal went boom because
>      cpu = current_cpu;
>      cc = CPU_GET_CLASS(cpu);
>
> Post "tcg: enable thread-per-vCPU" the problem is:
>
>  - current_cpu is now live outside cpu_exec_loop
>  - this is mainly so async_work functions can assert (cpu == current_cpu)
>  - hence handle_cpu_signal gets further and calls
>      cpu_loop_exit(cpu);
>  - hilarity ensues as we siglongjmp into a stale context
>
> Obviously we shouldn't try to siglongjmp. But we also shouldn't rely on
> current_cpu as a proxy to crash early when outside of the loop.
> There is a slight wrinkle in that we also have funny handling of SEGVs
> during translation, if a guest jumps to code in an as-yet un-mapped
> region of memory.
>
> There is currently cpu->running, which is set/cleared by
> cpu_exec_start/end. Although if we crash between cpu_exec_start and
> sigsetjmp the same sort of brokenness might happen.
>
> Anyway, understood now. If anyone has any suggestions for neater stuff
> over the weekend please shout, otherwise I'll probably just hack
> handle_cpu_signal to do:
>
>     cpu = current_cpu;
>     if (!cpu->running) {
>         /* we weren't running or translating JIT code when the signal came */
>         return 1;
>     }
The return doesn't break the loop, but an abort() does. I think we can
put an abort() here, as this case can be seen as an internal error (and
we get back the previous behaviour).

Laurent