On 7/13/20 12:51 PM, Cornelia Huck wrote: > On Sat, 11 Jul 2020 13:40:50 +0200 > Claudio Fontana <cfont...@suse.de> wrote: > >> I found out something that for me shows that more investigation here is >> warranted. >> >> >> Here is my latest workaround for the problem: >> >> >> >> diff --git a/hw/s390x/s390-skeys.c b/hw/s390x/s390-skeys.c >> index 1e036cc602..47c9a015af 100644 >> --- a/hw/s390x/s390-skeys.c >> +++ b/hw/s390x/s390-skeys.c >> @@ -252,6 +252,8 @@ static const TypeInfo qemu_s390_skeys_info = { >> .class_size = sizeof(S390SKeysClass), >> }; >> >> +extern void qemu_fflush(QEMUFile *f); >> + >> static void s390_storage_keys_save(QEMUFile *f, void *opaque) >> { >> S390SKeysState *ss = S390_SKEYS(opaque); >> @@ -302,6 +304,7 @@ static void s390_storage_keys_save(QEMUFile *f, void >> *opaque) >> g_free(buf); >> end_stream: >> qemu_put_be64(f, eos); >> + qemu_fflush(f); >> } >> >> static int s390_storage_keys_load(QEMUFile *f, void *opaque, int version_id) >> ------------------------------------------------------------------------------------ >> >> >> I think that this might imply that my patch changing the migration stream >> has only triggered an existing problem. > > Looks a bit like it. > >> >> The sympthom is: the load keys code does not see the EOS (byte value 1). >> It does see the keys (which are all empty in the test, ie 32678 times the >> byte value 0). > > Yes, that (zero keys) is expected. > >> >> The workaround for the sympthom: flush the qemu file after putting the EOS >> in there. >> >> >> Any ideas on where to investigate next? > > Do any other users of the SaveVMHandlers interface see errors as well > (or do they do the fflush dance)? >
Hi Cornelia, just want to point you also to this discussion that I made outside of the context of this particular patch, with a much simpler reproducer: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg03969.html There do not seem to be many users of the "Old style" vmstate_save as it is called in the code. I could find only: fgrep -R -- ".save_state" hw/s390x/tod.c: .save_state = s390_tod_save, hw/s390x/s390-skeys.c: .save_state = s390_storage_keys_save, net/slirp.c: .save_state = net_slirp_state_save, It is difficult to see what does fflush where, and which methods are supposed to do what, because we have quite a few methods in SaveVMHandlers and VMStateDescription. Some of the fflushes are just done in the code just after writing the EOS, and in some cases there is no fflush after writing the EOS, but there are other methods called at completion time as far as I understand, where the fflush is done. The issue in the stream seems to appear only just after the s390 keys are written. Maybe better to continue the discussion at: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg03969.html ? Otherwise it is fine, I can follow two threads if you prefer so. Thanks, Claudio