Hi Markus. On 7/16/19 5:12 PM, Markus Armbruster wrote: > "Dr. David Alan Gilbert" <dgilb...@redhat.com> writes: > >> * Markus Armbruster (arm...@redhat.com) wrote: >>> Philippe asked me to have a look at this one, so here goes.
Thanks a lot for your careful analysis. I got scared the uh-oh you raised would get this series or rework of it still refused for the release, so I went for a quick-and-dirty bugfix. Since this bugfix got merged (as commit 3a283507c0347), I can now think again about how to properly fix this (if it is fixable...). >>> Philippe Mathieu-Daudé <phi...@redhat.com> writes: >>> >>>> In the document [*] the "Read Array Flowchart", the READ_ARRAY >>>> command has a value of 0xff. >>>> >>>> Use the correct value in the pflash model. >>>> >>>> There is no change of behavior in the guest, because: >>>> - when the guest were sending 0xFF, the reset_flash label >>>> was setting the command value as 0x00 >>>> - 0x00 was used internally for READ_ARRAY >>> >>> *Groan* >>> >>> Is this cleanup, or does it fix an observable bug? Well it depends where you stand. I have a few patches on top of this adding trace events (4.2 material) and while debugging it was not making sense with the CFI specs. 1/ The guest writes 0xFF to go in READ_ARRAY mode, the model report a warning and take the switch default case which calls pflash_reset(), which happens to set the flash in READ_ARRAY. 2/ Then a later series adds the CFI specs timings (like the CFI02 model), because it would useful to test the UEFI Capsule Update feature with real-time behavior. For the 'Virt' pflash, the timing is disabled. Now running a non-Virt pflash, it becomes very slow because each time the guest goes into READ_ARRAY mode, the reset delay (which is the longest) occurs. Talking with Laszlo, I figured for 1/ instead of fixing the model, I can display 0x00 as 0xFF and ignore the pflash_reset() when the caller is not system_reset(). Dirty again. For 2/ it is not that easy, it will depends if there is more interest from the UEFI community (Intel parallel NOR flashes are used on x86 and aarch64 UEFI platforms). If we justify 1/ and 2/ are not important, then it is simply a cleanup. >>>> To keep migration with older versions behaving correctly, we >>>> decide to always migrate the READ_ARRAY as 0x00. >>>> >>>> If the CFI open standard decide to assign a new command of value >>>> 0x00, this model is flawed because it uses this value internally. >>>> If a guest eventually requires this new CFI feature, a different >>>> model will be required (or this same model but breaking backward >>>> migration). So it is safe to keep migrating READ_ARRAY as 0x00. >>> >>> We could perhaps keep migration working for "benign" device states, with >>> judicious use of subsections. We'll cross that bridge when we get to >>> it. >>> >>>> [*] "Common Flash Interface (CFI) and Command Sets" >>>> (Intel Application Note 646) >>>> Appendix B "Basic Command Set" >>>> >>>> Reviewed-by: John Snow <js...@redhat.com> >>>> Reviewed-by: Alistair Francis <alistair.fran...@wdc.com> >>>> Regression-tested-by: Laszlo Ersek <ler...@redhat.com> >>>> Signed-off-by: Philippe Mathieu-Daudé <phi...@redhat.com> >>>> --- >>>> v3: Handle migrating the 'cmd' field. >>>> v4: Handle migrating to older QEMU (Dave) >>>> v5: Add a paragraph about why this model is flawed due to >>>> historically using READ_ARRAY as 0x00 (Dave, Peter). >>>> >>>> Since Laszlo stated he did not test migration [*], I'm keeping his >>>> test tag, because the change with v2 has no impact in the tests >>>> he ran. >>>> >>>> Likewise I'm keeping John and Alistair tags, but I'd like an extra >>>> review for the migration change, thanks! >>>> >>>> [*] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg00679.html >>>> --- >>>> hw/block/pflash_cfi01.c | 57 ++++++++++++++++++++++++++++++++++------- >>>> 1 file changed, 48 insertions(+), 9 deletions(-) >>>> >>>> diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c >>>> index 9e34fd4e82..85bb2132c0 100644 >>>> --- a/hw/block/pflash_cfi01.c >>>> +++ b/hw/block/pflash_cfi01.c >>>> @@ -96,6 +96,37 @@ struct PFlashCFI01 { >>>> bool old_multiple_chip_handling; >>>> }; >>>> >>>> +static int pflash_pre_save(void *opaque) >>>> +{ >>>> + PFlashCFI01 *s = opaque; >>>> + >>>> + /* >>>> + * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the >>>> + * READ_ARRAY command. To preserve migrating to these older version, >>>> + * always migrate the READ_ARRAY command as 0x00. >>>> + */ >>>> + if (s->cmd == 0xff) { >>>> + s->cmd = 0x00; >>>> + } >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static int pflash_post_save(void *opaque) >>>> +{ >>>> + PFlashCFI01 *s = opaque; >>>> + >>>> + /* >>>> + * If migration failed, the guest will continue to run. >>>> + * Restore the correct READ_ARRAY value. >>>> + */ >>>> + if (s->cmd == 0x00) { >>>> + s->cmd = 0xff; >>>> + } >>>> + >>>> + return 0; >>>> +} >>> >>> Uh, this gives me a queasy feeling. Perhaps David can assuage it. >> >> See the previous 4 versions of discussion.... > > Jumped in at v5; sorry about that. > >>> I figure the intent is to migrate PFlashCFI01 member @cmd value 0xFF as >>> 0x00, for migration compatibility to and from older versions. >>> >>> You do this by monkey-patching it to 0x00 before migration, and to 0xFF >>> afterwards. On the incoming side, you replace 0x00 by 0xFF, in >>> pflash_post_load() below. >>> >>> Questions: >>> >>> * Can anything but the code that sends @cmd see the temporary 0x00 value >>> between pflash_pre_save() and pflash_post_save() >> >> It is the same pflash data structure; but all CPUs are stopped and we're >> just walking the list of devices serialising them; so no nothing should >> be seeing that value. > > Sounds good. > >> (There is another way to do this, which is to produce a temporary >> structure at this point, populate the temporary structure and migrate >> that) > > Not necessary. > > The uh-ohs below still need assuaging, not necessarily yours. > >> Dave >> >>> * Consider the matrix source \in { old, new } x dest \in { old, new } x >>> @cmd on source in { 0x00, 0xFF }. What does migration put into @cmd >>> on dest? Eight cases: >>> >>> source @cmd -> wire -> dest @cmd >>> old 0x00 -> 0x00 -> old 0x00 (1) >>> new 0xFF (2) >>> old 0xFF -> 0xFF -> old 0xFF (3) >>> new 0xFF (4) >>> new 0x00 -> 0x00 -> old 0x00 (5) >>> new 0xFF (6) >>> new 0xFF -> 0x00 -> old 0x00 (7) >>> new 0xFF (8) >>> >>> Old -> old (cases 1 and 3) is unaffected by this patch. >>> >>> New -> new leaves 0xFF unchanged (8). It changes 0x00 to 0xFF (6). >>> Uh-oh. Can this happen? Rephrasing the question: can @cmd ever be >>> 0x00 with this patch applied? 0x00 is not in the spec (but as suggested Peter Maydell, the spec can eventually assign the value in the future [*]). So no guests use it. This value is only set: - when the (old) model is initialized, without any access to the guest - if the guest wrote an incorrect value, hitting the switch default case. The guest can not read this value (value internal to the state machine). So answer: Yes :( [*] However the spec has a way to announce supported features to the guest. >>> >>> Old -> new leaves 0xFF unchanged (4). It changes 0x00 to 0xFF (2), >>> which I think is intentional. >>> >>> New -> old leaves 0x00 unchanged (5). It changes 0xFF to 0x00 (7), >>> which I think is intentional. >>> >>> Old -> new -> old leaves 0x00 unchanged. Good. It changes 0xFF to >>> 0x00. Uh-oh. Can @cmd ever be 0xFF before this patch? I understand the full question as "Can @cmd ever be 0xFF [in Old] before this patch?". Answer: No, neither the guest nor the state machine can set @cmd to 0xFF in Old. >>> >>> New -> old -> new leaves 0xFF unchanged. Good. It changes 0x00 to >>> 0xFF. Same uh-oh as for new -> new. "Same uh-oh", do you mean "Can @cmd ever be 0x00 [in New] before this patch?"? Same answer: No, neither the guest nor the state machine can set @cmd to 0x00 in New. >>> >>>> + >>>> static int pflash_post_load(void *opaque, int version_id); >>>> >>>> static const VMStateDescription vmstate_pflash = { >>>> @@ -103,6 +134,8 @@ static const VMStateDescription vmstate_pflash = { >>>> .version_id = 1, >>>> .minimum_version_id = 1, >>>> .post_load = pflash_post_load, >>>> + .pre_save = pflash_pre_save, >>>> + .post_save = pflash_post_save, >>>> .fields = (VMStateField[]) { >>>> VMSTATE_UINT8(wcycle, PFlashCFI01), >>>> VMSTATE_UINT8(cmd, PFlashCFI01), >>>> @@ -277,10 +310,9 @@ static uint32_t pflash_read(PFlashCFI01 *pfl, hwaddr >>>> offset, >>>> /* This should never happen : reset state & treat it as a read */ >>>> DPRINTF("%s: unknown command state: %x\n", __func__, pfl->cmd); >>>> pfl->wcycle = 0; >>>> - pfl->cmd = 0; >>>> + pfl->cmd = 0xff; >>>> /* fall through to read code */ >>>> - case 0x00: >>>> - /* Flash area read */ >>>> + case 0xff: /* Read Array */ >>>> ret = pflash_data_read(pfl, offset, width, be); >>>> break; >>> >>> On 0xFF, we no longer zap pfl->wcycle and pfl->cmd. We have 2 ways to set @cmd=0xFF. - Write 0xFF, write an invalid command, finish a multicycle operation (wcycle returns to 0): pflash_write() goto reset_flash, then calls memory_region_rom_device_set_romd(). set wcycle=0, cmd=READ_ARRAY. Next read() will be in ROMD mode, we won't reach pflash_read(). - if next access is write, we'll enter the same pflash_write(). - The /* This should never happen */ comment in pflash_read(). It might happens migrating? So we migrated crap, the guest wants to read, the crap defaults to READ_ARRAY in I/O mode. Wrong in many ways, not sure what the guest expects there, probably not ARRAY data. Anyway, we stay in this READ_ARRAY I/O mode until the guest eventually does a write access. Wrong. The case "The state machine set 0xff, let the device in I/O mode" so we expect to answer to a read() with READ_ARRAY is wrong too, the device should already be in ROMD mode. >>> >>> On 0x00, we do. Because we have no idea how we got there... Neither what we should do. >>> >>> We zap pfl->cmd to 0xFF instead of 0x00. Same below after label >>> error_flash and reset_flash. Related: initialization to 0xFF instead of >>> 0x00 in pflash_cfi01_realize(). I *guess* these changes together ensure >>> pfl->cmd can't become 0x00. Correct? >>> >>>> case 0x10: /* Single byte program */ >>>> @@ -448,8 +480,6 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr >>>> offset, >>>> case 0: >>>> /* read mode */ >>>> switch (cmd) { >>>> - case 0x00: /* ??? */ >>>> - goto reset_flash; >>> >>> On 0x00, we now use default: goto error_flash. Can this happen? This could happen if the the guest is writing crap, so we correctly report this as an error. >>> >>>> case 0x10: /* Single Byte Program */ >>>> case 0x40: /* Single Byte Program */ >>>> DPRINTF("%s: Single Byte Program\n", __func__); >>>> @@ -526,7 +556,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr >>>> offset, >>>> if (cmd == 0xd0) { /* confirm */ >>>> pfl->wcycle = 0; >>>> pfl->status |= 0x80; >>>> - } else if (cmd == 0xff) { /* read array mode */ >>>> + } else if (cmd == 0xff) { /* Read Array */ >>>> goto reset_flash; >>>> } else >>>> goto error_flash; >>>> @@ -553,7 +583,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr >>>> offset, >>>> } else if (cmd == 0x01) { >>>> pfl->wcycle = 0; >>>> pfl->status |= 0x80; >>>> - } else if (cmd == 0xff) { >>>> + } else if (cmd == 0xff) { /* read array mode */ >>> >>> Your new comment is phrased the way you corrected in the previous hunk. >>> Intentional? No :/ Too many rebases. >>> >>>> goto reset_flash; >>>> } else { >>>> DPRINTF("%s: Unknown (un)locking command\n", __func__); >>>> @@ -645,7 +675,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr >>>> offset, >>> error_flash: >>> qemu_log_mask(LOG_UNIMP, "%s: Unimplemented flash cmd sequence " >>> "(offset " TARGET_FMT_plx ", wcycle 0x%x cmd 0x%x >>> value 0x%x)" >>> "\n", __func__, offset, pfl->wcycle, pfl->cmd, value); >>> >>> reset_flash: >>>> trace_pflash_reset(); >>>> memory_region_rom_device_set_romd(&pfl->mem, true); >>>> pfl->wcycle = 0; >>>> - pfl->cmd = 0; >>>> + pfl->cmd = 0xff; >>>> } >>>> >>>> >>>> @@ -761,7 +791,7 @@ static void pflash_cfi01_realize(DeviceState *dev, >>>> Error **errp) >>>> } >>>> >>>> pfl->wcycle = 0; >>>> - pfl->cmd = 0; >>>> + pfl->cmd = 0xff; >>>> pfl->status = 0; >>>> /* Hardcoded CFI table */ >>>> /* Standard "QRY" string */ >>>> @@ -1001,5 +1031,14 @@ static int pflash_post_load(void *opaque, int >>>> version_id) >>>> pfl->vmstate = >>>> qemu_add_vm_change_state_handler(postload_update_cb, >>>> pfl); >>>> } >>>> + >>>> + /* >>>> + * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the >>>> + * READ_ARRAY command. >>>> + */ >>>> + if (pfl->cmd == 0x00) { >>>> + pfl->cmd = 0xff; >>>> + } >>>> + >>>> return 0; >>>> } >> -- >> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK