pflash_cfi01: Use the correct READ_ARRAY value

Philippe Mathieu-Daudé Mon, 29 Jul 2019 09:22:07 -0700

Hi Markus.

On 7/16/19 5:12 PM, Markus Armbruster wrote:
> "Dr. David Alan Gilbert" <dgilb...@redhat.com> writes:
> 
>> * Markus Armbruster (arm...@redhat.com) wrote:
>>> Philippe asked me to have a look at this one, so here goes.


Thanks a lot for your careful analysis.

I got scared the uh-oh you raised would get this series or rework of it
still refused for the release, so I went for a quick-and-dirty bugfix.
Since this bugfix got merged (as commit 3a283507c0347), I can now think
again about how to properly fix this (if it is fixable...).

>>> Philippe Mathieu-Daudé <phi...@redhat.com> writes:
>>>
>>>> In the document [*] the "Read Array Flowchart", the READ_ARRAY
>>>> command has a value of 0xff.
>>>>
>>>> Use the correct value in the pflash model.
>>>>
>>>> There is no change of behavior in the guest, because:
>>>> - when the guest were sending 0xFF, the reset_flash label
>>>>   was setting the command value as 0x00
>>>> - 0x00 was used internally for READ_ARRAY
>>>
>>> *Groan*
>>>
>>> Is this cleanup, or does it fix an observable bug?

Well it depends where you stand.

I have a few patches on top of this adding trace events (4.2 material)
and while debugging it was not making sense with the CFI specs.

1/ The guest writes 0xFF to go in READ_ARRAY mode, the model report a
warning and take the switch default case which calls pflash_reset(),
which happens to set the flash in READ_ARRAY.

2/ Then a later series adds the CFI specs timings (like the CFI02
model), because it would useful to test the UEFI Capsule Update feature
with real-time behavior. For the 'Virt' pflash, the timing is disabled.
Now running a non-Virt pflash, it becomes very slow because each time
the guest goes into READ_ARRAY mode, the reset delay (which is the
longest) occurs.

Talking with Laszlo, I figured for 1/ instead of fixing the model, I can
display 0x00 as 0xFF and ignore the pflash_reset() when the caller is
not system_reset(). Dirty again.

For 2/ it is not that easy, it will depends if there is more interest
from the UEFI community (Intel parallel NOR flashes are used on x86 and
aarch64 UEFI platforms).

If we justify 1/ and 2/ are not important, then it is simply a cleanup.

>>>> To keep migration with older versions behaving correctly, we
>>>> decide to always migrate the READ_ARRAY as 0x00.
>>>>
>>>> If the CFI open standard decide to assign a new command of value
>>>> 0x00, this model is flawed because it uses this value internally.
>>>> If a guest eventually requires this new CFI feature, a different
>>>> model will be required (or this same model but breaking backward
>>>> migration). So it is safe to keep migrating READ_ARRAY as 0x00.
>>>
>>> We could perhaps keep migration working for "benign" device states, with
>>> judicious use of subsections.  We'll cross that bridge when we get to
>>> it.
>>>
>>>> [*] "Common Flash Interface (CFI) and Command Sets"
>>>>     (Intel Application Note 646)
>>>>     Appendix B "Basic Command Set"
>>>>
>>>> Reviewed-by: John Snow <js...@redhat.com>
>>>> Reviewed-by: Alistair Francis <alistair.fran...@wdc.com>
>>>> Regression-tested-by: Laszlo Ersek <ler...@redhat.com>
>>>> Signed-off-by: Philippe Mathieu-Daudé <phi...@redhat.com>
>>>> ---
>>>> v3: Handle migrating the 'cmd' field.
>>>> v4: Handle migrating to older QEMU (Dave)
>>>> v5: Add a paragraph about why this model is flawed due to
>>>>     historically using READ_ARRAY as 0x00 (Dave, Peter).
>>>>
>>>> Since Laszlo stated he did not test migration [*], I'm keeping his
>>>> test tag, because the change with v2 has no impact in the tests
>>>> he ran.
>>>>
>>>> Likewise I'm keeping John and Alistair tags, but I'd like an extra
>>>> review for the migration change, thanks!
>>>>
>>>> [*] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg00679.html
>>>> ---
>>>>  hw/block/pflash_cfi01.c | 57 ++++++++++++++++++++++++++++++++++-------
>>>>  1 file changed, 48 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
>>>> index 9e34fd4e82..85bb2132c0 100644
>>>> --- a/hw/block/pflash_cfi01.c
>>>> +++ b/hw/block/pflash_cfi01.c
>>>> @@ -96,6 +96,37 @@ struct PFlashCFI01 {
>>>>      bool old_multiple_chip_handling;
>>>>  };
>>>>  
>>>> +static int pflash_pre_save(void *opaque)
>>>> +{
>>>> +    PFlashCFI01 *s = opaque;
>>>> +
>>>> +    /*
>>>> +     * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the
>>>> +     * READ_ARRAY command. To preserve migrating to these older version,
>>>> +     * always migrate the READ_ARRAY command as 0x00.
>>>> +     */
>>>> +    if (s->cmd == 0xff) {
>>>> +        s->cmd = 0x00;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int pflash_post_save(void *opaque)
>>>> +{
>>>> +    PFlashCFI01 *s = opaque;
>>>> +
>>>> +    /*
>>>> +     * If migration failed, the guest will continue to run.
>>>> +     * Restore the correct READ_ARRAY value.
>>>> +     */
>>>> +    if (s->cmd == 0x00) {
>>>> +        s->cmd = 0xff;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>
>>> Uh, this gives me a queasy feeling.  Perhaps David can assuage it.
>>
>> See the previous 4 versions of discussion....
> 
> Jumped in at v5; sorry about that.
> 
>>> I figure the intent is to migrate PFlashCFI01 member @cmd value 0xFF as
>>> 0x00, for migration compatibility to and from older versions.
>>>
>>> You do this by monkey-patching it to 0x00 before migration, and to 0xFF
>>> afterwards.  On the incoming side, you replace 0x00 by 0xFF, in
>>> pflash_post_load() below.
>>>
>>> Questions:
>>>
>>> * Can anything but the code that sends @cmd see the temporary 0x00 value
>>>   between pflash_pre_save() and pflash_post_save()
>>
>> It is the same pflash data structure; but all CPUs are stopped and we're
>> just walking the list of devices serialising them; so no nothing should
>> be seeing that value.
> 
> Sounds good.
> 
>> (There is another way to do this, which is to produce a temporary
>> structure at this point, populate the temporary structure and migrate
>> that)
> 
> Not necessary.
> 
> The uh-ohs below still need assuaging, not necessarily yours.
> 
>> Dave
>>
>>> * Consider the matrix source \in { old, new } x dest \in { old, new } x
>>>   @cmd on source in { 0x00, 0xFF }.  What does migration put into @cmd
>>>   on dest?  Eight cases:
>>>
>>>   source @cmd  ->  wire  ->  dest @cmd
>>>     old  0x00  ->  0x00  ->  old  0x00  (1)
>>>                              new  0xFF  (2)
>>>     old  0xFF  ->  0xFF  ->  old  0xFF  (3)
>>>                              new  0xFF  (4)
>>>     new  0x00  ->  0x00  ->  old  0x00  (5)
>>>                              new  0xFF  (6)
>>>     new  0xFF  ->  0x00  ->  old  0x00  (7)
>>>                              new  0xFF  (8)
>>>
>>>   Old -> old (cases 1 and 3) is unaffected by this patch.
>>>
>>>   New -> new leaves 0xFF unchanged (8).  It changes 0x00 to 0xFF (6).
>>>   Uh-oh.  Can this happen?  Rephrasing the question: can @cmd ever be
>>>   0x00 with this patch applied?

0x00 is not in the spec (but as suggested Peter Maydell, the spec can
eventually assign the value in the future [*]). So no guests use it.
This value is only set:
- when the (old) model is initialized, without any access to the guest
- if the guest wrote an incorrect value, hitting the switch default case.
The guest can not read this value (value internal to the state machine).

So answer: Yes :(

[*] However the spec has a way to announce supported features to the guest.

>>>
>>>   Old -> new leaves 0xFF unchanged (4).  It changes 0x00 to 0xFF (2),
>>>   which I think is intentional.
>>>
>>>   New -> old leaves 0x00 unchanged (5).  It changes 0xFF to 0x00 (7),
>>>   which I think is intentional.
>>>
>>>   Old -> new -> old leaves 0x00 unchanged.  Good.  It changes 0xFF to
>>>   0x00.  Uh-oh.  Can @cmd ever be 0xFF before this patch?

I understand the full question as "Can @cmd ever be 0xFF [in Old] before
this patch?".

Answer: No, neither the guest nor the state machine can set @cmd to 0xFF
in Old.

>>>
>>>   New -> old -> new leaves 0xFF unchanged.  Good.  It changes 0x00 to
>>>   0xFF.  Same uh-oh as for new -> new.

"Same uh-oh", do you mean "Can @cmd ever be 0x00 [in New] before this
patch?"?

Same answer: No, neither the guest nor the state machine can set @cmd to
0x00 in New.

>>>
>>>> +
>>>>  static int pflash_post_load(void *opaque, int version_id);
>>>>  
>>>>  static const VMStateDescription vmstate_pflash = {
>>>> @@ -103,6 +134,8 @@ static const VMStateDescription vmstate_pflash = {
>>>>      .version_id = 1,
>>>>      .minimum_version_id = 1,
>>>>      .post_load = pflash_post_load,
>>>> +    .pre_save = pflash_pre_save,
>>>> +    .post_save = pflash_post_save,
>>>>      .fields = (VMStateField[]) {
>>>>          VMSTATE_UINT8(wcycle, PFlashCFI01),
>>>>          VMSTATE_UINT8(cmd, PFlashCFI01),
>>>> @@ -277,10 +310,9 @@ static uint32_t pflash_read(PFlashCFI01 *pfl, hwaddr 
>>>> offset,
>>>>          /* This should never happen : reset state & treat it as a read */
>>>>          DPRINTF("%s: unknown command state: %x\n", __func__, pfl->cmd);
>>>>          pfl->wcycle = 0;
>>>> -        pfl->cmd = 0;
>>>> +        pfl->cmd = 0xff;
>>>>          /* fall through to read code */
>>>> -    case 0x00:
>>>> -        /* Flash area read */
>>>> +    case 0xff: /* Read Array */
>>>>          ret = pflash_data_read(pfl, offset, width, be);
>>>>          break;
>>>
>>> On 0xFF, we no longer zap pfl->wcycle and pfl->cmd.

We have 2 ways to set @cmd=0xFF.

- Write 0xFF, write an invalid command,
  finish a multicycle operation (wcycle returns to 0):

  pflash_write() goto reset_flash, then calls
  memory_region_rom_device_set_romd(). set wcycle=0, cmd=READ_ARRAY.

  Next read() will be in ROMD mode, we won't reach pflash_read().

  - if next access is write, we'll enter the same pflash_write().

- The /* This should never happen */ comment in pflash_read().

  It might happens migrating? So we migrated crap, the guest wants to
  read, the crap defaults to READ_ARRAY in I/O mode. Wrong in many ways,
  not sure what the guest expects there, probably not ARRAY data.
  Anyway, we stay in this READ_ARRAY I/O mode until the guest eventually
  does a write access. Wrong.

The case "The state machine set 0xff, let the device in I/O mode" so we
expect to answer to a read() with READ_ARRAY is wrong too, the device
should already be in ROMD mode.

>>>
>>> On 0x00, we do.

Because we have no idea how we got there... Neither what we should do.

>>>
>>> We zap pfl->cmd to 0xFF instead of 0x00.  Same below after label
>>> error_flash and reset_flash.  Related: initialization to 0xFF instead of
>>> 0x00 in pflash_cfi01_realize().  I *guess* these changes together ensure
>>> pfl->cmd can't become 0x00.  Correct?
>>>
>>>>      case 0x10: /* Single byte program */
>>>> @@ -448,8 +480,6 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr 
>>>> offset,
>>>>      case 0:
>>>>          /* read mode */
>>>>          switch (cmd) {
>>>> -        case 0x00: /* ??? */
>>>> -            goto reset_flash;
>>>
>>> On 0x00, we now use default: goto error_flash.  Can this happen?

This could happen if the the guest is writing crap, so we correctly
report this as an error.

>>>
>>>>          case 0x10: /* Single Byte Program */
>>>>          case 0x40: /* Single Byte Program */
>>>>              DPRINTF("%s: Single Byte Program\n", __func__);
>>>> @@ -526,7 +556,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr 
>>>> offset,
>>>>              if (cmd == 0xd0) { /* confirm */
>>>>                  pfl->wcycle = 0;
>>>>                  pfl->status |= 0x80;
>>>> -            } else if (cmd == 0xff) { /* read array mode */
>>>> +            } else if (cmd == 0xff) { /* Read Array */
>>>>                  goto reset_flash;
>>>>              } else
>>>>                  goto error_flash;
>>>> @@ -553,7 +583,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr 
>>>> offset,
>>>>              } else if (cmd == 0x01) {
>>>>                  pfl->wcycle = 0;
>>>>                  pfl->status |= 0x80;
>>>> -            } else if (cmd == 0xff) {
>>>> +            } else if (cmd == 0xff) { /* read array mode */
>>>
>>> Your new comment is phrased the way you corrected in the previous hunk.
>>> Intentional?

No :/ Too many rebases.

>>>
>>>>                  goto reset_flash;
>>>>              } else {
>>>>                  DPRINTF("%s: Unknown (un)locking command\n", __func__);
>>>> @@ -645,7 +675,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr 
>>>> offset,
>>>     error_flash:
>>>        qemu_log_mask(LOG_UNIMP, "%s: Unimplemented flash cmd sequence "
>>>                      "(offset " TARGET_FMT_plx ", wcycle 0x%x cmd 0x%x 
>>> value 0x%x)"
>>>                      "\n", __func__, offset, pfl->wcycle, pfl->cmd, value);
>>>
>>>     reset_flash:
>>>>      trace_pflash_reset();
>>>>      memory_region_rom_device_set_romd(&pfl->mem, true);
>>>>      pfl->wcycle = 0;
>>>> -    pfl->cmd = 0;
>>>> +    pfl->cmd = 0xff;
>>>>  }
>>>>  
>>>>  
>>>> @@ -761,7 +791,7 @@ static void pflash_cfi01_realize(DeviceState *dev, 
>>>> Error **errp)
>>>>      }
>>>>  
>>>>      pfl->wcycle = 0;
>>>> -    pfl->cmd = 0;
>>>> +    pfl->cmd = 0xff;
>>>>      pfl->status = 0;
>>>>      /* Hardcoded CFI table */
>>>>      /* Standard "QRY" string */
>>>> @@ -1001,5 +1031,14 @@ static int pflash_post_load(void *opaque, int 
>>>> version_id)
>>>>          pfl->vmstate = 
>>>> qemu_add_vm_change_state_handler(postload_update_cb,
>>>>                                                          pfl);
>>>>      }
>>>> +
>>>> +    /*
>>>> +     * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the
>>>> +     * READ_ARRAY command.
>>>> +     */
>>>> +    if (pfl->cmd == 0x00) {
>>>> +        pfl->cmd = 0xff;
>>>> +    }
>>>> +
>>>>      return 0;
>>>>  }
>> --
>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v5 2/5] hw/block/pflash_cfi01: Use the correct READ_ARRAY value

Reply via email to