Hi Havard,

On Fri, Jan 15, 2021 at 11:29 AM Havard Skinnemoen
<hskinnem...@google.com> wrote:
>
> Hi Bin,
>
> On Thu, Jan 14, 2021 at 6:08 PM Bin Meng <bmeng...@gmail.com> wrote:
> >
> > Hi Francisco,
> >
> > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > <frasse.igles...@gmail.com> wrote:
> > >
> > > Hi Bin,
> > >
> > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > From: Bin Meng <bin.m...@windriver.com>
> > > >
> > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > bytes are expected to be received after it receives a command. For
> > > > example, depending on the address mode, either 3-byte address or
> > > > 4-byte address is needed.
> > > >
> > > > For fast read family commands, some dummy cycles are required after
> > > > sending the address bytes, and the dummy cycles need to be counted
> > > > in s->needed_bytes. This is where the mess began.
> > > >
> > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > It is not in bit, or cycle. However for some reason the model has
> > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > approach is to convert the number of dummy cycles to bytes based on
> > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > >
> > > While not being the original implementor I must assume that above 
> > > solution was
> > > considered but not chosen by the developers due to it is inaccuracy (it
> > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
> > > meaning that if the controller is wrongly programmed to generate 7 the 
> > > error
> > > wouldn't be caught and the controller will still be considered 
> > > "correct"). Now
> > > that we have this detail in the implementation I'm in favor of keeping 
> > > it, this
> > > also because the detail is already in use for catching exactly above 
> > > error.
> > >
> >
> > I found no clue from the commit message that my proposed solution here
> > was ever considered, otherwise all SPI controller models supporting
> > software generation should have been found out seriously broken long
> > time ago!
> >
> > The issue you pointed out that we require the total number of dummy
> > bits should be multiple of 8 is true, that's why I added the
> > unimplemented log message in this series (patch 2/3/4) to warn users
> > if this expectation is not met. However this will not cause any issue
> > when running U-Boot or Linux, because both spi-nor drivers expect the
> > same assumption as we do here.
> >
> > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > there is a logic to calculate the dummy bytes needed for fast read
> > command:
> >
> >     /* convert the dummy cycles to the number of bytes */
> >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> >
> > Note the default dummy cycles configuration for all flashes I have
> > looked into as of today, meets the multiple of 8 assumption. On some
> > flashes the dummy cycle number is configurable, and if it's been
> > configured to be an odd value, it would not work on U-Boot/Linux in
> > the first place.
> >
> > > >
> > > > Things get complicated when interacting with different SPI or QSPI
> > > > flash controllers. There are major two cases:
> > > >
> > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > >   For such case, driver will calculate the correct number of dummy
> > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > >   fix flashes working with such controllers.
> > >
> > > Above can be fixed while still keeping the detailed dummy cycle 
> > > implementation
> > > inside m25p80. Perhaps one of the following could be looked into: 
> > > configurating
> > > the amount, letting the spi ctrl fetch the amount from m25p80 or by 
> > > inheriting
> > > some functionality handling this in the SPI controller. Or a mixture of 
> > > above.
> >
> > Please send patches to explain this in detail how this is going to
> > work. I am open to all possible solutions.
> >
> > >
> > > > - Dummy bytes not prepared by drivers. Drivers just tell the hardware
> > > >   the dummy cycle configuration via some registers, and hardware will
> > > >   automatically generate dummy cycles for us. Fixing the m25p80 model
> > > >   is not enough, and we will need to fix the SPI/QSPI models for such
> > > >   controllers.
> > > >
> > > > This series fixes the mess in the m25p80 from the flash side first,
> > >
> > > Considering the problems solved by the solution in tree I find m25p80 
> > > pretty
> > > clean, at least I don't see any clearly better way for accurately 
> > > modeling the
> > > dummy clock cycles. Counting bits instead of bytes would for example still
> > > force the controllers to mark which bits to count (when transmitting one 
> > > dummy
> > > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock
> > > cycles since it takes two cycles to transfer 8 bits).
> > >
> >
> > SPI is a bit based protocol, not bytes. If you insist on bit modeling
> > with the dummy cycles then you should also suggest we change all
> > cycles (including command/addr/dummy/data phases) to be modeled with
> > bits. That way we can accurately emulate everything, for example one
> > potential problem like transferring 9 bit in the data phase.
>
> I agree with this. There's really nothing special about dummy cycles.
> Making them special makes it super painful to implement SPI controller
> emulation because you have to anticipate when ssi_transfer changes
> semantics from byte-at-a-time to bit-at-a-time. I doubt all the SPI
> controllers in the tree gets it right all the time.
>

Yep, it's not just painful for SPI controllers, and for the case 1 SPI
controller it's impossible to snoop the data to distinguish when the
dummy cycles begin.

> > However modeling everything with bit is super inefficient. My view is
> > that we should avoid trying to support uncommon use cases (like not
> > multiple of 8 for dummy bits) in QEMU.
>
> Perhaps ssi_transfer could take an additional bits parameter? That
> should make it possible to transfer any number of bits up to 32, while
> keeping the common case simple on both sides. And it would work for
> any SPI transfer, not just dummy cycles.

This sounds like a good tradeoff from the emulator perspective. But I
am not sure we should do this to solve the dummy cycle mess given all
the default dummy cycle configurations so far match the multiple of 8
assumption.

Regards,
Bin

Reply via email to