Hello, On 4/27/21 10:54 AM, Francisco Iglesias wrote: > On [2021 Apr 27] Tue 15:56:10, Alistair Francis wrote: >> On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng...@gmail.com> wrote: >>> >>> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng...@gmail.com> wrote: >>>> >>>> On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias >>>> <frasse.igles...@gmail.com> wrote: >>>>> >>>>> Hi Bin, >>>>> >>>>> On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote: >>>>>> Hi Francisco, >>>>>> >>>>>> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias >>>>>> <frasse.igles...@gmail.com> wrote: >>>>>>> >>>>>>> Dear Bin, >>>>>>> >>>>>>> On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote: >>>>>>>> Hi Francisco, >>>>>>>> >>>>>>>> On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias >>>>>>>> <frasse.igles...@gmail.com> wrote: >>>>>>>>> >>>>>>>>> Hi Bin, >>>>>>>>> >>>>>>>>> On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote: >>>>>>>>>> Hi Francisco, >>>>>>>>>> >>>>>>>>>> On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias >>>>>>>>>> <frasse.igles...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Bin, >>>>>>>>>>> >>>>>>>>>>> On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote: >>>>>>>>>>>> Hi Francisco, >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias >>>>>>>>>>>> <frasse.igles...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Bin, >>>>>>>>>>>>> >>>>>>>>>>>>> On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote: >>>>>>>>>>>>>> Hi Francisco, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias >>>>>>>>>>>>>> <frasse.igles...@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Bin, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote: >>>>>>>>>>>>>>>> From: Bin Meng <bin.m...@windriver.com> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The m25p80 model uses s->needed_bytes to indicate how many >>>>>>>>>>>>>>>> follow-up >>>>>>>>>>>>>>>> bytes are expected to be received after it receives a command. >>>>>>>>>>>>>>>> For >>>>>>>>>>>>>>>> example, depending on the address mode, either 3-byte address >>>>>>>>>>>>>>>> or >>>>>>>>>>>>>>>> 4-byte address is needed. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> For fast read family commands, some dummy cycles are required >>>>>>>>>>>>>>>> after >>>>>>>>>>>>>>>> sending the address bytes, and the dummy cycles need to be >>>>>>>>>>>>>>>> counted >>>>>>>>>>>>>>>> in s->needed_bytes. This is where the mess began. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> As the variable name (needed_bytes) indicates, the unit is in >>>>>>>>>>>>>>>> byte. >>>>>>>>>>>>>>>> It is not in bit, or cycle. However for some reason the model >>>>>>>>>>>>>>>> has >>>>>>>>>>>>>>>> been using the number of dummy cycles for s->needed_bytes. The >>>>>>>>>>>>>>>> right >>>>>>>>>>>>>>>> approach is to convert the number of dummy cycles to bytes >>>>>>>>>>>>>>>> based on >>>>>>>>>>>>>>>> the SPI protocol, for example, 6 dummy cycles for the Fast >>>>>>>>>>>>>>>> Read Quad >>>>>>>>>>>>>>>> I/O (EBh) should be converted to 3 bytes per the formula (6 * >>>>>>>>>>>>>>>> 4 / 8). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> While not being the original implementor I must assume that >>>>>>>>>>>>>>> above solution was >>>>>>>>>>>>>>> considered but not chosen by the developers due to it is >>>>>>>>>>>>>>> inaccuracy (it >>>>>>>>>>>>>>> wouldn't be possible to model exacly 6 dummy cycles, only a >>>>>>>>>>>>>>> multiple of 8, >>>>>>>>>>>>>>> meaning that if the controller is wrongly programmed to >>>>>>>>>>>>>>> generate 7 the error >>>>>>>>>>>>>>> wouldn't be caught and the controller will still be considered >>>>>>>>>>>>>>> "correct"). Now >>>>>>>>>>>>>>> that we have this detail in the implementation I'm in favor of >>>>>>>>>>>>>>> keeping it, this >>>>>>>>>>>>>>> also because the detail is already in use for catching exactly >>>>>>>>>>>>>>> above error. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I found no clue from the commit message that my proposed >>>>>>>>>>>>>> solution here >>>>>>>>>>>>>> was ever considered, otherwise all SPI controller models >>>>>>>>>>>>>> supporting >>>>>>>>>>>>>> software generation should have been found out seriously broken >>>>>>>>>>>>>> long >>>>>>>>>>>>>> time ago! >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> The controllers you are referring to might lack support for >>>>>>>>>>>>> commands requiring >>>>>>>>>>>>> dummy clock cycles but I really hope they work with the other >>>>>>>>>>>>> commands? If so I >>>>>>>>>>>> >>>>>>>>>>>> I am not sure why you view dummy clock cycles as something special >>>>>>>>>>>> that needs some special support from the SPI controller. For the >>>>>>>>>>>> case >>>>>>>>>>>> 1 controller, it's nothing special from the controller perspective, >>>>>>>>>>>> just like sending out a command, or address bytes, or data. The >>>>>>>>>>>> controller just shifts data bit by bit from its tx fifo and that's >>>>>>>>>>>> it. >>>>>>>>>>>> In the Xilinx GQSPI controller case, the dummy cycles can either be >>>>>>>>>>>> sent via a regular data (the case 1 controller) in the tx fifo, or >>>>>>>>>>>> automatically generated (case 2 controller) by the hardware. >>>>>>>>>>> >>>>>>>>>>> Ok, I'll try to explain my view point a little differently. For >>>>>>>>>>> that we also >>>>>>>>>>> need to keep in mind that QEMU models HW, and any binary that runs >>>>>>>>>>> on a HW >>>>>>>>>>> board supported in QEMU should ideally run on that board inside >>>>>>>>>>> QEMU aswell >>>>>>>>>>> (this can be a bare metal application equaly well as a modified >>>>>>>>>>> u-boot/Linux >>>>>>>>>>> using SPI commands with a non multiple of 8 number of dummy clock >>>>>>>>>>> cycles). >>>>>>>>>>> >>>>>>>>>>> Once functionality has been introduced into QEMU it is not easy to >>>>>>>>>>> know which >>>>>>>>>>> intentional or untentional features provided by the functionality >>>>>>>>>>> are being >>>>>>>>>>> used by users. One of the (perhaps not well known) features I'm >>>>>>>>>>> aware of that >>>>>>>>>>> is in use and is provided by the accurate dummy clock cycle >>>>>>>>>>> modeling inside >>>>>>>>>>> m25p80 is the be ability to test drivers accurately regarding the >>>>>>>>>>> dummy clock >>>>>>>>>>> cycles (even when using commands with a non-multiple of 8 number of >>>>>>>>>>> dummy clock >>>>>>>>>>> cycles), but there might be others aswell. So by removing this >>>>>>>>>>> functionality >>>>>>>>>>> above use case will brake, this since those test will not be >>>>>>>>>>> reliable. >>>>>>>>>>> Furthermore, since users tend to be creative it is not possible to >>>>>>>>>>> know if >>>>>>>>>>> there are other use cases that will be affected. This means that in >>>>>>>>>>> case [1] >>>>>>>>>>> needs to be followed the safe path is to add functionality instead >>>>>>>>>>> of removing. >>>>>>>>>>> Luckily it also easier in this case, see below. >>>>>>>>>> >>>>>>>>>> I understand there might be users other than U-Boot/Linux that use an >>>>>>>>>> odd number of dummy bits (not multiple of 8). If your concern was >>>>>>>>>> about model behavior changes, sure I can update >>>>>>>>>> qemu/docs/system/deprecated.rst to mention that some flashes in the >>>>>>>>>> m25p80 model now implement dummy cycles as bytes. >>>>>>>>> >>>>>>>>> Yes, something like that. My concern is that since this functionality >>>>>>>>> has been >>>>>>>>> in tree for while, users have found known or unknown features that got >>>>>>>>> introduced by it. By removing the functionality (and the known/uknown >>>>>>>>> features) >>>>>>>>> we are riscing to brake our user's use cases (currently I'm aware of >>>>>>>>> one >>>>>>>>> feature/use case but it is not unlikely that there are more). [1] >>>>>>>>> states that >>>>>>>>> "In general features are intended to be supported indefinitely once >>>>>>>>> introduced >>>>>>>>> into QEMU", to me that makes very much sense because the opposite >>>>>>>>> would mean >>>>>>>>> that we were not reliable. So in case [1] needs to be honored it >>>>>>>>> looks to be >>>>>>>>> safer to add functionality instead of removing (and riscing the >>>>>>>>> removal of use >>>>>>>>> cases/features). Luckily I still believe in this case that it will be >>>>>>>>> easier to >>>>>>>>> go forward (even if I also agree on what you are saying below about >>>>>>>>> what I >>>>>>>>> proposed). >>>>>>>>> >>>>>>>> >>>>>>>> Even if the implementation is buggy and we need to keep the buggy >>>>>>>> implementation forever? I think that's why >>>>>>>> qemu/docs/system/deprecated.rst was created for deprecating such >>>>>>>> feature. >>>>>>> >>>>>>> With the RFC I posted all commands in m25p80 are working for both the >>>>>>> case 1 >>>>>>> controller (using a txfifo) and the case 2 controller (no txfifo, as >>>>>>> GQSPI). >>>>>>> Because of this, I, with all respect, will have to disagree that this >>>>>>> is buggy. >>>>>> >>>>>> Well, the existing m25p80 implementation that uses dummy cycle >>>>>> accuracy for those flashes prevents all SPI controllers that use tx >>>>>> fifo to work with those flashes. Hence it is buggy. >>>>>> >>>>>>> >>>>>>>> >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> don't think it is fair to call them 'seriously broken' (and else >>>>>>>>>>>>> we should >>>>>>>>>>>>> probably let the maintainers know about it). Most likely the lack >>>>>>>>>>>>> of support >>>>>>>>>>>> >>>>>>>>>>>> I called it "seriously broken" because current implementation only >>>>>>>>>>>> considered one type of SPI controllers while completely ignoring >>>>>>>>>>>> the >>>>>>>>>>>> other type. >>>>>>>>>>> >>>>>>>>>>> If we change view and see this from the perspective of m25p80, it >>>>>>>>>>> models the >>>>>>>>>>> commands a certain way and provides an API that the SPI controllers >>>>>>>>>>> need to >>>>>>>>>>> implement for interacting with it. It is true that there are SPI >>>>>>>>>>> controllers >>>>>>>>>>> referred to above that do not support the portion of that API that >>>>>>>>>>> corresponds >>>>>>>>>>> to commands with dummy clock cycles, but I don't think it is true >>>>>>>>>>> that this is >>>>>>>>>>> broken since there is also one SPI controller that has a working >>>>>>>>>>> implementation >>>>>>>>>>> of m25p80's full API also when transfering through a tx fifo (use >>>>>>>>>>> case 1). But >>>>>>>>>>> as mentioned above, by doing a minor extension and improvement to >>>>>>>>>>> m25p80's API >>>>>>>>>>> and allow for toggling the accuracy from dummy clock cycles to >>>>>>>>>>> dummy bytes [1] >>>>>>>>>>> will still be honored as in the same time making it possible to >>>>>>>>>>> have full >>>>>>>>>>> support for the API in the SPI controllers that currently do not >>>>>>>>>>> (please reread >>>>>>>>>>> the proposal in my previous reply that attempts to do this). I >>>>>>>>>>> myself see this >>>>>>>>>>> as win/win situation, also because no controller should need >>>>>>>>>>> modifications. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I am afraid your proposal does not work. Your proposed new device >>>>>>>>>> property 'model_dummy_bytes' to select to convert the accurate dummy >>>>>>>>>> clock cycle count to dummy bytes inside m25p80, is hard to justify as >>>>>>>>>> a property to the flash itself, as the behavior is tightly coupled to >>>>>>>>>> how the SPI controller works. >>>>>>>>> >>>>>>>>> I agree on above. I decided though that instead of posting sample >>>>>>>>> code in here >>>>>>>>> I'll post an RFC with hopefully an improved proposal. I'll cc you. >>>>>>>>> About below, >>>>>>>>> Xilinx ZynqMP GQSPI should not need any modication in a first step. >>>>>>>>> >>>>>>>> >>>>>>>> Wait, (see below) >>>>>>>> >>>>>>>>>> >>>>>>>>>> Please take a look at the Xilinx GQSPI controller, which supports >>>>>>>>>> both >>>>>>>>>> use cases, that the dummy cycles can be transferred via tx fifo, or >>>>>>>>>> generated by the controller automatically. Please read the example >>>>>>>>>> given in: >>>>>>>>>> >>>>>>>>>> table 24‐22, an example of Generic FIFO Contents for Quad I/O >>>>>>>>>> Read >>>>>>>>>> Command (EBh) >>>>>>>>>> >>>>>>>>>> in >>>>>>>>>> https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf >>>>>>>>>> >>>>>>>>>> If you choose to set the m25p80 device property 'model_dummy_bytes' >>>>>>>>>> to >>>>>>>>>> true when working with the Xilinx GQSPI controller, you are bound to >>>>>>>>>> only allow guest software to use tx fifo to transfer the dummy >>>>>>>>>> cycles, >>>>>>>>>> and this is wrong. >>>>>>>>>> >>>>>>>> >>>>>>>> You missed this part. I looked at your RFC, and as I mentioned above >>>>>>>> your proposal cannot support the complicated controller like Xilinx >>>>>>>> GQSPI. Please read the example of table 24-22. With your RFC, you >>>>>>>> mandate guest software's GQSPI driver to only use hardware dummy cycle >>>>>>>> generation, which is wrong. >>>>>>>> >>>>>>> >>>>>>> First, thank you very much for looking into the RFC series, very much >>>>>>> appreciated. Secondly, about above, the GQSPI model in QEMU transfers >>>>>>> from 2 >>>>>>> locations in the file, in 1 location the transfer referred to above is >>>>>>> done, in >>>>>>> another location the transfer through the txfifo is done. The location >>>>>>> where >>>>>>> transfer referred to above is done will not need any modifications (and >>>>>>> will >>>>>>> thus work equally well as it does currently). >>>>>> >>>>>> Please explain this a little bit. How does your RFC series handle >>>>>> cases as described in table 24-22, where the 6 dummy cycles are split >>>>>> into 2 transfers, with one transfer using tx fifo, and the other one >>>>>> using hardware dummy cycle generation? >>>>> >>>>> Sorry, I missunderstod. You are right, that won't work. >>>> >>>> +Edgar E. Iglesias >>>> >>>> So it looks by far the only way to implement dummy cycles correctly to >>>> work with all SPI controller models is what I proposed here in this >>>> patch series. >>>> >>>> Maintainers are quite silent, so I would like to hear your thoughts. >>>> >>>> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you >>>> please share your thoughts since you are the one who reviewed the >>>> existing dummy implementation (based on commits history) >> >> I agree with Edgar, in that Francisco and Bin know this better than me >> and that modelling things in cycles is a pain. > > Hi Alistair, > >> >> As Bin points out it seems like currently we should be modelling bytes >> (from the variable name) so it makes sense to keep it in bytes. I >> would be in favour of this series in that case. Do we know what use >> cases this will break? I know it's hard to answer but I don't think >> there are too many SSI users in QEMU so it might not be too hard to >> test most of the possible use cases. > > The use case I'm aware of is regression testing of drivers. Ex: if a > driver is using 10 dummy clock cycles with the commands and a patch > accidentaly changes the driver to use 11 dummy clock cycles QEMU currently > finds the problem, that won't be possible with this series. It's difficult > to say but it is not impossible there are other use cases also.
It was breaking the Aspeed machines : https://lore.kernel.org/qemu-devel/78a12882-1303-dd6d-6619-96c5e2cbf...@kaod.org/ QEMU 6.1 should have acceptance tests that will help in detecting regressions in this area. Thanks, C. > > More importantly IMO though is that the current use cases can be keept > while still providing support for commands with dummy clock cycles into > the QEMU SPI controllers lacking at the moment. > > (If I recall correctly this series might also have another issue regarding > the GQSPI SPI mode configuration, with that it is possible transmit 8 > dummy clock cycles as 1 data byte, 2 data bytes or 4 data bytes, so I > think some form of calculation might be needed inside m25p80). > > Best regards, > Francisco > > >> >> Alistair >> >>> >>> Hello maintainers, >>> >>> We apparently missed the 6.0 window to address this mess of the m25p80 >>> model. Please provide your inputs on this before I start working on >>> the v2. >>> >>> Regards, >>> Bin >>>