I agree with Tim and will try to provide more requested information and
testing with those CLs.

> Only your reasoning "maintain the code symmetry" is not enough to justify
changes.

Kindly ignore my #2 comment, re-reading it doesn't make sense to me :) to
maintain the symmetry is the real reason for those code changes.

Looking for review comments and I will address those earliest.

Thanks for having good discussion here.

Thanks,
Subrata


On Thu, Mar 17, 2022 at 10:24 PM Tim Wawrzynczak <twawrzync...@google.com>
wrote:

>
>
> On Thu, Mar 17, 2022 at 10:35 AM Nico Huber <nic...@gmx.de> wrote:
>
>> On 17.03.22 15:35, Subrata Banik wrote:
>> > On Thu, Mar 17, 2022 at 6:08 PM Nico Huber <nic...@gmx.de> wrote:
>> >> On 16.03.22 17:19, Subrata Banik wrote:
>> >>> The reason for this refactoring of HW sequencing SPI driver code are:
>> >>> 1. We (Chrome OS team) recently ran into a firmware update issue on
>> >>> dogfooders (test device utilise by real users and share feedback)
>> >>> systems,
>> >>> where the attempt to perform SPI flash update operation from host-cpu
>> >>> (using underlying flashrom utility) is silently failing. Furthermore,
>> we
>> >>> debug the issue with help of Intel and figure out the problem is
>> related
>> >>> to
>> >>> multiple master is accessing the SPI flash and operation triggered by
>> >>> host-cpu side using flashrom (write and erase operation) is getting
>> timed
>> >>> out (due to given lesser timeout boundary) due to underlying SPI bus
>> is
>> >>> occupied by other master.
>> >>
>> >> Does the issue persist with the following commit applied to
>> libflashrom,
>> >> activated and integrated into futility?
>> >>
>> >>   commit 330088d77 [CHROMIUM]: libflashrom: Also use USE_BIG_LOCK
>> >>
>> >> This would give us a hint if the issue is indeed due to multiple
>> masters
>> >> or actually due to multiple programs trying to control the same master.
>> >>
>> > [Subrata] Yes, we are seeing failure even with USE_BIG_LOCK code changes
>> > being integrated. The only code change that helps us to W/A this issue
>> is
>> > here
>> >
>> https://chromium-review.googlesource.com/c/chromiumos/third_party/flashrom/+/3498895
>> > (increase
>> > timeout in flashrom operations). Also, Intel disable the CSE runtime
>> access
>> > to SPI using a custom tool and that helped to fix this update failure as
>> > well. These are the evidence towards multi master theory in AU failure
>> > issue.
>>
>> I see. I was probably too distracted by the refactoring patches earlier.
>> CB:62867 is indeed necessary.
>>
>
> We seem to have actually figured out what the actual root cause of our
> problem was, and it indeed
> does seem to be the multi-master problem and flashrom not waiting long
> enough in these cases.
> tl;dr widevine provisioning was failing, and repeating endlessly in a loop
> (~each second or so) and
> each attempt actually causes the CSE to write to its data area in flash.
> Unfortunately, we do not have
> fantastic visibility into the processes that are running during AU, and so
> this was very hard to track down.
> We have fixed the issues with widevine provisioning and also added a max 5
> attempts to provision a device
> instead of writing endlessly to the flash in the failure case.
>
>
>>
>> We already assumed a 5s timeout is needed for erasing a block. Worst
>> case seems to be that all masters try to do this at once and then one
>> master would have to wait (NM-1)*5s before its transaction even starts
>> (NM being the number of masters).
>>
>> We could make it depend on the number of masters, or just choose some-
>> thing safe like 30s. WDYT?
>>
>>
> I think the longer the timeout (until it gets ridiculous), probably the
> better. Unless there is buggy hardware
> out there, there is no good reason for these operations to timeout; there
> could for sure be SPI flash
> chips out there that perhaps aren't always performing to-spec, who knows.
>
> But in this case, since do we know the maximum number of masters, and 5s
> per erase seems reasonable too,
> then 30 seconds makes sense to me.
>
>
>> >> This is the special case, starting from Tiger Lake Chrome Platform,
>> where
>> >>> we have enable the PVAP (Protected Audio Video Path) which request
>> CSE to
>> >>> perform some erase and write operation as needed.
>> >>
>> >> We used flashrom on PAVP enabled systems for more than a decade without
>> >> issues. Did something change in that area for Tiger Lake in particular?
>> >>
>> > [Subrata]  I'm not sure about other platforms, but from TGL onwards, on
>> > Chrome devices we have enabled PAVP use case and Intel had tried to
>> > replicate this issue on older platforms as well and informed us it's
>> > replicated the issue on TGL platform as well where PAVP first being
>> enabled
>> > and not on JSL (which doesn't enable PAVP). Right now the recommendation
>> > that is coming out from Intel is that, on latest Chrome OS devices,
>> there
>> > might be concurrent SPI accesses between host CPU and CSE. But I will
>> ask
>> > this question about the older generation platform as you have
>> highlighted
>> > which has PAVP enabled. Can you please let me know those platform names
>> ?
>> > Also,  do you know on those devices if we are trying to update the FW
>> using
>> > flashrom tool as well or not?
>>
>> Thanks for looking into it. As our timeouts were way below the worst
>> case, I assume enabling PAVP just makes the problem more visible and is
>> not directly related. It could also be any other code running on the CSE
>> that changed between JSL and TGL (unless you know, ofc., that PAVP is
>> the only code that writes to flash).
>>
>>
> PAVP is the only thing we know of ATM that writes to the flash aside from
> perhaps after the first flash of the ME region.
>
> The reason this became visible on brya only now I explained just above :)
>
>
>> Flashrom basically runs on everything so the platform names would be all
>> platforms that support PAVP. Probably not worth looking further into it
>> unless there are still issues with a reasonable timeout.
>>
>> >
>> >>
>> >>> 2. Maintain the code symmetry between coreboot and flashrom SPI
>> hardware
>> >>> sequencing operations.
>> >>
>> >> Flashrom has tighter requirements compared to coreboot because we try
>> to
>> >> support all compatible chipset generations with the same code. If you
>> >> want to align the two code-bases, please use flashrom as reference and
>> >> patch coreboot.
>> >>
>> > [Subrata] Let me understand this a little better, are you suggesting
>> not to
>> > increase the SPI operation timing in flashrom upstream master ? knowing
>> > that there is a hidden issue acknowledged by Intel with their validation
>> > data, also, promised to update the SPI programming guide to capture the
>> > recommendation of timeout in multi master scenarios.
>>
>> No I'm not suggesting any such thing. What I wrote was related to your
>> 2. point as quoted above.
>>
>> >
>> > Apart from timeout increasing CLs, other code changes are to increase
>> the
>> > code modularity (like using macro and helper function), are you
>> suggesting
>> > to not touch the flashrom code? can't we all contribute towards
>> validating
>> > the concerned target platforms. At Least in the coreboot community we
>> all
>> > help each other to validate the platform if we are out of required
>> devices.
>>
>> We can have a look at the individual changes if there is a reason to
>> change the code, i.e. to make it better. Only your reasoning "maintain
>> the code symmetry" is not enough to justify changes.
>>
>> I'm sorry if this affects your work, but we have very bad experience
>> with unnecessary code changes by CrOS developers. Currently `sb600spi`
>> (the equivalent to `ichspi` for AMD platforms) is broken since months
>> and only CrOS developers could clarify why the code was changed as it
>> was. We need to prevent that this happens to `ichspi` too, right?
>>
>> Nico
>>
>
> We are certainly not intending to break ichspi or make it worse, we are
> trying to make it better, perhaps we were a little
> hasty in trying to get through solutions that were not perhaps the root of
> the problem, but were discovered along the way.
>
> We are also working with Intel to get some documentation out about the
> expected behaviors with regard to the
> multi-master scenarios, hopefully that will help clarify things some in
> this regard.
>
> -Tim
>
_______________________________________________
flashrom mailing list -- flashrom@flashrom.org
To unsubscribe send an email to flashrom-le...@flashrom.org

Reply via email to