I agree with Tim and will try to provide more requested information and testing with those CLs.
> Only your reasoning "maintain the code symmetry" is not enough to justify changes. Kindly ignore my #2 comment, re-reading it doesn't make sense to me :) to maintain the symmetry is the real reason for those code changes. Looking for review comments and I will address those earliest. Thanks for having good discussion here. Thanks, Subrata On Thu, Mar 17, 2022 at 10:24 PM Tim Wawrzynczak <twawrzync...@google.com> wrote: > > > On Thu, Mar 17, 2022 at 10:35 AM Nico Huber <nic...@gmx.de> wrote: > >> On 17.03.22 15:35, Subrata Banik wrote: >> > On Thu, Mar 17, 2022 at 6:08 PM Nico Huber <nic...@gmx.de> wrote: >> >> On 16.03.22 17:19, Subrata Banik wrote: >> >>> The reason for this refactoring of HW sequencing SPI driver code are: >> >>> 1. We (Chrome OS team) recently ran into a firmware update issue on >> >>> dogfooders (test device utilise by real users and share feedback) >> >>> systems, >> >>> where the attempt to perform SPI flash update operation from host-cpu >> >>> (using underlying flashrom utility) is silently failing. Furthermore, >> we >> >>> debug the issue with help of Intel and figure out the problem is >> related >> >>> to >> >>> multiple master is accessing the SPI flash and operation triggered by >> >>> host-cpu side using flashrom (write and erase operation) is getting >> timed >> >>> out (due to given lesser timeout boundary) due to underlying SPI bus >> is >> >>> occupied by other master. >> >> >> >> Does the issue persist with the following commit applied to >> libflashrom, >> >> activated and integrated into futility? >> >> >> >> commit 330088d77 [CHROMIUM]: libflashrom: Also use USE_BIG_LOCK >> >> >> >> This would give us a hint if the issue is indeed due to multiple >> masters >> >> or actually due to multiple programs trying to control the same master. >> >> >> > [Subrata] Yes, we are seeing failure even with USE_BIG_LOCK code changes >> > being integrated. The only code change that helps us to W/A this issue >> is >> > here >> > >> https://chromium-review.googlesource.com/c/chromiumos/third_party/flashrom/+/3498895 >> > (increase >> > timeout in flashrom operations). Also, Intel disable the CSE runtime >> access >> > to SPI using a custom tool and that helped to fix this update failure as >> > well. These are the evidence towards multi master theory in AU failure >> > issue. >> >> I see. I was probably too distracted by the refactoring patches earlier. >> CB:62867 is indeed necessary. >> > > We seem to have actually figured out what the actual root cause of our > problem was, and it indeed > does seem to be the multi-master problem and flashrom not waiting long > enough in these cases. > tl;dr widevine provisioning was failing, and repeating endlessly in a loop > (~each second or so) and > each attempt actually causes the CSE to write to its data area in flash. > Unfortunately, we do not have > fantastic visibility into the processes that are running during AU, and so > this was very hard to track down. > We have fixed the issues with widevine provisioning and also added a max 5 > attempts to provision a device > instead of writing endlessly to the flash in the failure case. > > >> >> We already assumed a 5s timeout is needed for erasing a block. Worst >> case seems to be that all masters try to do this at once and then one >> master would have to wait (NM-1)*5s before its transaction even starts >> (NM being the number of masters). >> >> We could make it depend on the number of masters, or just choose some- >> thing safe like 30s. WDYT? >> >> > I think the longer the timeout (until it gets ridiculous), probably the > better. Unless there is buggy hardware > out there, there is no good reason for these operations to timeout; there > could for sure be SPI flash > chips out there that perhaps aren't always performing to-spec, who knows. > > But in this case, since do we know the maximum number of masters, and 5s > per erase seems reasonable too, > then 30 seconds makes sense to me. > > >> >> This is the special case, starting from Tiger Lake Chrome Platform, >> where >> >>> we have enable the PVAP (Protected Audio Video Path) which request >> CSE to >> >>> perform some erase and write operation as needed. >> >> >> >> We used flashrom on PAVP enabled systems for more than a decade without >> >> issues. Did something change in that area for Tiger Lake in particular? >> >> >> > [Subrata] I'm not sure about other platforms, but from TGL onwards, on >> > Chrome devices we have enabled PAVP use case and Intel had tried to >> > replicate this issue on older platforms as well and informed us it's >> > replicated the issue on TGL platform as well where PAVP first being >> enabled >> > and not on JSL (which doesn't enable PAVP). Right now the recommendation >> > that is coming out from Intel is that, on latest Chrome OS devices, >> there >> > might be concurrent SPI accesses between host CPU and CSE. But I will >> ask >> > this question about the older generation platform as you have >> highlighted >> > which has PAVP enabled. Can you please let me know those platform names >> ? >> > Also, do you know on those devices if we are trying to update the FW >> using >> > flashrom tool as well or not? >> >> Thanks for looking into it. As our timeouts were way below the worst >> case, I assume enabling PAVP just makes the problem more visible and is >> not directly related. It could also be any other code running on the CSE >> that changed between JSL and TGL (unless you know, ofc., that PAVP is >> the only code that writes to flash). >> >> > PAVP is the only thing we know of ATM that writes to the flash aside from > perhaps after the first flash of the ME region. > > The reason this became visible on brya only now I explained just above :) > > >> Flashrom basically runs on everything so the platform names would be all >> platforms that support PAVP. Probably not worth looking further into it >> unless there are still issues with a reasonable timeout. >> >> > >> >> >> >>> 2. Maintain the code symmetry between coreboot and flashrom SPI >> hardware >> >>> sequencing operations. >> >> >> >> Flashrom has tighter requirements compared to coreboot because we try >> to >> >> support all compatible chipset generations with the same code. If you >> >> want to align the two code-bases, please use flashrom as reference and >> >> patch coreboot. >> >> >> > [Subrata] Let me understand this a little better, are you suggesting >> not to >> > increase the SPI operation timing in flashrom upstream master ? knowing >> > that there is a hidden issue acknowledged by Intel with their validation >> > data, also, promised to update the SPI programming guide to capture the >> > recommendation of timeout in multi master scenarios. >> >> No I'm not suggesting any such thing. What I wrote was related to your >> 2. point as quoted above. >> >> > >> > Apart from timeout increasing CLs, other code changes are to increase >> the >> > code modularity (like using macro and helper function), are you >> suggesting >> > to not touch the flashrom code? can't we all contribute towards >> validating >> > the concerned target platforms. At Least in the coreboot community we >> all >> > help each other to validate the platform if we are out of required >> devices. >> >> We can have a look at the individual changes if there is a reason to >> change the code, i.e. to make it better. Only your reasoning "maintain >> the code symmetry" is not enough to justify changes. >> >> I'm sorry if this affects your work, but we have very bad experience >> with unnecessary code changes by CrOS developers. Currently `sb600spi` >> (the equivalent to `ichspi` for AMD platforms) is broken since months >> and only CrOS developers could clarify why the code was changed as it >> was. We need to prevent that this happens to `ichspi` too, right? >> >> Nico >> > > We are certainly not intending to break ichspi or make it worse, we are > trying to make it better, perhaps we were a little > hasty in trying to get through solutions that were not perhaps the root of > the problem, but were discovered along the way. > > We are also working with Intel to get some documentation out about the > expected behaviors with regard to the > multi-master scenarios, hopefully that will help clarify things some in > this regard. > > -Tim >
_______________________________________________ flashrom mailing list -- flashrom@flashrom.org To unsubscribe send an email to flashrom-le...@flashrom.org