[flashrom] Sharing info about flashrom HW seq code refactoring https://review.coreboot.org/q/topic:PCH_HW_SEQ_Cleanup

2022-03-17 Thread Subrata Banik via flashrom
Hello Team,

Hope you are doing well.
I would like bring this activity in your notice
https://review.coreboot.org/q/topic:PCH_HW_SEQ_Cleanup and seek help to
review the code.

The reason for this refactoring of HW sequencing SPI driver code are:
1. We (Chrome OS team) recently ran into a firmware update issue on
dogfooders (test device utilise by real users and share feedback) systems,
where the attempt to perform SPI flash update operation from host-cpu
(using underlying flashrom utility) is silently failing. Furthermore, we
debug the issue with help of Intel and figure out the problem is related to
multiple master is accessing the SPI flash and operation triggered by
host-cpu side using flashrom (write and erase operation) is getting timed
out (due to given lesser timeout boundary) due to underlying SPI bus is
occupied by other master.
This is the special case, starting from Tiger Lake Chrome Platform, where
we have enable the PVAP (Protected Audio Video Path) which request CSE to
perform some erase and write operation as needed. Now an attempt to perform
firmware update from host-cpu side might coincide with the CSE performing
SPI operations. The recommendation that coming out from Intel is to account
the multiple master SPI operation while we are issuing the timeout from
flashrom side. We are also expecting a doc update from Intel to share the
exact timeout calculation (although I have attempted to document that early
here https://review.coreboot.org/c/flashrom/+/62867/1/ichspi.c#35).
2. Maintain the code symmetry between coreboot and flashrom SPI hardware
sequencing operations.

>From the testing side, I have tested this code on Alderlake `Brya`,
Tigerlake `volteer` and Kabylake `eve` devices, and looking for help from
community to check on other systems as well (if possible).
Furthermore, I have attempted to increase the timeout which won't impact
the flashrom performance on the older and/or existing systems.

Looking forward for your help on this request.

Thanks,
Subrata
___
flashrom mailing list -- flashrom@flashrom.org
To unsubscribe send an email to flashrom-le...@flashrom.org


[flashrom] Re: Gatekeeping, ACLs and Review Rules

2022-03-17 Thread Anastasia Klimchuk
“For example, the flashrom project doesn't require that all comments
be resolved before merge.  That can be enabled if you'd like, but currently
it
isn't.”

Oh this explains! I was wondering where those “patches merged with
unresolved comments” are coming from. I am 100% voting for this setting to
be enabled. It does not prevent from just clicking Ack on all comments, if
needed. But at least, comments won’t be lost unintentionally.

And then, we can for example agree (if we agree) that if the Reviewer wants
a response on the comment, they mark it Unresolved (it will be yellow
colour).
It doesn’t solve all the problems, but at least, it guards unintentional
mistakes.

“it might help to have a weekly or bi-weekly meeting *just* to discuss
patches face-to-face.”

I would agree with that. This is *another meeting* yes! But looking at how
things are going right now, a problematic patch takes so much unnecessary
time, it can be weeks or even months, as I said, unnecessary extra time.
And it starts involving lots of people, many of those people should not be
involved at all. And worst of all, it can cause emotional damage to people
involved. This is all wrong. We need to ask others for sure, but for
myself, I am happy to have another meeting, because it has a chance to
remove those problems. A patch should be a purely technical question.

“Comparing the two roles there is another important point: One can't
review their own commits but a core developer can submit their own
commits.”

This is an important point. Reviewer is a very important second pair of
eyes.
I see my previous point about “Reviewed-by” tag is not that strong :) But
your one is.

“It's kind of admin access but without
being admin for the whole Gerrit instance. I never tried, but I assume
we could remove a -2. In any case this group can change the rules, so
it's only a question of how convenient it would be to override a -2.”

This is good, so there is a way to remove a -2.
I am still in the same position that “merging” and “block from merging” are
similar powers with + and - sign, and it makes sense to have it in one
group. If this group (very small!) has no trust in each other, let’s solve
it.

“And there is also a huge psychological component. When somebody knows
they have the time to fix things after a revert, that seems fine. But if
one doesn't have the time, it can really hurt to see something reverted.”

But wait, one needs to fix things if a patch breaks something? It is
technically a choice between “revert and fix” or “fix”? In both cases,
there is a “fix” involved…

“That's probably why I'm so afraid of rules ;) start with one and you'll
have to add a dozen more. We'd have to properly formalize everything
around the groups to avoid regressions.”

I looked once again in my “what can go wrong” scenarios, and now I think
that #2 and #3 can be solved by having clear rules on when someone can be
added to Reviewers and Committers (and those clear rules are published so
everyone can see). Let’s brainstorm some clear rules? :)
The point #1 can be solved by a nice message which explains everything and
calms down people ;)
So actually, maybe it’s not so bad :)

“Technically, I'm beyond the scale already. But I don't want to burden
you with that story. There is not much to worry about right now.”

If I can help by listening, just tell me!

“Today, the most unsettling thing seems to me is that we spend a lot of
time to
address problems that people may have accidentally imported into the
project.”

I think this is a connection to a second topic “Release preparations” that
I will respond to next :)
But what I think, the problems need to be listed, described and assigned.
Lots of these you did already!

“I've noticed something related in reviews over the years, though. Some-
times when reviewers give a lot of comments on Gerrit, among them some
critical ones about the overall patch and a lot of nits, the author
tends to fix the nits and ignore the critical comments.”

We can formalise this. Gerrit has yellow and grey comments. Yellow are
supposed to be blocking (major) and grey are non-blocking (minor). We can,
at the very least, state very clearly that yellow comments cannot be
ignored (and Martin said it can be enforced in Gerrit!). It is still
possible to give some random answer, but… better than nothing.

And that’s what Greg said as well, findings can be major and minor.
Although in my head I see it as “comments that I expect a reply” and “just
comments”. This is why I almost always have yellow comments: I expect a
reply :P Unless it is something like “Awesome work thanks!” etc.

And in addition to everything that I said, I have another thought. I hear
the words “right direction” and “wrong direction” from time to time (in
gerrit comments, and in this thread). And the thing with those words… They
are deceptive, because everyone has their own idea in their own head on
what is right / wrong direction, and it seems so obvious to the person, and
no one

[flashrom] Re: Sharing info about flashrom HW seq code refactoring https://review.coreboot.org/q/topic:PCH_HW_SEQ_Cleanup

2022-03-17 Thread Nico Huber
Hi Subrata,

On 16.03.22 17:19, Subrata Banik wrote:
> Hope you are doing well.
> I would like bring this activity in your notice
> https://review.coreboot.org/q/topic:PCH_HW_SEQ_Cleanup and seek help to
> review the code.

thank you for bringing this to the mailing list. Not only but also
because `ichspi` is maybe the most important driver in flashrom it
is important to discuss changes very early. We have to avoid re-
gressions at all cost, so all changes need to be fully understood
before it makes sense to look into the implementation details.

> The reason for this refactoring of HW sequencing SPI driver code are:
> 1. We (Chrome OS team) recently ran into a firmware update issue on
> dogfooders (test device utilise by real users and share feedback) systems,
> where the attempt to perform SPI flash update operation from host-cpu
> (using underlying flashrom utility) is silently failing. Furthermore, we
> debug the issue with help of Intel and figure out the problem is related to
> multiple master is accessing the SPI flash and operation triggered by
> host-cpu side using flashrom (write and erase operation) is getting timed
> out (due to given lesser timeout boundary) due to underlying SPI bus is
> occupied by other master.

Does the issue persist with the following commit applied to libflashrom,
activated and integrated into futility?

  commit 330088d77 [CHROMIUM]: libflashrom: Also use USE_BIG_LOCK

This would give us a hint if the issue is indeed due to multiple masters
or actually due to multiple programs trying to control the same master.

> This is the special case, starting from Tiger Lake Chrome Platform, where
> we have enable the PVAP (Protected Audio Video Path) which request CSE to
> perform some erase and write operation as needed.

We used flashrom on PAVP enabled systems for more than a decade without
issues. Did something change in that area for Tiger Lake in particular?

> 2. Maintain the code symmetry between coreboot and flashrom SPI hardware
> sequencing operations.

Flashrom has tighter requirements compared to coreboot because we try to
support all compatible chipset generations with the same code. If you
want to align the two code-bases, please use flashrom as reference and
patch coreboot.

Also, if you want to save some time in the long run: If we'd focus on
the flashrom code-base, e.g. prepare libflashrom for easier integration
into embedded environments, we could drop the redundant code in core-
boot.

Cheers,
Nico
___
flashrom mailing list -- flashrom@flashrom.org
To unsubscribe send an email to flashrom-le...@flashrom.org


[flashrom] Re: Gatekeeping, ACLs and Review Rules

2022-03-17 Thread Nico Huber
Hi Anastasia,

On 17.03.22 12:32, Anastasia Klimchuk wrote:
>> For example, the flashrom project doesn't require that all comments
>> be resolved before merge.  That can be enabled if you'd like, but currently
>> it isn't.
>
> Oh this explains! I was wondering where those “patches merged with
> unresolved comments” are coming from. I am 100% voting for this setting to
> be enabled. It does not prevent from just clicking Ack on all comments, if
> needed. But at least, comments won’t be lost unintentionally.
>

and I vote 100% against it. FWIW, this feature wakes our worst in-
stincts. No matter how convinced we are that review is also for our
own good, getting a commit merged always feels rewarding. And many
people act like it's a necessity to set the resolved tick to gain
that reward. I've seen people doing so with every single comment they
wrote since day 1 when this feature was activated for coreboot. Let's
not repeat every mistake made there.

Also, today Gerrit warns about unresolved comment threads when one
tries to submit. Enforcing it brings a marginal convenience but makes
the review process less friendly in my experience.

Sometimes comment threads are opened after review. So the `merged`
status never implies all-comments-resolved. And that's not bad, IMHO.
Sometimes things come up after review and Gerrit makes it easy to
continue to discuss changes even after they are merged.

> And then, we can for example agree (if we agree) that if the Reviewer wants
> a response on the comment, they mark it Unresolved (it will be yellow
> colour).
> It doesn’t solve all the problems, but at least, it guards unintentional
> mistakes.
>
>> it might help to have a weekly or bi-weekly meeting *just* to discuss
>> patches face-to-face.
>
> I would agree with that. This is *another meeting* yes! But looking at how
> things are going right now, a problematic patch takes so much unnecessary
> time, it can be weeks or even months, as I said, unnecessary extra time.

IMO, problematic patches should not be escalated but avoided in the
first place. In our wiki it says in multiple places that one should
discuss things on the mailing list early so nobody gets frustrated
later. Such problematic reviews is exactly why it's stated there.
Open-source development is nothing new. Many problems have been fixed
in the past already. Please always encourage everybody to try the
established ways before experimenting with new ones.

There also seems to be a misunderstanding about patches. Some people
seem to believe that every patch should get merged eventually. This
is practically impossible. We can't merge every single line that was
and will be written about flashrom into one project. Not everything
is correct and not everything that is is compatible. Sometimes the
most efficient thing to do with a patch is to abandon it.

Sorry if I seem worked up over this. FWIW, such problematic reviews
are one thing that's been congesting the project over the past three
years. Of course, we can all try to learn from such reviews, but it
needs to pay out for the project somehow. Otherwise, it would be more
efficient and less frustrating if the reviewers would write everything.

> And it starts involving lots of people, many of those people should not be
> involved at all. And worst of all, it can cause emotional damage to people
> involved. This is all wrong. We need to ask others for sure, but for
> myself, I am happy to have another meeting, because it has a chance to
> remove those problems. A patch should be a purely technical question.
>
>> Comparing the two roles there is another important point: One can't
>> review their own commits but a core developer can submit their own
>> commits.
>
> This is an important point. Reviewer is a very important second pair of
> eyes.
> I see my previous point about “Reviewed-by” tag is not that strong :) But
> your one is.
>
>> It's kind of admin access but without
>> being admin for the whole Gerrit instance. I never tried, but I assume
>> we could remove a -2. In any case this group can change the rules, so
>> it's only a question of how convenient it would be to override a -2.
>
> This is good, so there is a way to remove a -2.
> I am still in the same position that “merging” and “block from merging” are
> similar powers with + and - sign, and it makes sense to have it in one
> group. If this group (very small!) has no trust in each other, let’s solve
> it.
>
>> And there is also a huge psychological component. When somebody knows
>> they have the time to fix things after a revert, that seems fine. But if
>> one doesn't have the time, it can really hurt to see something reverted.
>
> But wait, one needs to fix things if a patch breaks something? It is
> technically a choice between “revert and fix” or “fix”? In both cases,
> there is a “fix” involved…

It's a matter of time and responsibility. Who will fix it and when? Will
the one who broke things fix them? or the one who discovers the problem?
or may

[flashrom] Re: Release preparations

2022-03-17 Thread Nico Huber
On 13.03.22 16:28, Nico Huber wrote:
> On 13.03.22 08:28, Anastasia Klimchuk wrote:
>>> I suggest that we freeze the master branch for everything that is neither
>>> a fix nor on the list (or a similar case that I missed)
>>
>> But how can we freeze master… that would mean no one can do any work? Maybe
>> I am missing something?
>
> No you didn't miss anything :) it would be a very desperate
> measure. However, I see no other solution to make progress
> again without forking or further stalling a release. And
> after 2 years I think the project has waited long enough.

Sorry folks, I didn't mean to provoke any hasty activity to fix the
listed problems. Rather to push us to talk about what we should fix
before a release and what features we could do without in a release.
If we'd wait for everything to be fixed at snail pace, I fear people
would give up before another release happens (I would). Not to mention
that people will continue to push more controversial patches in the
meantime.

Nico
___
flashrom mailing list -- flashrom@flashrom.org
To unsubscribe send an email to flashrom-le...@flashrom.org


[flashrom] Re: Sharing info about flashrom HW seq code refactoring https://review.coreboot.org/q/topic:PCH_HW_SEQ_Cleanup

2022-03-17 Thread Nico Huber
On 17.03.22 15:35, Subrata Banik wrote:
> On Thu, Mar 17, 2022 at 6:08 PM Nico Huber  wrote:
>> On 16.03.22 17:19, Subrata Banik wrote:
>>> The reason for this refactoring of HW sequencing SPI driver code are:
>>> 1. We (Chrome OS team) recently ran into a firmware update issue on
>>> dogfooders (test device utilise by real users and share feedback)
>>> systems,
>>> where the attempt to perform SPI flash update operation from host-cpu
>>> (using underlying flashrom utility) is silently failing. Furthermore, we
>>> debug the issue with help of Intel and figure out the problem is related
>>> to
>>> multiple master is accessing the SPI flash and operation triggered by
>>> host-cpu side using flashrom (write and erase operation) is getting timed
>>> out (due to given lesser timeout boundary) due to underlying SPI bus is
>>> occupied by other master.
>>
>> Does the issue persist with the following commit applied to libflashrom,
>> activated and integrated into futility?
>>
>>   commit 330088d77 [CHROMIUM]: libflashrom: Also use USE_BIG_LOCK
>>
>> This would give us a hint if the issue is indeed due to multiple masters
>> or actually due to multiple programs trying to control the same master.
>>
> [Subrata] Yes, we are seeing failure even with USE_BIG_LOCK code changes
> being integrated. The only code change that helps us to W/A this issue is
> here
> https://chromium-review.googlesource.com/c/chromiumos/third_party/flashrom/+/3498895
> (increase
> timeout in flashrom operations). Also, Intel disable the CSE runtime access
> to SPI using a custom tool and that helped to fix this update failure as
> well. These are the evidence towards multi master theory in AU failure
> issue.

I see. I was probably too distracted by the refactoring patches earlier.
CB:62867 is indeed necessary.

We already assumed a 5s timeout is needed for erasing a block. Worst
case seems to be that all masters try to do this at once and then one
master would have to wait (NM-1)*5s before its transaction even starts
(NM being the number of masters).

We could make it depend on the number of masters, or just choose some-
thing safe like 30s. WDYT?

>> This is the special case, starting from Tiger Lake Chrome Platform, where
>>> we have enable the PVAP (Protected Audio Video Path) which request CSE to
>>> perform some erase and write operation as needed.
>>
>> We used flashrom on PAVP enabled systems for more than a decade without
>> issues. Did something change in that area for Tiger Lake in particular?
>>
> [Subrata]  I'm not sure about other platforms, but from TGL onwards, on
> Chrome devices we have enabled PAVP use case and Intel had tried to
> replicate this issue on older platforms as well and informed us it's
> replicated the issue on TGL platform as well where PAVP first being enabled
> and not on JSL (which doesn't enable PAVP). Right now the recommendation
> that is coming out from Intel is that, on latest Chrome OS devices, there
> might be concurrent SPI accesses between host CPU and CSE. But I will ask
> this question about the older generation platform as you have highlighted
> which has PAVP enabled. Can you please let me know those platform names ?
> Also,  do you know on those devices if we are trying to update the FW using
> flashrom tool as well or not?

Thanks for looking into it. As our timeouts were way below the worst
case, I assume enabling PAVP just makes the problem more visible and is
not directly related. It could also be any other code running on the CSE
that changed between JSL and TGL (unless you know, ofc., that PAVP is
the only code that writes to flash).

Flashrom basically runs on everything so the platform names would be all
platforms that support PAVP. Probably not worth looking further into it
unless there are still issues with a reasonable timeout.

>
>>
>>> 2. Maintain the code symmetry between coreboot and flashrom SPI hardware
>>> sequencing operations.
>>
>> Flashrom has tighter requirements compared to coreboot because we try to
>> support all compatible chipset generations with the same code. If you
>> want to align the two code-bases, please use flashrom as reference and
>> patch coreboot.
>>
> [Subrata] Let me understand this a little better, are you suggesting not to
> increase the SPI operation timing in flashrom upstream master ? knowing
> that there is a hidden issue acknowledged by Intel with their validation
> data, also, promised to update the SPI programming guide to capture the
> recommendation of timeout in multi master scenarios.

No I'm not suggesting any such thing. What I wrote was related to your
2. point as quoted above.

>
> Apart from timeout increasing CLs, other code changes are to increase the
> code modularity (like using macro and helper function), are you suggesting
> to not touch the flashrom code? can't we all contribute towards validating
> the concerned target platforms. At Least in the coreboot community we all
> help each other to validate the platform if we are out

[flashrom] Re: Sharing info about flashrom HW seq code refactoring https://review.coreboot.org/q/topic:PCH_HW_SEQ_Cleanup

2022-03-17 Thread Subrata Banik via flashrom
HI Nico,

Thanks for your email.
Please find my response below.

Thanks,
Subrata


On Thu, Mar 17, 2022 at 6:08 PM Nico Huber  wrote:

> Hi Subrata,
>
> On 16.03.22 17:19, Subrata Banik wrote:
> > Hope you are doing well.
> > I would like bring this activity in your notice
> > https://review.coreboot.org/q/topic:PCH_HW_SEQ_Cleanup and seek help to
> > review the code.
>
> thank you for bringing this to the mailing list. Not only but also
> because `ichspi` is maybe the most important driver in flashrom it
> is important to discuss changes very early. We have to avoid re-
> gressions at all cost, so all changes need to be fully understood
> before it makes sense to look into the implementation details.
>
[Subrata] Sure. I will try to explain the situation better and seek help
from Anastasia to fill in if still some gap exists.

>
> > The reason for this refactoring of HW sequencing SPI driver code are:
> > 1. We (Chrome OS team) recently ran into a firmware update issue on
> > dogfooders (test device utilise by real users and share feedback)
> systems,
> > where the attempt to perform SPI flash update operation from host-cpu
> > (using underlying flashrom utility) is silently failing. Furthermore, we
> > debug the issue with help of Intel and figure out the problem is related
> to
> > multiple master is accessing the SPI flash and operation triggered by
> > host-cpu side using flashrom (write and erase operation) is getting timed
> > out (due to given lesser timeout boundary) due to underlying SPI bus is
> > occupied by other master.
>
> Does the issue persist with the following commit applied to libflashrom,
> activated and integrated into futility?
>
>   commit 330088d77 [CHROMIUM]: libflashrom: Also use USE_BIG_LOCK
>
> This would give us a hint if the issue is indeed due to multiple masters
> or actually due to multiple programs trying to control the same master.
>
[Subrata] Yes, we are seeing failure even with USE_BIG_LOCK code changes
being integrated. The only code change that helps us to W/A this issue is
here
https://chromium-review.googlesource.com/c/chromiumos/third_party/flashrom/+/3498895
(increase
timeout in flashrom operations). Also, Intel disable the CSE runtime access
to SPI using a custom tool and that helped to fix this update failure as
well. These are the evidence towards multi master theory in AU failure
issue.

> This is the special case, starting from Tiger Lake Chrome Platform, where
> > we have enable the PVAP (Protected Audio Video Path) which request CSE to
> > perform some erase and write operation as needed.
>
> We used flashrom on PAVP enabled systems for more than a decade without
> issues. Did something change in that area for Tiger Lake in particular?
>
[Subrata]  I'm not sure about other platforms, but from TGL onwards, on
Chrome devices we have enabled PAVP use case and Intel had tried to
replicate this issue on older platforms as well and informed us it's
replicated the issue on TGL platform as well where PAVP first being enabled
and not on JSL (which doesn't enable PAVP). Right now the recommendation
that is coming out from Intel is that, on latest Chrome OS devices, there
might be concurrent SPI accesses between host CPU and CSE. But I will ask
this question about the older generation platform as you have highlighted
which has PAVP enabled. Can you please let me know those platform names ?
Also,  do you know on those devices if we are trying to update the FW using
flashrom tool as well or not?

>
> > 2. Maintain the code symmetry between coreboot and flashrom SPI hardware
> > sequencing operations.
>
> Flashrom has tighter requirements compared to coreboot because we try to
> support all compatible chipset generations with the same code. If you
> want to align the two code-bases, please use flashrom as reference and
> patch coreboot.
>
[Subrata] Let me understand this a little better, are you suggesting not to
increase the SPI operation timing in flashrom upstream master ? knowing
that there is a hidden issue acknowledged by Intel with their validation
data, also, promised to update the SPI programming guide to capture the
recommendation of timeout in multi master scenarios.

Apart from timeout increasing CLs, other code changes are to increase the
code modularity (like using macro and helper function), are you suggesting
to not touch the flashrom code? can't we all contribute towards validating
the concerned target platforms. At Least in the coreboot community we all
help each other to validate the platform if we are out of required devices.

>
> Also, if you want to save some time in the long run: If we'd focus on
> the flashrom code-base, e.g. prepare libflashrom for easier integration
> into embedded environments, we could drop the redundant code in core-
> boot.
>
> [Subrata] Edward has some suggestions for increasing the code reusability
(between coreboot and flaashrom) going forward, and will let him share his
thoughts on this.


> Cheers,
> Nico
>

[flashrom] Re: Sharing info about flashrom HW seq code refactoring https://review.coreboot.org/q/topic:PCH_HW_SEQ_Cleanup

2022-03-17 Thread Tim Wawrzynczak via flashrom
On Thu, Mar 17, 2022 at 10:35 AM Nico Huber  wrote:

> On 17.03.22 15:35, Subrata Banik wrote:
> > On Thu, Mar 17, 2022 at 6:08 PM Nico Huber  wrote:
> >> On 16.03.22 17:19, Subrata Banik wrote:
> >>> The reason for this refactoring of HW sequencing SPI driver code are:
> >>> 1. We (Chrome OS team) recently ran into a firmware update issue on
> >>> dogfooders (test device utilise by real users and share feedback)
> >>> systems,
> >>> where the attempt to perform SPI flash update operation from host-cpu
> >>> (using underlying flashrom utility) is silently failing. Furthermore,
> we
> >>> debug the issue with help of Intel and figure out the problem is
> related
> >>> to
> >>> multiple master is accessing the SPI flash and operation triggered by
> >>> host-cpu side using flashrom (write and erase operation) is getting
> timed
> >>> out (due to given lesser timeout boundary) due to underlying SPI bus is
> >>> occupied by other master.
> >>
> >> Does the issue persist with the following commit applied to libflashrom,
> >> activated and integrated into futility?
> >>
> >>   commit 330088d77 [CHROMIUM]: libflashrom: Also use USE_BIG_LOCK
> >>
> >> This would give us a hint if the issue is indeed due to multiple masters
> >> or actually due to multiple programs trying to control the same master.
> >>
> > [Subrata] Yes, we are seeing failure even with USE_BIG_LOCK code changes
> > being integrated. The only code change that helps us to W/A this issue is
> > here
> >
> https://chromium-review.googlesource.com/c/chromiumos/third_party/flashrom/+/3498895
> > (increase
> > timeout in flashrom operations). Also, Intel disable the CSE runtime
> access
> > to SPI using a custom tool and that helped to fix this update failure as
> > well. These are the evidence towards multi master theory in AU failure
> > issue.
>
> I see. I was probably too distracted by the refactoring patches earlier.
> CB:62867 is indeed necessary.
>

We seem to have actually figured out what the actual root cause of our
problem was, and it indeed
does seem to be the multi-master problem and flashrom not waiting long
enough in these cases.
tl;dr widevine provisioning was failing, and repeating endlessly in a loop
(~each second or so) and
each attempt actually causes the CSE to write to its data area in flash.
Unfortunately, we do not have
fantastic visibility into the processes that are running during AU, and so
this was very hard to track down.
We have fixed the issues with widevine provisioning and also added a max 5
attempts to provision a device
instead of writing endlessly to the flash in the failure case.


>
> We already assumed a 5s timeout is needed for erasing a block. Worst
> case seems to be that all masters try to do this at once and then one
> master would have to wait (NM-1)*5s before its transaction even starts
> (NM being the number of masters).
>
> We could make it depend on the number of masters, or just choose some-
> thing safe like 30s. WDYT?
>
>
I think the longer the timeout (until it gets ridiculous), probably the
better. Unless there is buggy hardware
out there, there is no good reason for these operations to timeout; there
could for sure be SPI flash
chips out there that perhaps aren't always performing to-spec, who knows.

But in this case, since do we know the maximum number of masters, and 5s
per erase seems reasonable too,
then 30 seconds makes sense to me.


> >> This is the special case, starting from Tiger Lake Chrome Platform,
> where
> >>> we have enable the PVAP (Protected Audio Video Path) which request CSE
> to
> >>> perform some erase and write operation as needed.
> >>
> >> We used flashrom on PAVP enabled systems for more than a decade without
> >> issues. Did something change in that area for Tiger Lake in particular?
> >>
> > [Subrata]  I'm not sure about other platforms, but from TGL onwards, on
> > Chrome devices we have enabled PAVP use case and Intel had tried to
> > replicate this issue on older platforms as well and informed us it's
> > replicated the issue on TGL platform as well where PAVP first being
> enabled
> > and not on JSL (which doesn't enable PAVP). Right now the recommendation
> > that is coming out from Intel is that, on latest Chrome OS devices, there
> > might be concurrent SPI accesses between host CPU and CSE. But I will ask
> > this question about the older generation platform as you have highlighted
> > which has PAVP enabled. Can you please let me know those platform names ?
> > Also,  do you know on those devices if we are trying to update the FW
> using
> > flashrom tool as well or not?
>
> Thanks for looking into it. As our timeouts were way below the worst
> case, I assume enabling PAVP just makes the problem more visible and is
> not directly related. It could also be any other code running on the CSE
> that changed between JSL and TGL (unless you know, ofc., that PAVP is
> the only code that writes to flash).
>
>
PAVP is the only thing we know of ATM that writes to the flash

[flashrom] Re: Sharing info about flashrom HW seq code refactoring https://review.coreboot.org/q/topic:PCH_HW_SEQ_Cleanup

2022-03-17 Thread Nico Huber
Hi Tim,

On 17.03.22 17:54, Tim Wawrzynczak wrote:
> On Thu, Mar 17, 2022 at 10:35 AM Nico Huber  wrote:
>> I'm sorry if this affects your work, but we have very bad experience
>> with unnecessary code changes by CrOS developers. Currently `sb600spi`
>> (the equivalent to `ichspi` for AMD platforms) is broken since months
>> and only CrOS developers could clarify why the code was changed as it
>> was. We need to prevent that this happens to `ichspi` too, right?
>>
>> Nico
>>
>
> We are certainly not intending to break ichspi or make it worse, we are
> trying to make it better

I always assume that's the case for everybody :)

Still, sometimes things break and we know already that things can
break even due to unnecessary refactorings. And saying the reason
for a change is to align with a project with lower quality require-
ments seems kind of odd, doesn't it?

And when reviewers are busy with refactorings instead of attending
the changes they already merged, that feels very wrong :-/

>, perhaps we were a little
> hasty in trying to get through solutions that were not perhaps the root of
> the problem, but were discovered along the way.
>
> We are also working with Intel to get some documentation out about the
> expected behaviors with regard to the
> multi-master scenarios, hopefully that will help clarify things some in
> this regard.

If the timeout is the only issue, the hardware works as expected.
Stating an expected timeout in the datasheets would have avoided the
problem of course :)

Nico
___
flashrom mailing list -- flashrom@flashrom.org
To unsubscribe send an email to flashrom-le...@flashrom.org


[flashrom] Re: Sharing info about flashrom HW seq code refactoring https://review.coreboot.org/q/topic:PCH_HW_SEQ_Cleanup

2022-03-17 Thread Subrata Banik via flashrom
I agree with Tim and will try to provide more requested information and
testing with those CLs.

> Only your reasoning "maintain the code symmetry" is not enough to justify
changes.

Kindly ignore my #2 comment, re-reading it doesn't make sense to me :) to
maintain the symmetry is the real reason for those code changes.

Looking for review comments and I will address those earliest.

Thanks for having good discussion here.

Thanks,
Subrata


On Thu, Mar 17, 2022 at 10:24 PM Tim Wawrzynczak 
wrote:

>
>
> On Thu, Mar 17, 2022 at 10:35 AM Nico Huber  wrote:
>
>> On 17.03.22 15:35, Subrata Banik wrote:
>> > On Thu, Mar 17, 2022 at 6:08 PM Nico Huber  wrote:
>> >> On 16.03.22 17:19, Subrata Banik wrote:
>> >>> The reason for this refactoring of HW sequencing SPI driver code are:
>> >>> 1. We (Chrome OS team) recently ran into a firmware update issue on
>> >>> dogfooders (test device utilise by real users and share feedback)
>> >>> systems,
>> >>> where the attempt to perform SPI flash update operation from host-cpu
>> >>> (using underlying flashrom utility) is silently failing. Furthermore,
>> we
>> >>> debug the issue with help of Intel and figure out the problem is
>> related
>> >>> to
>> >>> multiple master is accessing the SPI flash and operation triggered by
>> >>> host-cpu side using flashrom (write and erase operation) is getting
>> timed
>> >>> out (due to given lesser timeout boundary) due to underlying SPI bus
>> is
>> >>> occupied by other master.
>> >>
>> >> Does the issue persist with the following commit applied to
>> libflashrom,
>> >> activated and integrated into futility?
>> >>
>> >>   commit 330088d77 [CHROMIUM]: libflashrom: Also use USE_BIG_LOCK
>> >>
>> >> This would give us a hint if the issue is indeed due to multiple
>> masters
>> >> or actually due to multiple programs trying to control the same master.
>> >>
>> > [Subrata] Yes, we are seeing failure even with USE_BIG_LOCK code changes
>> > being integrated. The only code change that helps us to W/A this issue
>> is
>> > here
>> >
>> https://chromium-review.googlesource.com/c/chromiumos/third_party/flashrom/+/3498895
>> > (increase
>> > timeout in flashrom operations). Also, Intel disable the CSE runtime
>> access
>> > to SPI using a custom tool and that helped to fix this update failure as
>> > well. These are the evidence towards multi master theory in AU failure
>> > issue.
>>
>> I see. I was probably too distracted by the refactoring patches earlier.
>> CB:62867 is indeed necessary.
>>
>
> We seem to have actually figured out what the actual root cause of our
> problem was, and it indeed
> does seem to be the multi-master problem and flashrom not waiting long
> enough in these cases.
> tl;dr widevine provisioning was failing, and repeating endlessly in a loop
> (~each second or so) and
> each attempt actually causes the CSE to write to its data area in flash.
> Unfortunately, we do not have
> fantastic visibility into the processes that are running during AU, and so
> this was very hard to track down.
> We have fixed the issues with widevine provisioning and also added a max 5
> attempts to provision a device
> instead of writing endlessly to the flash in the failure case.
>
>
>>
>> We already assumed a 5s timeout is needed for erasing a block. Worst
>> case seems to be that all masters try to do this at once and then one
>> master would have to wait (NM-1)*5s before its transaction even starts
>> (NM being the number of masters).
>>
>> We could make it depend on the number of masters, or just choose some-
>> thing safe like 30s. WDYT?
>>
>>
> I think the longer the timeout (until it gets ridiculous), probably the
> better. Unless there is buggy hardware
> out there, there is no good reason for these operations to timeout; there
> could for sure be SPI flash
> chips out there that perhaps aren't always performing to-spec, who knows.
>
> But in this case, since do we know the maximum number of masters, and 5s
> per erase seems reasonable too,
> then 30 seconds makes sense to me.
>
>
>> >> This is the special case, starting from Tiger Lake Chrome Platform,
>> where
>> >>> we have enable the PVAP (Protected Audio Video Path) which request
>> CSE to
>> >>> perform some erase and write operation as needed.
>> >>
>> >> We used flashrom on PAVP enabled systems for more than a decade without
>> >> issues. Did something change in that area for Tiger Lake in particular?
>> >>
>> > [Subrata]  I'm not sure about other platforms, but from TGL onwards, on
>> > Chrome devices we have enabled PAVP use case and Intel had tried to
>> > replicate this issue on older platforms as well and informed us it's
>> > replicated the issue on TGL platform as well where PAVP first being
>> enabled
>> > and not on JSL (which doesn't enable PAVP). Right now the recommendation
>> > that is coming out from Intel is that, on latest Chrome OS devices,
>> there
>> > might be concurrent SPI accesses between host CPU and CSE. But I will
>> ask
>> > this question about 

[flashrom] Re: Questions regarding Easy project and GSoC

2022-03-17 Thread Thomas Heijligen
Hi Aarya,

Thank you for contributing to flashrom. The merge of a patch sometimes
needs just time, even if it has +2.

The topic "fixing endianness issues" consists of tasks to rewrite code
that assumes a specific endianness. For example, the flashrom fmap
parser just maps packed c structs onto the memory. This works only when
flashrom runs on a CPU with the same endian as fmap defines. We have
two macros, __FLASHROM_LITTLE_ENDIAN__ and __FLASHROM_BIG_ENDIAN__
which handle code that works only on one endian. In the GSoC project,
we want to make this code run on little and big endian systems. Besides
that, we may find additional parts in the code which are endian
specific.

-- Thomas

On Sun, 2022-03-13 at 08:13 +0530, Aarya Chaumal wrote:
> As suggested, I have been doing some of the easy projects till now
> and
> I have enjoyed it. I have submitted a few patches till now and some
> of
> them have gotten +2 code-reviews but still, they were not merged. Is
> there anything that is required to get the changes to be merged?
> Also, I have fixed most of the issues I got by running scan-build,
> only 2 classes of issues were remaining which I feel are false
> positives (showing underflow error but those cases won't occur or are
> handled seprately) created by the tool and can be ignored.
> As was going through your proposed projects for GSoC, I found
> "optimizing erase-function" and "fixing endianness issues"
> interesting and would like to do that in the summer. Can you suggest
> some tasks so that I can know more about these projects and get an
> idea of how to write my proposal for GSoC for doing this project?
> Thanks :)
> 
> On Thu, Mar 10, 2022 at 10:45 AM Anastasia Klimchuk
>  wrote:
> > 
> > Hello Aarya,
> > 
> > Nice to meet you! I think I saw your question on the IRC channel,
> > and someone replied, was that your question?
> > Do you still need more info? Let us know if yes.
> > 
> > Thanks!
> > 
> > On Wed, Mar 9, 2022 at 2:42 PM Aarya Chaumal
> >  wrote:
> > > 
> > > Hello there,
> > > 
> > > I hope you are well today when you receive this email. I am Aarya
> > > Chaumal, a Computer Engineering student at the College of
> > > Engineering Pune, India. While going through organizations for
> > > this year's Google's Summer of Code I came across your
> > > organization, Flashrom.
> > > 
> > > I am a  part of onboard computers subsystem at my college's
> > > satellite initiative. Through this, I have closely worked on
> > > Atmel SAM E70 XPLAINED board. Also, I have strong knowledge about
> > > C/C++ and assembly language. From your list of GSoC project
> > > ideas, I liked the idea of “Remove global state from flashrom”
> > > and "Optimize Erase-Function Selection", although I am not quite
> > > sure which one is more suitable for me. Can you guide me through
> > > this?
> > > 
> > > As mentioned in your Contributor commitments and requirements, I
> > > started to do one of the easy projects - Add new flash chip
> > > definitions. For this, I read the relevant datasheets, one from
> > > the unlisted chips and another of a listed one (for reference)
> > > but still, I am not getting the information about some fields for
> > > the structure in the datasheet, namely the feature_bits,
> > > probe_timing. Also, do I have to write the probe, read, write and
> > > erase functions for the chip separately? Also, how do I test if
> > > my code is working as I don't have relevant hardware with me? Can
> > > you help me with this? Also what resources should I use to learn
> > > more about it?
> > > 
> > > Thank you for looking into this for me.
> > > Sincerely,
> > > Aarya Chaumal
> > > ___
> > > flashrom mailing list -- flashrom@flashrom.org
> > > To unsubscribe send an email to flashrom-le...@flashrom.org
> > 
> > 
> > 
> > --
> > Anastasia.
> ___
> flashrom mailing list -- flashrom@flashrom.org
> To unsubscribe send an email to flashrom-le...@flashrom.org

___
flashrom mailing list -- flashrom@flashrom.org
To unsubscribe send an email to flashrom-le...@flashrom.org


[flashrom] Re: Questions regarding Easy project and GSoC

2022-03-17 Thread Thomas Heijligen
Hi Aarya,

Thank you for contributing to flashrom. The merge of a patch sometimes
needs just time, even if it has +2.

The topic "fixing endianness issues" consists of tasks to rewrite code
that assumes a specific endianness. For example, the flashrom fmap
parser just maps packed c structs onto the memory. This works only when
flashrom runs on a CPU with the same endian as fmap defines. We have
two macros, __FLASHROM_LITTLE_ENDIAN__ and __FLASHROM_BIG_ENDIAN__
which handle code that works only on one endian. In the GSoC project,
we want to make this code run on little and big endian systems. Besides
that, we may find additional parts in the code which are endian
specific.

-- Thomas

On Sun, 2022-03-13 at 08:13 +0530, Aarya Chaumal wrote:
> As suggested, I have been doing some of the easy projects till now
> and
> I have enjoyed it. I have submitted a few patches till now and some
> of
> them have gotten +2 code-reviews but still, they were not merged. Is
> there anything that is required to get the changes to be merged?
> Also, I have fixed most of the issues I got by running scan-build,
> only 2 classes of issues were remaining which I feel are false
> positives (showing underflow error but those cases won't occur or are
> handled seprately) created by the tool and can be ignored.
> As was going through your proposed projects for GSoC, I found
> "optimizing erase-function" and "fixing endianness issues"
> interesting and would like to do that in the summer. Can you suggest
> some tasks so that I can know more about these projects and get an
> idea of how to write my proposal for GSoC for doing this project?
> Thanks :)
> 
> On Thu, Mar 10, 2022 at 10:45 AM Anastasia Klimchuk
>  wrote:
> > 
> > Hello Aarya,
> > 
> > Nice to meet you! I think I saw your question on the IRC channel,
> > and someone replied, was that your question?
> > Do you still need more info? Let us know if yes.
> > 
> > Thanks!
> > 
> > On Wed, Mar 9, 2022 at 2:42 PM Aarya Chaumal
> >  wrote:
> > > 
> > > Hello there,
> > > 
> > > I hope you are well today when you receive this email. I am Aarya
> > > Chaumal, a Computer Engineering student at the College of
> > > Engineering Pune, India. While going through organizations for
> > > this year's Google's Summer of Code I came across your
> > > organization, Flashrom.
> > > 
> > > I am a  part of onboard computers subsystem at my college's
> > > satellite initiative. Through this, I have closely worked on
> > > Atmel SAM E70 XPLAINED board. Also, I have strong knowledge about
> > > C/C++ and assembly language. From your list of GSoC project
> > > ideas, I liked the idea of “Remove global state from flashrom”
> > > and "Optimize Erase-Function Selection", although I am not quite
> > > sure which one is more suitable for me. Can you guide me through
> > > this?
> > > 
> > > As mentioned in your Contributor commitments and requirements, I
> > > started to do one of the easy projects - Add new flash chip
> > > definitions. For this, I read the relevant datasheets, one from
> > > the unlisted chips and another of a listed one (for reference)
> > > but still, I am not getting the information about some fields for
> > > the structure in the datasheet, namely the feature_bits,
> > > probe_timing. Also, do I have to write the probe, read, write and
> > > erase functions for the chip separately? Also, how do I test if
> > > my code is working as I don't have relevant hardware with me? Can
> > > you help me with this? Also what resources should I use to learn
> > > more about it?
> > > 
> > > Thank you for looking into this for me.
> > > Sincerely,
> > > Aarya Chaumal
> > > ___
> > > flashrom mailing list -- flashrom@flashrom.org
> > > To unsubscribe send an email to flashrom-le...@flashrom.org
> > 
> > 
> > 
> > --
> > Anastasia.
> ___
> flashrom mailing list -- flashrom@flashrom.org
> To unsubscribe send an email to flashrom-le...@flashrom.org

___
flashrom mailing list -- flashrom@flashrom.org
To unsubscribe send an email to flashrom-le...@flashrom.org