On Wed, Aug 21, 2019 at 08:42:24PM +0300, Alexander Amelkin wrote:
> 21.08.2019 19:32, Guenter Roeck wrote:
> > On Wed, Aug 21, 2019 at 06:57:43PM +0300, Ivan Mikhaylov wrote:
> >> Set WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION into WDT_CLEAR_TIMEOUT_STATUS
> >> to clear out boot code source and re-enable access to the primary SPI flash
> >> chip while booted via wdt2 from the alternate chip.
> >>
> >> AST2400 datasheet says:
> >> "In the 2nd flash booting mode, all the address mapping to CS0# would be
> >> re-directed to CS1#. And CS0# is not accessable under this mode. To access
> >> CS0#, firmware should clear the 2nd boot mode register in the WDT2 status
> >> register WDT30.bit[1]."
> > Is there reason to not do this automatically when loading the module
> > in alt-boot mode ? What means does userspace have to determine if CS0
> > or CS1 is active at any given time ? If there is reason to ever have CS1
> > active instead of CS0, what means would userspace have to enable it ?
> 
> Yes, there is. The driver is loaded long before the filesystems are mounted. 
> The filesystems, in the event of alternate/recovery boot, need to be mounted 
> from the same chip that the kernel was booted. For one reason because the 
> main chip at CS0 is most probably corrupt. If you clear that bit when driver 
> is loaded, your software will not know that and will try to mount the wrong 
> filesystems. The whole idea of ASPEED's switching chipselects is to have 
> identical firmware in both chips, without the need to process the alternate 
> boot state in any way except for indicating a successful boot and restoring 
> access to CS0 when needed.
> 
> The userspace can read bootstatus sysfs node to determine if an alternate 
> boot has occured.
> 
> With ASPEED, CS1 is activated automatically by wdt2 when system fails to boot 
> from the primary flash chip (at CS0) and disable the watchdog to indicate a 
> successful boot. When that happens, both CS0 and CS1 controls  get routed in 
> hardware to CS1 line, making the primary flash chip inaccessible. Depending 
> on the architecture of the user-space software, it may choose to re-enable 
> access to the primary chip via CS0 at different times. There must be a way to 
> do so.
> 
So by activating cs0, userspace would essentially pull its own root file system
from underneath itself ?

> > If userspace can not really determine if CS1 or CS0 is active, all it could
> > ever do was to enable CS0 to be in a deterministic state. If so, it doesn't
> > make sense to ever have CS1 active, and re-enabling CS0 could be automatic.
> >
> > Similar, if CS1 can ever be enabled, there is no means for userspace to 
> > ensure
> > that some other application did not re-enable CS0 while it believes that CS1
> > is enabled. If there is no means for userspace to enable CS1, it can never 
> > be
> > sure what is enabled (because some other entity may have enabled CS0 while
> > userspace just thought that CS1 is still enabled). Again, the only means
> > to guarantee a well defined state would be to explicitly enable CS0 and 
> > provive
> > no means to enable CS1. Again, this could be done during boot, not requiring
> > an explicit request from userspace.
> 
> Please understand that activation of CS1 in place of CS0 is NOT a software 
> choice!
> 
> 
> >> +  if (unlikely(!wdt))
> >> +          return -ENODEV;
> >> +
> > How would this ever happen, and how / where is drvdata set to NULL ?
> 
> This is purely for robustness. Seeing a pointer obtained via a function 
> accessed without first checking it for validity makes me nervous.
> 
This is not how kernel code is commonly written.
Sure, we could add similar checks to each sysfs access code in the kernel,
blowing up its size significantly. I do not see a point of this.

> This code most probably adds nothing at the assembly level.
> 
That seems quite unlikely. Please demonstrate.

> >
> >> +  writel(WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION,
> >> +                  wdt->base + WDT_CLEAR_TIMEOUT_STATUS);
> >> +  wdt->wdd.bootstatus |= WDIOF_EXTERN1;
> > The variable reflects the _boot status_. It should not change after booting.
> Is there any documentation that dictates that? All I could find is
> 
> "bootstatus: status of the device after booting". That doesn't look to me 
> like it absolutely can not change to reflect the updated status (that is, to 
> reflect that the originally set up alternate CS routing has been reset to 
> normal).
> 
You choose to interpret "after booting" in a kind of novel way,
which I find a bit disturbing. I am not really sure how else to
describe "boot status" in a way that does not permit such
reinterpratation of the term.

On top of that, how specifically would "WDIOF_EXTERN1" reflect
what you claim it does ? Not only you are hijacking bootstatus9
(which is supposed to describe the reason for a reboot), you
are also hijacking WDIOF_EXTERN1. That seems highly arbitrary
to me, and is not really how an API/ABI should be used.

Guenter

> If you absolutely disallow that, I think we could make 'access_cs0' readable 
> instead, so it could report the current state of the boot code selection bit. 
> Reverted, I suppose. That way 'access_cs0' would report 1 after 1 has been 
> written to it (it wouldn't be possible to write a zero).
> 
> > @@ -223,6 +248,9 @@ static int aspeed_wdt_probe(struct platform_device 
> > *pdev)
> >  
> >     wdt->ctrl = WDT_CTRL_1MHZ_CLK;
> >  
> > +   if (of_property_read_bool(np, "aspeed,alt-boot"))
> > +           wdt->wdd.groups = bswitch_groups;
> > +
> > Why does this have to be separate to the existing evaluation of
> > aspeed,alt-boot, and why does the existing code not work ?
> >
> > Also, is it guaranteed that this does not interfer with existing
> > support for alt-boot ?
> 
> I think Ivan will comment on this.
> 
> With best regards,
> Alexander Amelkin,
> BIOS/BMC Team Lead, YADRO
> https://yadro.com
> 
> 



Reply via email to