Am Wed, 17 May 2023 12:52:31 +0200
schrieb Jan Kiszka <[email protected]>:

> On 15.05.23 20:20, 'Henning Schild' via EFI Boot Guard wrote:
> > Tested on a Supermicro X11 series (Simatic IPC 1047E) and on some
> > other random Supermicro X9 series machine. Should work for any
> > board that has IPMI in any version. When an IPMI watchdog is
> > present, iTCO is likely not working. The Linux driver would detect
> > that and has special code to deal with especially Supermicro, here
> > we use probe order.
> > 
> > Developed using the IPMI Spec 2.0 sections 9 and 27.
> > 
> > Signed-off-by: Henning Schild <[email protected]>
> > ---
> >  Makefile.am                 |   2 +
> >  drivers/watchdog/ipmi_wdt.c | 193
> > ++++++++++++++++++++++++++++++++++++ 2 files changed, 195
> > insertions(+) create mode 100644 drivers/watchdog/ipmi_wdt.c
> > 
> > diff --git a/Makefile.am b/Makefile.am
> > index 48c560f72f0c..ba2440025a0c 100644
> > --- a/Makefile.am
> > +++ b/Makefile.am
> > @@ -159,12 +159,14 @@ if BOOTLOADER
> >  if ARCH_IS_X86
> >  # NOTE: wdat.c is placed first so it is tried before any other
> > drivers # NOTE: ipc4x7e_wdt.c must be *before* itco.c
> > +# NOTE: ipmi_wdt.c must be *before* itco.c
> >  efi_sources_watchdogs = \
> >     drivers/watchdog/wdat.c \
> >     drivers/watchdog/amdfch_wdt.c \
> >     drivers/watchdog/i6300esb.c \
> >     drivers/watchdog/atom-quark.c \
> >     drivers/watchdog/ipc4x7e_wdt.c \
> > +   drivers/watchdog/ipmi_wdt.c \
> >     drivers/watchdog/itco.c \
> >     drivers/watchdog/hpwdt.c
> >  else
> > diff --git a/drivers/watchdog/ipmi_wdt.c
> > b/drivers/watchdog/ipmi_wdt.c new file mode 100644
> > index 000000000000..af55e57c5a0e
> > --- /dev/null
> > +++ b/drivers/watchdog/ipmi_wdt.c
> > @@ -0,0 +1,193 @@
> > +/*
> > + * EFI Boot Guard
> > + *
> > + * Copyright (c) Siemens AG, 2023
> > + *
> > + * Authors:
> > + *  Henning Schild <[email protected]>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version
> > 2.  See
> > + * the COPYING file in the top-level directory.
> > + *
> > + * SPDX-License-Identifier:        GPL-2.0
> > + */
> > +
> > +#include <efi.h>
> > +#include <pci/header.h>
> > +#include <sys/io.h>
> > +#include "utils.h"
> > +
> > +#define SMBIOS_TYPE_IPMI_KCS               38
> > +#define IPMI_KCS_DEFAULT_IOBASE            0xca2
> > +
> > +#define IPMI_KCS_STS_OBF           0x1
> > +#define IPMI_KCS_STS_IBF           0x2
> > +
> > +#define IPMI_KCS_CMD_ABORT         0x60
> > +#define IPMI_KCS_CMD_WRITE_START        0x61
> > +#define IPMI_KCS_CMD_WRITE_END          0x62
> > +
> > +#define IPMI_KCS_NETFS_LUN_WDT     0x18
> > +
> > +#define IPMI_WDT_CMD_RESET         0x22
> > +#define IPMI_WDT_CMD_SET           0x24
> > +#define  IPMI_WDT_SET_USE_OSLOAD        0x3
> > +#define  IPMI_WDT_SET_ACTION_HARD_RESET 0x1
> > +
> > +  
> 
> Extra newline.
> 
> > +#define kcs_sts_is_error(sts) (((sts >> 6 ) & 0x3) == 0x3)
> > +
> > +static char
> > +set_wdt_data[] = {IPMI_WDT_SET_USE_OSLOAD,
> > IPMI_WDT_SET_ACTION_HARD_RESET,
> > +             0x00, 0x00,0x00, 0x00};
> > +
> > +static EFI_STATUS
> > +kcs_wait_iobf(UINT16 io_base, char iobf)
> > +{
> > +   char sts;
> > +
> > +   while (1) {
> > +           sts = inb(io_base + 1);
> > +           if (kcs_sts_is_error(sts))
> > +                   return EFI_DEVICE_ERROR;
> > +           if (iobf == IPMI_KCS_STS_IBF) {
> > +                   // IBF we wait for clear
> > +                   if (!(sts & IPMI_KCS_STS_IBF))
> > +                           break;
> > +           } else {
> > +                   // OBF we wait for set
> > +                   if (sts & IPMI_KCS_STS_OBF)
> > +                           break;
> > +           }
> > +   }  
> 
> Is that loop guaranteed to eventually terminate? Or do we have to set
> a waiting limit? What does the kernel do?

Not sure what the kernel really does, did not dig that deep since the
kernel code is hard to read and was not even looked at.

The loop is expected to eventually end. We wait for input to become
available, or output to be received by the BMC. Or we should see an
error.

But thinking about it again, i think it might indeed be better to put a
timer around all of that. I will have a look at the kernel and freeipmi
(userspace implementation) and kind of expect endless retry loops
guarded with timers that use multiples of 5s (number coming from the
spec).

> > +
> > +   return EFI_SUCCESS;
> > +}
> > +
> > +static EFI_STATUS
> > +kcs_outb(UINT8 value, UINT16 io_base, UINT16 port)
> > +{
> > +   EFI_STATUS status;
> > +
> > +   status = kcs_wait_iobf(io_base, IPMI_KCS_STS_IBF);
> > +   if (status)
> > +           return status;
> > +
> > +   outb(value, io_base + port);
> > +   // dummy read  
> 
> Why? As delay or because the BMC expects a read in order to continue?

BMC expects that, that is written in the SPEC and also called "dummy
read" there.

> > +   inb(io_base);
> > +
> > +   return EFI_SUCCESS;
> > +}
> > +
> > +static EFI_STATUS
> > +_send_ipmi_cmd(UINT16 io_base, char cmd, char *data, int datalen)
> > +{
> > +   int i, err = 0;
> > +   char lastbyte = cmd;
> > +
> > +   err += kcs_outb(IPMI_KCS_CMD_WRITE_START, io_base, 1);
> > +   err += kcs_outb(IPMI_KCS_NETFS_LUN_WDT, io_base, 0);
> > +
> > +   if (datalen) {
> > +           lastbyte = data[datalen - 1];
> > +           err += kcs_outb(cmd, io_base, 0);
> > +           for (i = 0; i < datalen - 1; i++)
> > +                   err += kcs_outb(data[i], io_base, 0);
> > +   }
> > +
> > +   err += kcs_outb(IPMI_KCS_CMD_WRITE_END, io_base,  1);
> > +   err += kcs_outb(lastbyte, io_base, 0);
> > +
> > +   if (err)
> > +           return EFI_DEVICE_ERROR;
> > +
> > +   return kcs_wait_iobf(io_base, IPMI_KCS_STS_OBF);
> > +}
> > +
> > +static VOID
> > +handle_ipmi_error(UINT16 io_base)
> > +{
> > +   WARNING(L"Handling Error Status 0x%x\n", inb(io_base + 1));
> > +
> > +   outb(IPMI_KCS_CMD_ABORT, io_base + 1);
> > +
> > +   if (kcs_wait_iobf(io_base, IPMI_KCS_STS_IBF))
> > +           // stuck in error state, hopefully does not happen
> >  
> 
> Reads a bit like "we are lost when we get here", which is not
> completely true as we have an error message on the console and an
> error return code from the caller, send_ipmi_cmd. What we then do
> with the error code is a different story, see below.

I will remove the comments. We come here when we are in error-state and
hope to transition out of it by sending an abort command. If that all
goes well we can try sending our real command again ... but if we are
stuck in error we eventually have to give up.

> > +           return;
> > +
> > +   if (inb((io_base + 1) & IPMI_KCS_STS_OBF))
> > +           inb(io_base);
> > +   outb(0x0, io_base);
> > +
> > +   if (kcs_wait_iobf(io_base, IPMI_KCS_STS_IBF))
> > +           // stuck in error state, hopefully does not happen
> > +           return;
> > +}
> > +
> > +static EFI_STATUS
> > +send_ipmi_cmd(UINT16 io_base, char cmd, char *data, int datalen)
> > +{
> > +   EFI_STATUS status;
> > +   int i;
> > +
> > +   for (i = 0; i < 5; i++) {
> > +           status = _send_ipmi_cmd(io_base, cmd, data,
> > datalen);
> > +           if (status == EFI_SUCCESS)
> > +                   return status;
> > +           handle_ipmi_error(io_base);
> > +   }
> > +
> > +   return status;
> > +}
> > +
> > +static EFI_STATUS __attribute__((constructor))
> > +init(__attribute__((unused)) EFI_PCI_IO *pci_io,
> > +     __attribute__((unused)) UINT16 pci_vendor_id,
> > +     __attribute__((unused)) UINT16 pci_device_id,
> > +     UINTN timeout)
> > +{
> > +   SMBIOS_STRUCTURE_TABLE *smbios_table;
> > +   SMBIOS_STRUCTURE_POINTER smbios_struct;
> > +   EFI_STATUS status;
> > +   UINT64 io_base;
> > +   UINT16 *timeout_value;
> > +
> > +   status = LibGetSystemConfigurationTable(&SMBIOSTableGuid,
> > +                                           (VOID
> > **)&smbios_table); +
> > +   if (status != EFI_SUCCESS)
> > +           return EFI_UNSUPPORTED;
> > +
> > +   smbios_struct = smbios_find_struct(smbios_table,
> > SMBIOS_TYPE_IPMI_KCS); +
> > +   if (smbios_struct.Raw == NULL)
> > +           return EFI_UNSUPPORTED;
> > +
> > +   io_base = *((UINT64 *)(smbios_struct.Raw + 8));
> > +   if (io_base == 0) {
> > +           io_base = IPMI_KCS_DEFAULT_IOBASE;
> > +   } else {
> > +           if (!(io_base & 1))
> > +                   // MMIO not implemented
> > +                   return EFI_UNSUPPORTED;
> > +
> > +           io_base &= ~1;
> > +   }
> > +
> > +   INFO(L"Detected IPMI watchdog at I/O 0x%x\n", io_base);
> > +   timeout_value = (UINT16 *)(set_wdt_data + 4);
> > +   *timeout_value = timeout * 10;
> > +
> > +   status = send_ipmi_cmd(io_base, IPMI_WDT_CMD_SET,
> > set_wdt_data,
> > +                          sizeof(set_wdt_data));
> > +   if (status == EFI_SUCCESS)
> > +           status = send_ipmi_cmd(io_base,
> > IPMI_WDT_CMD_RESET, NULL, 0); +
> > +   if (status != EFI_SUCCESS) {
> > +           WARNING(L"Watchdog device repeatedly reported
> > errors. " \
> > +                   "Failed to arm the watchdog, but still
> > booting up.\n");  
> 
> That's deviating for other watchdog drivers. What makes this watchdog
> different to justify it? If nothing, we can still argue about the best
> behavior at EBG level. But, IIRC, a failing EFI application would
> cause a reboot attempt of the firmware.

This watchdog driver needs to have error-handling built-in the actual
arming of the watchdog. Because we talk to a potentially multi-user
device and there can be races.

But you are totally right, i never thought about what would happen if i
just returned the error to the firmware. I played with all the error
handling in an UEFI shell ... where an error code looks like a "stuck
in boot".

I think i will indeed remove much of the error handling and
retry/abort logic and simply pass the error to the firmware. And for
the "breaking up endless waiting" simply use a 5s timer. We manage to
arm in less than 5 seconds, or we tell the firmware there was an error.

Henning

> Jan
> 
> > +   }
> > +
> > +   return EFI_SUCCESS;
> > +}  
> 

-- 
You received this message because you are subscribed to the Google Groups "EFI 
Boot Guard" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/efibootguard-dev/20230522163653.18f68869%40md1za8fc.ad001.siemens.net.

Reply via email to