On Mon, 2 Dec 2013 10:19:37 -0500 Prarit Bhargava <pra...@redhat.com> wrote:
> Logging and tracking firmware bugs in the kernel has long been an issue > for system administrators. The current kernel does not have a good > uniform method of reporting firmware bugs and the code in the kernel is a > mix of printk's and WARN_ONs. This causes problems for both system > administrators and QA engineers who attempt to diagnose problems within > the kernel. > > Using printk's is somewhat effective but lacks information useful for > reporting a bug such as the system vendor or model, BIOS revision, etc. > Using WARN_ONs is also questionable because the data like the backtrace > and the list of modules is usually unnecessary for firmware issues as the > warning stems from one call path during system or driver initialization. > We have heard many complaints from users about the excess verbosity and > confusing stacktraces for these messages. > > I'm proposing with this patch to do something similar to the WARN() > mechanism that is currently implemented in the kernel. This > patchset introduces FW_INFO() and FW_INFO_DEV() which logs output like: > > [ 230.661137] [Firmware Info]: pci_bus 0000:00: at > /home/prarit_modules/prarit.c:21 Your BIOS is broken because it is > -ENOWORKY. > [ 230.671076] [Firmware Info]: Intel Corporation SandyBridge Platform/To > be filled by O.E.M., BIOS RMLCRB.86I.R3.27.D685.1305151733 05/15/2013 > > instead of the verbose back traces we are currently seeing. These messages > can be easily gleaned from /var/log/messages, etc., by automatic bug > reporting tools and system administrators to properly report bugs to > hardware vendors. > > I found an improperly classified FW_INFO in arch/x86/kernel/cpu/amd.c > which that should be a FW_BUG. A slight simplification: > +static inline char *dump_hadware_arch_desc(void) > +{ > + return NULL; > +} return "unavailable"; > +void warn_slowpath_fmt_dev(const char *file, int line, > + struct device *dev, int level, const char *fmt, ...) > +{ > + struct slowpath_args args; > + static char fw_str[16] = "\0"; > + > + switch (level) { > + case 1: > + strcpy(fw_str, "[Firmware Info]"); > + break; > + case 2: > + strcpy(fw_str, "[Firmware Warn]"); > + break; > + case 3: > + strcpy(fw_str, "[Firmware Bug]"); What's With The Crazy Capitalization In This Code? It's Illiterate! > + add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK); > + break; > + default: > + strcpy(fw_str, "[Firmware Bug]"); > + break; > + } > + > + if (dev) > + pr_info("%s: %s %s ", fw_str, > + dev_driver_string(dev), dev_name(dev)); > + pr_info("%s: at %s:%d\n", fw_str, file, line); > + > + args.fmt = fmt; > + va_start(args.args, fmt); > + printk(KERN_WARNING "%s: %pV", fw_str, &args); > + va_end(args.args); > + > + if (dump_hardware_arch_desc()) > + pr_info("%s: System Info: %s\n", fw_str, > + dump_hardware_arch_desc()); > + else > + pr_info("%s: System Info: Hardware Unidentified\n", fw_str); pr_info(""%s: system info: %s\n", fw_str, dump_hardware_arch_desc()); > +} > +EXPORT_SYMBOL(warn_slowpath_fmt_dev); > + > + > +char *dump_hardware_arch_desc(void) > +{ > + if (dump_stack_arch_desc_str[0] != '\0') > + return dump_stack_arch_desc_str; > + return NULL; return "unavaliable"; > +} The general thrust of the patchset seems useful and important. I do agree with Joe's suggestion regarding the presentation, although it isn't a show-stopper. I do wonder if it all should be generalised a bit - it creates a bunch of infrastructure which is specific to system firmware issues, but what's so special about firmware? Why can't I use this infrastructure to log netdev errors or acpi errors or PM errors or...? But I didn't think about it much ;) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/