Re: [PATCH v6 0/2] Add Realtek Otto GPIO support
On Wed, 2021-03-31 at 09:49 +0200, Bartosz Golaszewski wrote:
> On Tue, Mar 30, 2021 at 7:48 PM Sander Vanheule
> wrote:
> >
> > Add support for the GPIO controller employed by Realtek in multiple
> > series of MIPS SoCs. These include the supported RTL838x and
> > RTL839x. The register layout also matches the one found in the GPIO
> > controller of other (Lexra-based) SoCs such as RTL8196E, RTL8197D,
> > and RTL8197F.
>
> Series applied, thanks!

Thanks for merging, and thanks for the discussion everyone!

Best,
Sander
[PATCH v6 0/2] Add Realtek Otto GPIO support
Add support for the GPIO controller employed by Realtek in multiple series of MIPS SoCs. These include the supported RTL838x and RTL839x. The register layout also matches the one found in the GPIO controller of other (Lexra-based) SoCs such as RTL8196E, RTL8197D, and RTL8197F.

For the platform name 'otto', I am not aware of any official resources as to what hardware this specifically applies to. However, in all of the GPL archives we've received, from vendors using compatible SoCs in their design, the platform under the MIPS architecture is referred to by this name.

The GPIO ports have been tested on a Zyxel GS1900-8 (RTL8380) and a Zyxel GS1900-48 (RTL8393). Furthermore, the GPIO ports and interrupt controller have been tested on a Netgear GS110TPPv1 (RTL8381).

Changes in v6:
- Use devm_gpiochip_add_data()
- Code style for reading ngpios, header order
- Add Andy's Reviewed-by tag

Changes in v5:
- Edited code comments
- Fold functions that were used only once or twice (ISR/IMR accessors)
- Drop trivial functions for line to port/pin calculations
- Use gpio_irq_chip->init_hw() to initialise IRQ registers
- Invert GPIO_INTERRUPTS flag to GPIO_INTERRUPTS_DISABLED
- Support building as module
- Add Rob's Reviewed-by tag

Changes in v4:
- Fix pointer notation style
- Drop unused read_u16_reg() function
- Drop 'inline' specifier from functions

Changes in v3:
- Remove OF dependencies in driver probe
- Don't accept IRQ_TYPE_NONE as a valid interrupt type
- Remove (now unused) dev property from control structure
- Use u8/u16 port registers, instead of raw u32 registers
- Use 'line' name for gpiochip, 'port' and 'pin' names for hardware
- Renamed DT bindings file
- Dropped fallback-only DT compatible
- Various code style clean-ups

Changes in v2:
- Clarify structure and usage of IMR registers
- Added Linus' Reviewed-by tags

Sander Vanheule (2):
  dt-bindings: gpio: Binding for Realtek Otto GPIO
  gpio: Add Realtek Otto GPIO support

 .../bindings/gpio/realtek,otto-gpio.yaml |  78 +
 drivers/gpio/Kconfig                     |  13 +
 drivers/gpio/Makefile                    |   1 +
 drivers/gpio/gpio-realtek-otto.c         | 325 ++
 4 files changed, 417 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

-- 
2.30.2
[PATCH v6 2/2] gpio: Add Realtek Otto GPIO support
Realtek MIPS SoCs (platform name Otto) have GPIO controllers with up to 64 GPIOs, divided over two banks. Each bank has a set of registers for 32 GPIOs, with support for edge-triggered interrupts. Each GPIO bank consists of four 8-bit GPIO ports (ABCD and EFGH). Most registers pack one bit per GPIO, except for the IMR register, which packs two bits per GPIO (AB-CD). Although the byte order is currently assumed to have port A..D at offset 0x0..0x3, this has been observed to be reversed on other, Lexra-based, SoCs (e.g. RTL8196E/97D/97F). Interrupt support is disabled for the fallback devicetree-compatible 'realtek,otto-gpio'. This allows for quick support of GPIO banks in which the byte order would be unknown. In this case, the port ordering in the IMR registers may not match the reversed order in the other registers (DCBA, and BA-DC or DC-BA). Signed-off-by: Sander Vanheule Reviewed-by: Linus Walleij Reviewed-by: Andy Shevchenko --- drivers/gpio/Kconfig | 13 ++ drivers/gpio/Makefile| 1 + drivers/gpio/gpio-realtek-otto.c | 325 +++ 3 files changed, 339 insertions(+) create mode 100644 drivers/gpio/gpio-realtek-otto.c diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig index e3607ec4c2e8..6fb13d6507db 100644 --- a/drivers/gpio/Kconfig +++ b/drivers/gpio/Kconfig @@ -502,6 +502,19 @@ config GPIO_RDA help Say Y here to support RDA Micro GPIO controller. +config GPIO_REALTEK_OTTO + tristate "Realtek Otto GPIO support" + depends on MACH_REALTEK_RTL + default MACH_REALTEK_RTL + select GPIO_GENERIC + select GPIOLIB_IRQCHIP + help + The GPIO controller on the Otto MIPS platform supports up to two + banks of 32 GPIOs, with edge triggered interrupts. The 32 GPIOs + are grouped in four 8-bit wide ports. + + When built as a module, the module will be called realtek_otto_gpio. 
+ config GPIO_REG bool help diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile index c58a90a3c3b1..8ace5934e3c3 100644 --- a/drivers/gpio/Makefile +++ b/drivers/gpio/Makefile @@ -124,6 +124,7 @@ obj-$(CONFIG_GPIO_RC5T583) += gpio-rc5t583.o obj-$(CONFIG_GPIO_RCAR)+= gpio-rcar.o obj-$(CONFIG_GPIO_RDA) += gpio-rda.o obj-$(CONFIG_GPIO_RDC321X) += gpio-rdc321x.o +obj-$(CONFIG_GPIO_REALTEK_OTTO)+= gpio-realtek-otto.o obj-$(CONFIG_GPIO_REG) += gpio-reg.o obj-$(CONFIG_ARCH_SA1100) += gpio-sa1100.o obj-$(CONFIG_GPIO_SAMA5D2_PIOBU) += gpio-sama5d2-piobu.o diff --git a/drivers/gpio/gpio-realtek-otto.c b/drivers/gpio/gpio-realtek-otto.c new file mode 100644 index ..cb64fb5a51aa --- /dev/null +++ b/drivers/gpio/gpio-realtek-otto.c @@ -0,0 +1,325 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include +#include +#include +#include +#include +#include +#include + +/* + * Total register block size is 0x1C for one bank of four ports (A, B, C, D). + * An optional second bank, with ports E, F, G, and H, may be present, starting + * at register offset 0x1C. + */ + +/* + * Pin select: (0) "normal", (1) "dedicate peripheral" + * Not used on RTL8380/RTL8390, peripheral selection is managed by control bits + * in the peripheral registers. 
+ */ +#define REALTEK_GPIO_REG_CNR 0x00 +/* Clear bit (0) for input, set bit (1) for output */ +#define REALTEK_GPIO_REG_DIR 0x08 +#define REALTEK_GPIO_REG_DATA 0x0C +/* Read bit for IRQ status, write 1 to clear IRQ */ +#define REALTEK_GPIO_REG_ISR 0x10 +/* Two bits per GPIO in IMR registers */ +#define REALTEK_GPIO_REG_IMR 0x14 +#define REALTEK_GPIO_REG_IMR_AB0x14 +#define REALTEK_GPIO_REG_IMR_CD0x18 +#define REALTEK_GPIO_IMR_LINE_MASK GENMASK(1, 0) +#define REALTEK_GPIO_IRQ_EDGE_FALLING 1 +#define REALTEK_GPIO_IRQ_EDGE_RISING 2 +#define REALTEK_GPIO_IRQ_EDGE_BOTH 3 + +#define REALTEK_GPIO_MAX 32 +#define REALTEK_GPIO_PORTS_PER_BANK4 + +/** + * realtek_gpio_ctrl - Realtek Otto GPIO driver data + * + * @gc: Associated gpio_chip instance + * @base: Base address of the register block for a GPIO bank + * @lock: Lock for accessing the IRQ registers and values + * @intr_mask: Mask for interrupts lines + * @intr_type: Interrupt type selection + * + * Because the interrupt mask register (IMR) combines the function of IRQ type + * selection and masking, two extra values are stored. @intr_mask is used to + * mask/unmask the interrupts for a GPIO port, and @intr_type is used to store + * the selected interrupt types. The logical AND of these values is written to + * IMR on changes. + */ +struct realtek_gpio_ctrl { + struct gpio_chip gc; + void __iomem *base; + raw_spinlock_t lock; + u16 intr_mask[REALTEK_GPIO_PORTS_PER_BANK]; + u16 intr_type[REALTEK_GPI
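The register layout described in this commit message (four 8-bit ports per bank, one bit per GPIO in most registers, two bits per GPIO in the IMR) can be illustrated with a small userspace sketch. It assumes, as the patch does, that ports A..D sit at byte offsets 0x0..0x3 of each one-bit-per-GPIO register, and that each port owns a 16-bit IMR field at REALTEK_GPIO_REG_IMR + 2 * port, which is consistent with REALTEK_GPIO_REG_IMR_AB at 0x14 and REALTEK_GPIO_REG_IMR_CD at 0x18. The names otto_locate_line() and struct otto_line_loc are hypothetical and not part of the driver.

```c
#include <assert.h>

/* Offsets copied from the patch; port A..D at byte 0x0..0x3 is assumed */
#define REALTEK_GPIO_REG_DATA 0x0C
#define REALTEK_GPIO_REG_IMR  0x14
#define REALTEK_GPIO_MAX      32

/* Hypothetical description of where one GPIO line lives in the bank */
struct otto_line_loc {
	unsigned int port;      /* 0..3 for ports A..D */
	unsigned int pin;       /* 0..7 within the 8-bit port */
	unsigned int data_addr; /* byte holding this port in the DATA register */
	unsigned int imr_addr;  /* 16-bit IMR field of this port */
	unsigned int imr_shift; /* first bit of the pin's 2-bit IMR field */
};

static struct otto_line_loc otto_locate_line(unsigned int line)
{
	struct otto_line_loc loc;

	assert(line < REALTEK_GPIO_MAX);
	loc.port = line / 8;                 /* eight GPIOs per port */
	loc.pin = line % 8;
	loc.data_addr = REALTEK_GPIO_REG_DATA + loc.port;  /* one bit per GPIO */
	loc.imr_addr = REALTEK_GPIO_REG_IMR + 2 * loc.port; /* two bytes per port */
	loc.imr_shift = 2 * loc.pin;         /* two IMR bits per GPIO */
	return loc;
}
```

For example, line 19 is pin 3 of port C: its DATA bit lives in the byte at 0x0E, and its 2-bit IMR field starts at bit 6 of the 16-bit word at 0x18.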
[PATCH v6 1/2] dt-bindings: gpio: Binding for Realtek Otto GPIO
Add a binding description for Realtek's GPIO controller found on several of their MIPS-based SoCs (codenamed Otto), such as the RTL838x and RTL839x series of switch SoCs. A fallback binding 'realtek,otto-gpio' is provided for cases where the actual port ordering is not known yet, and enabling the interrupt controller may result in uncaught interrupts. Signed-off-by: Sander Vanheule Reviewed-by: Linus Walleij Reviewed-by: Rob Herring --- .../bindings/gpio/realtek,otto-gpio.yaml | 78 +++ 1 file changed, 78 insertions(+) create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml diff --git a/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml new file mode 100644 index ..100f20cebd76 --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml @@ -0,0 +1,78 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/gpio/realtek,otto-gpio.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Realtek Otto GPIO controller + +maintainers: + - Sander Vanheule + - Bert Vermeulen + +description: | + Realtek's GPIO controller on their MIPS switch SoCs (Otto platform) consists + of two banks of 32 GPIOs. These GPIOs can generate edge-triggered interrupts. + Each bank's interrupts are cascaded into one interrupt line on the parent + interrupt controller, if provided. + This binding allows defining a single bank in the devicetree. The interrupt + controller is not supported on the fallback compatible name, which only + allows for GPIO port use.
+ +properties: + $nodename: +pattern: "^gpio@[0-9a-f]+$" + + compatible: +items: + - enum: + - realtek,rtl8380-gpio + - realtek,rtl8390-gpio + - const: realtek,otto-gpio + + reg: +maxItems: 1 + + "#gpio-cells": +const: 2 + + gpio-controller: true + + ngpios: +minimum: 1 +maximum: 32 + + interrupt-controller: true + + "#interrupt-cells": +const: 2 + + interrupts: +maxItems: 1 + +required: + - compatible + - reg + - "#gpio-cells" + - gpio-controller + +additionalProperties: false + +dependencies: + interrupt-controller: [ interrupts ] + +examples: + - | + gpio@3500 { +compatible = "realtek,rtl8380-gpio", "realtek,otto-gpio"; +reg = <0x3500 0x1c>; +gpio-controller; +#gpio-cells = <2>; +ngpios = <24>; +interrupt-controller; +#interrupt-cells = <2>; +interrupt-parent = <&rtlintc>; +interrupts = <23>; + }; + +... -- 2.30.2
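The binding disables the interrupt controller for the fallback compatible only, and the v5 changelog inverts the driver flag to GPIO_INTERRUPTS_DISABLED so that only the fallback entry needs to carry it. The userspace model below sketches that selection under one stated assumption: the most specific compatible string of the node (listed first) wins, roughly as the kernel's OF matching prefers earlier entries. The table, otto_match_flags(), and the "realtek,rtl8196e-gpio" string used in the test are hypothetical illustrations, not the driver's code or a documented binding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flag name taken from the v5 changelog; the value is an assumption */
enum otto_gpio_flags {
	GPIO_INTERRUPTS_DISABLED = 1 << 0,
};

struct otto_match {
	const char *compatible;
	unsigned long flags;
};

/* Only the fallback carries the flag, so specific SoCs keep their IRQs */
static const struct otto_match otto_match_table[] = {
	{ "realtek,otto-gpio",    GPIO_INTERRUPTS_DISABLED },
	{ "realtek,rtl8380-gpio", 0 },
	{ "realtek,rtl8390-gpio", 0 },
};

/*
 * Walk the node's compatible list from most to least specific and return
 * the flags of the first string found in the table, or -1 if none match.
 */
static long otto_match_flags(const char *const *compat, size_t ncompat)
{
	size_t n = sizeof(otto_match_table) / sizeof(otto_match_table[0]);
	size_t i, j;

	assert(compat != NULL);
	for (i = 0; i < ncompat; i++)
		for (j = 0; j < n; j++)
			if (strcmp(compat[i], otto_match_table[j].compatible) == 0)
				return (long)otto_match_table[j].flags;
	return -1;
}
```

A node listing "realtek,rtl8380-gpio" first keeps interrupts enabled, while a SoC the driver does not know yet only matches the fallback and runs in GPIO-only mode.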
[PATCH v5 0/2] Add Realtek Otto GPIO support
Add support for the GPIO controller employed by Realtek in multiple series of MIPS SoCs. These include the supported RTL838x and RTL839x. The register layout also matches the one found in the GPIO controller of other (Lexra-based) SoCs such as RTL8196E, RTL8197D, and RTL8197F.

For the platform name 'otto', I am not aware of any official resources as to what hardware this specifically applies to. However, in all of the GPL archives we've received, from vendors using compatible SoCs in their design, the platform under the MIPS architecture is referred to by this name.

The GPIO ports have been tested on a Zyxel GS1900-8 (RTL8380) and a Zyxel GS1900-48 (RTL8393). Furthermore, the GPIO ports and interrupt controller have been tested on a Netgear GS110TPPv1 (RTL8381).

Changes in v5:
- Edited code comments
- Fold functions that were used only once or twice (ISR/IMR accessors)
- Drop trivial functions for line to port/pin calculations
- Use gpio_irq_chip->init_hw() to initialise IRQ registers
- Invert GPIO_INTERRUPTS flag to GPIO_INTERRUPTS_DISABLED
- Support building as module
- Add Rob's Reviewed-by tag

Changes in v4:
- Fix pointer notation style
- Drop unused read_u16_reg() function
- Drop 'inline' specifier from functions

Changes in v3:
- Remove OF dependencies in driver probe
- Don't accept IRQ_TYPE_NONE as a valid interrupt type
- Remove (now unused) dev property from control structure
- Use u8/u16 port registers, instead of raw u32 registers
- Use 'line' name for gpiochip, 'port' and 'pin' names for hardware
- Renamed DT bindings file
- Dropped fallback-only DT compatible
- Various code style clean-ups

Changes in v2:
- Clarify structure and usage of IMR registers
- Added Linus' Reviewed-by tags

Sander Vanheule (2):
  dt-bindings: gpio: Binding for Realtek Otto GPIO
  gpio: Add Realtek Otto GPIO support

 .../bindings/gpio/realtek,otto-gpio.yaml |  78 +
 drivers/gpio/Kconfig                     |  13 +
 drivers/gpio/Makefile                    |   1 +
 drivers/gpio/gpio-realtek-otto.c         | 326 ++
 4 files changed, 418 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

-- 
2.30.2
[PATCH v5 2/2] gpio: Add Realtek Otto GPIO support
Realtek MIPS SoCs (platform name Otto) have GPIO controllers with up to 64 GPIOs, divided over two banks. Each bank has a set of registers for 32 GPIOs, with support for edge-triggered interrupts. Each GPIO bank consists of four 8-bit GPIO ports (ABCD and EFGH). Most registers pack one bit per GPIO, except for the IMR register, which packs two bits per GPIO (AB-CD). Although the byte order is currently assumed to have port A..D at offset 0x0..0x3, this has been observed to be reversed on other, Lexra-based, SoCs (e.g. RTL8196E/97D/97F). Interrupt support is disabled for the fallback devicetree-compatible 'realtek,otto-gpio'. This allows for quick support of GPIO banks in which the byte order would be unknown. In this case, the port ordering in the IMR registers may not match the reversed order in the other registers (DCBA, and BA-DC or DC-BA). Signed-off-by: Sander Vanheule Reviewed-by: Linus Walleij --- drivers/gpio/Kconfig | 13 ++ drivers/gpio/Makefile| 1 + drivers/gpio/gpio-realtek-otto.c | 326 +++ 3 files changed, 340 insertions(+) create mode 100644 drivers/gpio/gpio-realtek-otto.c diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig index e3607ec4c2e8..6fb13d6507db 100644 --- a/drivers/gpio/Kconfig +++ b/drivers/gpio/Kconfig @@ -502,6 +502,19 @@ config GPIO_RDA help Say Y here to support RDA Micro GPIO controller. +config GPIO_REALTEK_OTTO + tristate "Realtek Otto GPIO support" + depends on MACH_REALTEK_RTL + default MACH_REALTEK_RTL + select GPIO_GENERIC + select GPIOLIB_IRQCHIP + help + The GPIO controller on the Otto MIPS platform supports up to two + banks of 32 GPIOs, with edge triggered interrupts. The 32 GPIOs + are grouped in four 8-bit wide ports. + + When built as a module, the module will be called realtek_otto_gpio. 
+ config GPIO_REG bool help diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile index c58a90a3c3b1..8ace5934e3c3 100644 --- a/drivers/gpio/Makefile +++ b/drivers/gpio/Makefile @@ -124,6 +124,7 @@ obj-$(CONFIG_GPIO_RC5T583) += gpio-rc5t583.o obj-$(CONFIG_GPIO_RCAR)+= gpio-rcar.o obj-$(CONFIG_GPIO_RDA) += gpio-rda.o obj-$(CONFIG_GPIO_RDC321X) += gpio-rdc321x.o +obj-$(CONFIG_GPIO_REALTEK_OTTO)+= gpio-realtek-otto.o obj-$(CONFIG_GPIO_REG) += gpio-reg.o obj-$(CONFIG_ARCH_SA1100) += gpio-sa1100.o obj-$(CONFIG_GPIO_SAMA5D2_PIOBU) += gpio-sama5d2-piobu.o diff --git a/drivers/gpio/gpio-realtek-otto.c b/drivers/gpio/gpio-realtek-otto.c new file mode 100644 index ..05ce5d48e121 --- /dev/null +++ b/drivers/gpio/gpio-realtek-otto.c @@ -0,0 +1,326 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include +#include +#include +#include +#include +#include +#include + +/* + * Total register block size is 0x1C for one bank of four ports (A, B, C, D). + * An optional second bank, with ports E, F, G, and H, may be present, starting + * at register offset 0x1C. + */ + +/* + * Pin select: (0) "normal", (1) "dedicate peripheral" + * Not used on RTL8380/RTL8390, peripheral selection is managed by control bits + * in the peripheral registers. 
+ */ +#define REALTEK_GPIO_REG_CNR 0x00 +/* Clear bit (0) for input, set bit (1) for output */ +#define REALTEK_GPIO_REG_DIR 0x08 +#define REALTEK_GPIO_REG_DATA 0x0C +/* Read bit for IRQ status, write 1 to clear IRQ */ +#define REALTEK_GPIO_REG_ISR 0x10 +/* Two bits per GPIO in IMR registers */ +#define REALTEK_GPIO_REG_IMR 0x14 +#define REALTEK_GPIO_REG_IMR_AB0x14 +#define REALTEK_GPIO_REG_IMR_CD0x18 +#define REALTEK_GPIO_IMR_LINE_MASK GENMASK(1, 0) +#define REALTEK_GPIO_IRQ_EDGE_FALLING 1 +#define REALTEK_GPIO_IRQ_EDGE_RISING 2 +#define REALTEK_GPIO_IRQ_EDGE_BOTH 3 + +#define REALTEK_GPIO_MAX 32 +#define REALTEK_GPIO_PORTS_PER_BANK4 + +/** + * realtek_gpio_ctrl - Realtek Otto GPIO driver data + * + * @gc: Associated gpio_chip instance + * @base: Base address of the register block for a GPIO bank + * @lock: Lock for accessing the IRQ registers and values + * @intr_mask: Mask for interrupts lines + * @intr_type: Interrupt type selection + * + * Because the interrupt mask register (IMR) combines the function of IRQ type + * selection and masking, two extra values are stored. @intr_mask is used to + * mask/unmask the interrupts for a GPIO port, and @intr_type is used to store + * the selected interrupt types. The logical AND of these values is written to + * IMR on changes. + */ +struct realtek_gpio_ctrl { + struct gpio_chip gc; + void __iomem *base; + raw_spinlock_t lock; + u16 intr_mask[REALTEK_GPIO_PORTS_PER_BANK]; + u16 intr_type[REALTEK_GPIO_PORTS_PER_BANK]; +}; + +/
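The struct documentation explains that the IMR both selects the edge type and masks the line, so the driver keeps intr_mask and intr_type separately and writes their logical AND to the IMR. A minimal userspace model of that bookkeeping follows; an in-memory imr[] array stands in for the hardware register, and struct otto_irq_state, otto_set_type(), otto_mask() and otto_unmask() are hypothetical names, not the driver's functions.

```c
#include <assert.h>
#include <stdint.h>

#define REALTEK_GPIO_PORTS_PER_BANK   4
#define REALTEK_GPIO_IMR_LINE_MASK    0x3u /* GENMASK(1, 0) in the patch */
#define REALTEK_GPIO_IRQ_EDGE_FALLING 1
#define REALTEK_GPIO_IRQ_EDGE_RISING  2
#define REALTEK_GPIO_IRQ_EDGE_BOTH    3

/* Hypothetical model: imr[] replaces the 16-bit IMR field of each port */
struct otto_irq_state {
	uint16_t intr_mask[REALTEK_GPIO_PORTS_PER_BANK];
	uint16_t intr_type[REALTEK_GPIO_PORTS_PER_BANK];
	uint16_t imr[REALTEK_GPIO_PORTS_PER_BANK];
};

/* A line only fires when it is both unmasked and has a type selected */
static void otto_update_imr(struct otto_irq_state *s, unsigned int port)
{
	s->imr[port] = s->intr_mask[port] & s->intr_type[port];
}

static void otto_set_type(struct otto_irq_state *s, unsigned int line,
			  uint16_t type)
{
	unsigned int port, shift;

	assert(line < 8 * REALTEK_GPIO_PORTS_PER_BANK);
	port = line / 8;
	shift = 2 * (line % 8);
	s->intr_type[port] &= ~(REALTEK_GPIO_IMR_LINE_MASK << shift);
	s->intr_type[port] |= (type & REALTEK_GPIO_IMR_LINE_MASK) << shift;
	otto_update_imr(s, port);
}

static void otto_unmask(struct otto_irq_state *s, unsigned int line)
{
	unsigned int port, shift;

	assert(line < 8 * REALTEK_GPIO_PORTS_PER_BANK);
	port = line / 8;
	shift = 2 * (line % 8);
	s->intr_mask[port] |= REALTEK_GPIO_IMR_LINE_MASK << shift;
	otto_update_imr(s, port);
}

static void otto_mask(struct otto_irq_state *s, unsigned int line)
{
	unsigned int port, shift;

	assert(line < 8 * REALTEK_GPIO_PORTS_PER_BANK);
	port = line / 8;
	shift = 2 * (line % 8);
	s->intr_mask[port] &= ~(REALTEK_GPIO_IMR_LINE_MASK << shift);
	otto_update_imr(s, port);
}
```

Keeping the type in intr_type means masking a line does not forget its configured edge: unmasking it later restores the same IMR bits.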
[PATCH v5 1/2] dt-bindings: gpio: Binding for Realtek Otto GPIO
Add a binding description for Realtek's GPIO controller found on several of their MIPS-based SoCs (codenamed Otto), such as the RTL838x and RTL839x series of switch SoCs. A fallback binding 'realtek,otto-gpio' is provided for cases where the actual port ordering is not known yet, and enabling the interrupt controller may result in uncaught interrupts. Signed-off-by: Sander Vanheule Reviewed-by: Linus Walleij Reviewed-by: Rob Herring --- .../bindings/gpio/realtek,otto-gpio.yaml | 78 +++ 1 file changed, 78 insertions(+) create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml diff --git a/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml new file mode 100644 index ..100f20cebd76 --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml @@ -0,0 +1,78 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/gpio/realtek,otto-gpio.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Realtek Otto GPIO controller + +maintainers: + - Sander Vanheule + - Bert Vermeulen + +description: | + Realtek's GPIO controller on their MIPS switch SoCs (Otto platform) consists + of two banks of 32 GPIOs. These GPIOs can generate edge-triggered interrupts. + Each bank's interrupts are cascaded into one interrupt line on the parent + interrupt controller, if provided. + This binding allows defining a single bank in the devicetree. The interrupt + controller is not supported on the fallback compatible name, which only + allows for GPIO port use.
+ +properties: + $nodename: +pattern: "^gpio@[0-9a-f]+$" + + compatible: +items: + - enum: + - realtek,rtl8380-gpio + - realtek,rtl8390-gpio + - const: realtek,otto-gpio + + reg: +maxItems: 1 + + "#gpio-cells": +const: 2 + + gpio-controller: true + + ngpios: +minimum: 1 +maximum: 32 + + interrupt-controller: true + + "#interrupt-cells": +const: 2 + + interrupts: +maxItems: 1 + +required: + - compatible + - reg + - "#gpio-cells" + - gpio-controller + +additionalProperties: false + +dependencies: + interrupt-controller: [ interrupts ] + +examples: + - | + gpio@3500 { +compatible = "realtek,rtl8380-gpio", "realtek,otto-gpio"; +reg = <0x3500 0x1c>; +gpio-controller; +#gpio-cells = <2>; +ngpios = <24>; +interrupt-controller; +#interrupt-cells = <2>; +interrupt-parent = <&rtlintc>; +interrupts = <23>; + }; + +... -- 2.30.2
Re: [PATCH v4 2/2] gpio: Add Realtek Otto GPIO support
Hi Andy,

Thank you for clarifying your remarks. I'll add support for building as a module, and have implemented the gpio_irq_chip->init_hw() callback.

On Mon, 2021-03-29 at 13:26 +0300, Andy Shevchenko wrote:
> On Fri, Mar 26, 2021 at 11:11 PM Sander Vanheule <
> san...@svanheule.net> wrote:
> > On Fri, 2021-03-26 at 20:19 +0200, Andy Shevchenko wrote:
> > > On Fri, Mar 26, 2021 at 2:05 PM Sander Vanheule <
> > > san...@svanheule.net>
> > > wrote:
> > > > +static const struct of_device_id realtek_gpio_of_match[] = {
> > > > +	{ .compatible = "realtek,otto-gpio" },
> > > > +	{
> > > > +		.compatible = "realtek,rtl8380-gpio",
> > > > +		.data = (void *)GPIO_INTERRUPTS
> > >
> > > Not sure why this flag is needed right now. Drop it completely for
> > > good.
> > > > +	},
> > > > +	{
> > > > +		.compatible = "realtek,rtl8390-gpio",
> > > > +		.data = (void *)GPIO_INTERRUPTS
> > >
> > > Ditto
> >
> > Linus Walleij asked this question too after v1:
> > https://lore.kernel.org/linux-gpio/e9f0651e5fb52b7d56361ceb30b41759b6f2ec13.ca...@svanheule.net/
> >
> > Note that the fall-back compatible doesn't have this flag set.
>
> AFAICS all, except one have this flag, I suggest you to do other way
> around, i.e. check compatible string in the code. Or do something more
> clever. What happens if you have this flag enabled for the fallback
> node?
>
> If two people ask the same, it might be a smoking gun.
>

Testing for the fallback wouldn't work, since of_device_is_compatible() would always match. Setting the (inverse) flag only on the fallback would indeed reduce the clutter.

If the port order is reversed w.r.t. the current implementation, enabling a GPIO+IRQ would enable the same pin on a different port. I don't think the result would be catastrophic, but it would result in unexpected behaviour. When A0 and C0 are then enabled, A0 interrupts would actually come from C0, and vice versa.
Intended port    | A | B | C | D
-----------------+---+---+---+---
Actual GPIO port | D | C | B | A
Actual IRQ port  | B | A | D | C

If only the actual GPIO ports change, at least you can still use a modified GPIO line number and polling. The user could just leave out the optional interrupt-controller property from the devicetree, but I would rather have it enforced in some way.

Best,
Sander
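The table above can be expressed as two small index mappings: in the one-bit-per-GPIO registers the reversed order is DCBA (slot p reaches physical port 3 - p), and in the IMR it is BA-DC (slot p reaches physical port p ^ 1). The sketch below encodes both and, under the added assumption that the ISR follows the same byte order as the other one-bit registers, reproduces the observation that A0 interrupts would appear to come from C0 and vice versa. All function names are hypothetical.

```c
#include <assert.h>

enum otto_port { PORT_A, PORT_B, PORT_C, PORT_D };

/* Physical port reached through slot 'port' of a one-bit register (DCBA) */
static unsigned int reversed_gpio_port(unsigned int port)
{
	assert(port < 4);
	return 3 - port;
}

/* Physical port reached through IMR slot 'port' (halves swapped: BA-DC) */
static unsigned int reversed_irq_port(unsigned int port)
{
	assert(port < 4);
	return port ^ 1;
}

/*
 * ISR slot that reports an interrupt enabled through IMR slot 'port'.
 * Assumes ISR uses the same byte order as DATA/DIR. Since 3 - q is its own
 * inverse, reversed_gpio_port() also maps a physical port back to its slot.
 */
static unsigned int isr_slot_for_imr_slot(unsigned int port)
{
	return reversed_gpio_port(reversed_irq_port(port));
}
```

With this model, an interrupt configured through slot A is reported in slot C and the reverse, while B and D are likewise swapped.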
Re: [PATCH v4 2/2] gpio: Add Realtek Otto GPIO support
Hi Andy, Replies inline below. On Fri, 2021-03-26 at 20:19 +0200, Andy Shevchenko wrote: > On Fri, Mar 26, 2021 at 2:05 PM Sander Vanheule > wrote: > > > +config GPIO_REALTEK_OTTO > > + bool "Realtek Otto GPIO support" > > Why not module? This driver is only useful on a few specific MIPS SoCs, where this GPIO peripheral is a part of that SoC. What would be the point of providing this driver as a module? > > > + depends on MACH_REALTEK_RTL > > + default MACH_REALTEK_RTL > > + select GPIO_GENERIC > > + select GPIOLIB_IRQCHIP > > > + help > > + The GPIO controller on the Otto MIPS platform supports up > > to two > > + banks of 32 GPIOs, with edge triggered interrupts. The 32 > > GPIOs > > + are grouped in four 8-bit wide ports. > > When allowing module build, here you may add what will be the name of > it. > > ... > > > +/* > > + * Total register block size is 0x1C for four ports. > > + * On the RTL8380/RLT8390 platforms port A, B, and C are > > implemented. > > D? No port D on 8380/8390. Only 24 GPIO lines are present on these platforms. I'll rephrase this comment. > > > + * RTL8389 and RTL8328 implement a second bank with ports E, F, G, > > and H. > > + * > > + * Port information is stored with the first port at offset 0, > > followed by the > > + * second, etc. Most registers store one bit per GPIO and should be > > read out in > > + * reversed endian order. The two interrupt mask registers store two > > bits per > > + * GPIO, and should be manipulated with swahw32, if required. > > + */ This reference to swahw32 and the include of linux/swab.h will be dropped. > > > +/* > > Seems like kernel doc format with missed ** header and properly formed > summary and description. I'll reformat. > > > + * Realtek GPIO driver data > > + * Because the interrupt mask register (IMR) combines the function > > of > > + * IRQ type selection and masking, two extra values are stored. 
> > + * intr_mask is used to mask/unmask the interrupts for certain > > GPIO, > > + * and intr_type is used to store the selected interrupt types. > > The > > + * logical AND of these values is written to IMR on changes. > > + * > > + * @gc Associated gpio_chip instance > > + * @base Base address of the register block > > + * @lock Lock for accessing the IRQ registers and values > > + * @intr_mask Mask for GPIO interrupts > > + * @intr_type GPIO interrupt type selection > > + */ > > +struct realtek_gpio_ctrl { > > + struct gpio_chip gc; > > + void __iomem *base; > > + raw_spinlock_t lock; > > + u16 intr_mask[REALTEK_GPIO_PORTS_PER_BANK]; > > + u16 intr_type[REALTEK_GPIO_PORTS_PER_BANK]; > > +}; > > + > > +enum realtek_gpio_flags { > > + GPIO_INTERRUPTS = BIT(0), > > +}; > > ... See below. I'll add a comment. > > > +static struct realtek_gpio_ctrl *irq_data_to_ctrl(struct irq_data > > *data) > > +{ > > + struct gpio_chip *gc = irq_data_get_irq_chip_data(data); > > + > > + return container_of(gc, struct realtek_gpio_ctrl, gc); > > +} > > > +static unsigned int line_to_port(unsigned int line) > > +{ > > + return line / 8; > > +} > > + > > +static unsigned int line_to_port_pin(unsigned int line) > > +{ > > + return line % 8; > > +} > > These are useless. Just use them inline. I added these as an alternative to the /16 and %16 I had for the IMR offsets in v2. The function names tell the reader _why_ I'm doing the division and modulo operations, but I guess a properly named variable would do the same. > > > +static u8 read_u8_reg(void __iomem *reg, unsigned int port) > > +{ > > + return ioread8(reg + port); > > +} > > + > > +static void write_u8_reg(void __iomem *reg, unsigned int port, u8 > > value) > > +{ > > + iowrite8(value, reg + port); > > +} > > + > > +static void write_u16_reg(void __iomem *reg, unsigned int port, u16 > > value) > > +{ > > + iowrite16(value, reg + 2 * port); > > +} > > What's the point? You better provide a controller structure as a > parameter.
Look into other drivers. There are plenty of examples how > to provide IO accessors in smarter way. Since these are currently only really used for IMR and ISR, I'll fold them into their accessor functions for v5. > > > +static void realtek_gpio_write
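Andy's suggestion is to pass the controller structure to the IO accessors instead of a raw register pointer, and Sander plans to fold them into ISR/IMR accessors. The userspace sketch below shows that shape. A byte array replaces the __iomem mapping and the ioread8()/iowrite8()/iowrite16() calls, the write-1-to-clear behaviour follows the patch comment "Read bit for IRQ status, write 1 to clear IRQ", and the little-endian byte order inside the 16-bit IMR field is an arbitrary choice of this mock, not the hardware's. struct otto_mock_ctrl and the otto_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define REALTEK_GPIO_REG_ISR   0x10
#define REALTEK_GPIO_REG_IMR   0x14
#define REALTEK_GPIO_BANK_SIZE 0x1C

/* Hypothetical stand-in for realtek_gpio_ctrl: regs[] replaces the mapping */
struct otto_mock_ctrl {
	uint8_t regs[REALTEK_GPIO_BANK_SIZE];
};

/* One ISR byte per port, one pending bit per GPIO */
static uint8_t otto_read_isr(struct otto_mock_ctrl *ctrl, unsigned int port)
{
	assert(port < 4);
	return ctrl->regs[REALTEK_GPIO_REG_ISR + port];
}

/* Emulates write-1-to-clear: bits written as 1 are cleared, others kept */
static void otto_clear_isr(struct otto_mock_ctrl *ctrl, unsigned int port,
			   uint8_t bits)
{
	assert(port < 4);
	ctrl->regs[REALTEK_GPIO_REG_ISR + port] &= (uint8_t)~bits;
}

/* One 16-bit IMR field per port; low byte first in this mock only */
static void otto_write_imr(struct otto_mock_ctrl *ctrl, unsigned int port,
			   uint16_t value)
{
	unsigned int off = REALTEK_GPIO_REG_IMR + 2 * port;

	assert(port < 4);
	ctrl->regs[off] = value & 0xff;
	ctrl->regs[off + 1] = value >> 8;
}
```

Because every helper takes the controller, the register base and any locking live in one place instead of being threaded through each call site.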
[PATCH v4 2/2] gpio: Add Realtek Otto GPIO support
Realtek MIPS SoCs (platform name Otto) have GPIO controllers with up to 64 GPIOs, divided over two banks. Each bank has a set of registers for 32 GPIOs, with support for edge-triggered interrupts. Each GPIO bank consists of four 8-bit GPIO ports (ABCD and EFGH). Most registers pack one bit per GPIO, except for the IMR register, which packs two bits per GPIO (AB-CD). Although the byte order is currently assumed to have port A..D at offset 0x0..0x3, this has been observed to be reversed on other, Lexra-based, SoCs (e.g. RTL8196E/97D/97F). Interrupt support is disabled for the fallback devicetree-compatible 'realtek,otto-gpio'. This allows for quick support of GPIO banks in which the byte order would be unknown. In this case, the port ordering in the IMR registers may not match the reversed order in the other registers (DCBA, and BA-DC or DC-BA). Signed-off-by: Sander Vanheule Reviewed-by: Linus Walleij --- drivers/gpio/Kconfig | 11 ++ drivers/gpio/Makefile| 1 + drivers/gpio/gpio-realtek-otto.c | 330 +++ 3 files changed, 342 insertions(+) create mode 100644 drivers/gpio/gpio-realtek-otto.c diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig index e3607ec4c2e8..d3be17812f94 100644 --- a/drivers/gpio/Kconfig +++ b/drivers/gpio/Kconfig @@ -502,6 +502,17 @@ config GPIO_RDA help Say Y here to support RDA Micro GPIO controller. +config GPIO_REALTEK_OTTO + bool "Realtek Otto GPIO support" + depends on MACH_REALTEK_RTL + default MACH_REALTEK_RTL + select GPIO_GENERIC + select GPIOLIB_IRQCHIP + help + The GPIO controller on the Otto MIPS platform supports up to two + banks of 32 GPIOs, with edge triggered interrupts. The 32 GPIOs + are grouped in four 8-bit wide ports. 
+ config GPIO_REG bool help diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile index c58a90a3c3b1..8ace5934e3c3 100644 --- a/drivers/gpio/Makefile +++ b/drivers/gpio/Makefile @@ -124,6 +124,7 @@ obj-$(CONFIG_GPIO_RC5T583) += gpio-rc5t583.o obj-$(CONFIG_GPIO_RCAR)+= gpio-rcar.o obj-$(CONFIG_GPIO_RDA) += gpio-rda.o obj-$(CONFIG_GPIO_RDC321X) += gpio-rdc321x.o +obj-$(CONFIG_GPIO_REALTEK_OTTO)+= gpio-realtek-otto.o obj-$(CONFIG_GPIO_REG) += gpio-reg.o obj-$(CONFIG_ARCH_SA1100) += gpio-sa1100.o obj-$(CONFIG_GPIO_SAMA5D2_PIOBU) += gpio-sama5d2-piobu.o diff --git a/drivers/gpio/gpio-realtek-otto.c b/drivers/gpio/gpio-realtek-otto.c new file mode 100644 index ..07641a1686eb --- /dev/null +++ b/drivers/gpio/gpio-realtek-otto.c @@ -0,0 +1,330 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Total register block size is 0x1C for four ports. + * On the RTL8380/RLT8390 platforms port A, B, and C are implemented. + * RTL8389 and RTL8328 implement a second bank with ports E, F, G, and H. + * + * Port information is stored with the first port at offset 0, followed by the + * second, etc. Most registers store one bit per GPIO and should be read out in + * reversed endian order. The two interrupt mask registers store two bits per + * GPIO, and should be manipulated with swahw32, if required. + */ + +/* + * Pin select: (0) "normal", (1) "dedicate peripheral" + * Not used on RTL8380/RTL8390, peripheral selection is managed by control bits + * in the peripheral registers. 
+ */ +#define REALTEK_GPIO_REG_CNR 0x00 +/* Clear bit (0) for input, set bit (1) for output */ +#define REALTEK_GPIO_REG_DIR 0x08 +#define REALTEK_GPIO_REG_DATA 0x0C +/* Read bit for IRQ status, write 1 to clear IRQ */ +#define REALTEK_GPIO_REG_ISR 0x10 +/* Two bits per GPIO in IMR registers */ +#define REALTEK_GPIO_REG_IMR 0x14 +#define REALTEK_GPIO_REG_IMR_AB0x14 +#define REALTEK_GPIO_REG_IMR_CD0x18 +#define REALTEK_GPIO_IMR_LINE_MASK GENMASK(1, 0) +#define REALTEK_GPIO_IRQ_EDGE_FALLING 1 +#define REALTEK_GPIO_IRQ_EDGE_RISING 2 +#define REALTEK_GPIO_IRQ_EDGE_BOTH 3 + +#define REALTEK_GPIO_MAX 32 +#define REALTEK_GPIO_PORTS_PER_BANK4 + +/* + * Realtek GPIO driver data + * Because the interrupt mask register (IMR) combines the function of + * IRQ type selection and masking, two extra values are stored. + * intr_mask is used to mask/unmask the interrupts for certain GPIO, + * and intr_type is used to store the selected interrupt types. The + * logical AND of these values is written to IMR on changes. + * + * @gc Associated gpio_chip instance + * @base Base address of the register block + * @lock Lock for accessing the IRQ registers and values + * @intr_mask Mask for GPIO interrupts + * @intr_type GPIO interrupt type selection + */ +struct realtek_gpio_ctrl { +
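The v4 header comment says the two IMR registers "should be manipulated with swahw32, if required". The kernel's swahw32() from <linux/swab.h> swaps the two 16-bit halves of a 32-bit word, which corresponds to the BA-DC ordering mentioned in the commit message. The sketch below is a userspace equivalent plus a helper that composes a 32-bit IMR value from a port pair. It assumes that the first port of the pair (A or C) occupies the upper half in the normal order; that placement, and the names otto_swahw32(), otto_imr_reg() and otto_imr_compose(), are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define REALTEK_GPIO_REG_IMR_AB 0x14
#define REALTEK_GPIO_REG_IMR_CD 0x18

/* Userspace equivalent of the kernel's swahw32(): swap 16-bit halves */
static uint32_t otto_swahw32(uint32_t x)
{
	return (x << 16) | (x >> 16);
}

/* 32-bit IMR register holding a port: AB for ports 0-1, CD for ports 2-3 */
static unsigned int otto_imr_reg(unsigned int port)
{
	assert(port < 4);
	return port < 2 ? REALTEK_GPIO_REG_IMR_AB : REALTEK_GPIO_REG_IMR_CD;
}

/*
 * Build the 32-bit IMR value for a port pair. Assumption: the first port
 * of the pair sits in the upper half; swap halves for BA-DC hardware.
 */
static uint32_t otto_imr_compose(uint16_t first, uint16_t second,
				 int halves_swapped)
{
	uint32_t v = ((uint32_t)first << 16) | second;

	return halves_swapped ? otto_swahw32(v) : v;
}
```

Since swapping the halves twice is the identity, the same helper converts in both directions. The v5 changelog shows this approach was later dropped in favour of per-port u16 accessors.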
[PATCH v4 1/2] dt-bindings: gpio: Binding for Realtek Otto GPIO
Add a binding description for Realtek's GPIO controller found on several of their MIPS-based SoCs (codenamed Otto), such as the RTL838x and RTL839x series of switch SoCs. A fallback binding 'realtek,otto-gpio' is provided for cases where the actual port ordering is not known yet, and enabling the interrupt controller may result in uncaught interrupts. Signed-off-by: Sander Vanheule Reviewed-by: Linus Walleij --- .../bindings/gpio/realtek,otto-gpio.yaml | 78 +++ 1 file changed, 78 insertions(+) create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml diff --git a/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml new file mode 100644 index ..100f20cebd76 --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml @@ -0,0 +1,78 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/gpio/realtek,otto-gpio.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# +title: Realtek Otto GPIO controller + +maintainers: + - Sander Vanheule + - Bert Vermeulen + +description: | + Realtek's GPIO controller on their MIPS switch SoCs (Otto platform) consists + of two banks of 32 GPIOs. These GPIOs can generate edge-triggered interrupts. + Each bank's interrupts are cascaded into one interrupt line on the parent + interrupt controller, if provided. + This binding allows defining a single bank in the devicetree. The interrupt + controller is not supported on the fallback compatible name, which only + allows for GPIO port use.
+
+properties:
+  $nodename:
+    pattern: "^gpio@[0-9a-f]+$"
+
+  compatible:
+    items:
+      - enum:
+          - realtek,rtl8380-gpio
+          - realtek,rtl8390-gpio
+      - const: realtek,otto-gpio
+
+  reg:
+    maxItems: 1
+
+  "#gpio-cells":
+    const: 2
+
+  gpio-controller: true
+
+  ngpios:
+    minimum: 1
+    maximum: 32
+
+  interrupt-controller: true
+
+  "#interrupt-cells":
+    const: 2
+
+  interrupts:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - "#gpio-cells"
+  - gpio-controller
+
+additionalProperties: false
+
+dependencies:
+  interrupt-controller: [ interrupts ]
+
+examples:
+  - |
+    gpio@3500 {
+      compatible = "realtek,rtl8380-gpio", "realtek,otto-gpio";
+      reg = <0x3500 0x1c>;
+      gpio-controller;
+      #gpio-cells = <2>;
+      ngpios = <24>;
+      interrupt-controller;
+      #interrupt-cells = <2>;
+      interrupt-parent = <&rtlintc>;
+      interrupts = <23>;
+    };
+
+...
-- 
2.30.2
[PATCH v4 0/2] Add Realtek Otto GPIO support
Add support for the GPIO controller employed by Realtek in multiple series of MIPS SoCs. These include the supported RTL838x and RTL839x. The register layout also matches the one found in the GPIO controller of other (Lexra-based) SoCs such as RTL8196E, RTL8197D, and RTL8197F.

For the platform name 'otto', I am not aware of any official resources as to what hardware this specifically applies to. However, in all of the GPL archives we've received, from vendors using compatible SoCs in their design, the platform under the MIPS architecture is referred to by this name.

The GPIO ports have been tested on a Zyxel GS1900-8 (RTL8380), and Zyxel GS1900-48 (RTL8393). Furthermore, the GPIO ports and interrupt controller have been tested on a Netgear GS110TPPv1 (RTL8381).

Changes in v4:
- Fix pointer notation style
- Drop unused read_u16_reg() function
- Drop 'inline' specifier from functions

Changes in v3:
- Remove OF dependencies in driver probe
- Don't accept IRQ_TYPE_NONE as a valid interrupt type
- Remove (now unused) dev property from control structure
- Use u8/u16 port registers, instead of raw u32 registers
- Use 'line' name for gpiochip, 'port' and 'pin' names for hardware
- Renamed DT bindings file
- Dropped fallback-only DT compatible
- Various code style clean-ups

Changes in v2:
- Clarify structure and usage of IMR registers
- Added Linus' Reviewed-by tags

Sander Vanheule (2):
  dt-bindings: gpio: Binding for Realtek Otto GPIO
  gpio: Add Realtek Otto GPIO support

 .../bindings/gpio/realtek,otto-gpio.yaml |  78 +
 drivers/gpio/Kconfig                     |  11 +
 drivers/gpio/Makefile                    |   1 +
 drivers/gpio/gpio-realtek-otto.c         | 330 ++
 4 files changed, 420 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

-- 
2.30.2
Re: [PATCH v3 2/2] gpio: Add Realtek Otto GPIO support
On Wed, 2021-03-24 at 22:22 +0100, Sander Vanheule wrote:
> +static inline u8 read_u8_reg(void __iomem* reg, unsigned int port)
> +{
> +	return ioread8(reg + port);
> +}
> +
> +static inline void write_u8_reg(void __iomem* reg, unsigned int port, u8 value)
> +{
> +	iowrite8(value, reg + port);
> +}
> +
> +static inline u16 read_u16_reg(void __iomem* reg, unsigned int port)
> +{
> +	return ioread16(reg + 2 * port);
> +}
> +
> +static inline void write_u16_reg(void __iomem* reg, unsigned int port, u16 value)
> +{
> +	iowrite16(value, reg + 2 * port);
> +}

Of course I only noticed this after sending v3, but these functions should have "void __iomem *reg" instead. I can fix this in a next version.

Best,
Sander
[PATCH v3 0/2] Add Realtek Otto GPIO support
Add support for the GPIO controller employed by Realtek in multiple series of MIPS SoCs. These include the supported RTL838x and RTL839x. The register layout also matches the one found in the GPIO controller of other (Lexra-based) SoCs such as RTL8196E, RTL8197D, and RTL8197F.

For the platform name 'otto', I am not aware of any official resources as to what hardware this specifically applies to. However, in all of the GPL archives we've received, from vendors using compatible SoCs in their design, the platform under the MIPS architecture is referred to by this name.

The GPIO ports have been tested on a Zyxel GS1900-8 (RTL8380), and Zyxel GS1900-48 (RTL8393). Furthermore, the GPIO ports and interrupt controller have been tested on a Netgear GS110TPPv1 (RTL8381).

Changes in v3:
- Remove OF dependencies in driver probe
- Don't accept IRQ_TYPE_NONE as a valid interrupt type
- Remove (now unused) dev property from control structure
- Use u8/u16 port registers, instead of raw u32 registers
- Use 'line' name for gpiochip, 'port' and 'pin' names for hardware
- Renamed DT bindings file
- Dropped fallback-only DT compatible
- Various code style clean-ups

Changes in v2:
- Clarify structure and usage of IMR registers
- Added Linus' Reviewed-by tags

Sander Vanheule (2):
  dt-bindings: gpio: Binding for Realtek Otto GPIO
  gpio: Add Realtek Otto GPIO support

 .../bindings/gpio/realtek,otto-gpio.yaml |  78
 drivers/gpio/Kconfig                     |  11 +
 drivers/gpio/Makefile                    |   1 +
 drivers/gpio/gpio-realtek-otto.c         | 335 ++
 4 files changed, 425 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

-- 
2.30.2
[PATCH v3 2/2] gpio: Add Realtek Otto GPIO support
Realtek MIPS SoCs (platform name Otto) have GPIO controllers with up to 64 GPIOs, divided over two banks. Each bank has a set of registers for 32 GPIOs, with support for edge-triggered interrupts.

Each GPIO bank consists of four 8-bit GPIO ports (ABCD and EFGH). Most registers pack one bit per GPIO, except for the IMR register, which packs two bits per GPIO (AB-CD).

Although the byte order is currently assumed to have port A..D at offset 0x0..0x3, this has been observed to be reversed on other, Lexra-based, SoCs (e.g. RTL8196E/97D/97F).

Interrupt support is disabled for the fallback devicetree-compatible 'realtek,otto-gpio'. This allows for quick support of GPIO banks in which the byte order would be unknown. In this case, the port ordering in the IMR registers may not match the reversed order in the other registers (DCBA, and BA-DC or DC-BA).

Signed-off-by: Sander Vanheule
Reviewed-by: Linus Walleij
---
 drivers/gpio/Kconfig             |  11 +
 drivers/gpio/Makefile            |   1 +
 drivers/gpio/gpio-realtek-otto.c | 335 +++
 3 files changed, 347 insertions(+)
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index e3607ec4c2e8..d3be17812f94 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -502,6 +502,17 @@ config GPIO_RDA
 	help
 	  Say Y here to support RDA Micro GPIO controller.
 
+config GPIO_REALTEK_OTTO
+	bool "Realtek Otto GPIO support"
+	depends on MACH_REALTEK_RTL
+	default MACH_REALTEK_RTL
+	select GPIO_GENERIC
+	select GPIOLIB_IRQCHIP
+	help
+	  The GPIO controller on the Otto MIPS platform supports up to two
+	  banks of 32 GPIOs, with edge triggered interrupts. The 32 GPIOs
+	  are grouped in four 8-bit wide ports.
+
 config GPIO_REG
 	bool
 	help
diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
index c58a90a3c3b1..8ace5934e3c3 100644
--- a/drivers/gpio/Makefile
+++ b/drivers/gpio/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_GPIO_RC5T583)	+= gpio-rc5t583.o
 obj-$(CONFIG_GPIO_RCAR)	+= gpio-rcar.o
 obj-$(CONFIG_GPIO_RDA)	+= gpio-rda.o
 obj-$(CONFIG_GPIO_RDC321X)	+= gpio-rdc321x.o
+obj-$(CONFIG_GPIO_REALTEK_OTTO)	+= gpio-realtek-otto.o
 obj-$(CONFIG_GPIO_REG)	+= gpio-reg.o
 obj-$(CONFIG_ARCH_SA1100)	+= gpio-sa1100.o
 obj-$(CONFIG_GPIO_SAMA5D2_PIOBU)	+= gpio-sama5d2-piobu.o
diff --git a/drivers/gpio/gpio-realtek-otto.c b/drivers/gpio/gpio-realtek-otto.c
new file mode 100644
index ..0714d54e08d1
--- /dev/null
+++ b/drivers/gpio/gpio-realtek-otto.c
@@ -0,0 +1,335 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/*
+ * Total register block size is 0x1C for four ports.
+ * On the RTL8380/RTL8390 platforms port A, B, and C are implemented.
+ * RTL8389 and RTL8328 implement a second bank with ports E, F, G, and H.
+ *
+ * Port information is stored with the first port at offset 0, followed by the
+ * second, etc. Most registers store one bit per GPIO and should be read out in
+ * reversed endian order. The two interrupt mask registers store two bits per
+ * GPIO, and should be manipulated with swahw32, if required.
+ */
+
+/*
+ * Pin select: (0) "normal", (1) "dedicate peripheral"
+ * Not used on RTL8380/RTL8390, peripheral selection is managed by control bits
+ * in the peripheral registers.
+ */
+#define REALTEK_GPIO_REG_CNR		0x00
+/* Clear bit (0) for input, set bit (1) for output */
+#define REALTEK_GPIO_REG_DIR		0x08
+#define REALTEK_GPIO_REG_DATA		0x0C
+/* Read bit for IRQ status, write 1 to clear IRQ */
+#define REALTEK_GPIO_REG_ISR		0x10
+/* Two bits per GPIO in IMR registers */
+#define REALTEK_GPIO_REG_IMR		0x14
+#define REALTEK_GPIO_REG_IMR_AB		0x14
+#define REALTEK_GPIO_REG_IMR_CD		0x18
+#define REALTEK_GPIO_IMR_LINE_MASK	GENMASK(1, 0)
+#define REALTEK_GPIO_IRQ_EDGE_FALLING	1
+#define REALTEK_GPIO_IRQ_EDGE_RISING	2
+#define REALTEK_GPIO_IRQ_EDGE_BOTH	3
+
+#define REALTEK_GPIO_MAX		32
+#define REALTEK_GPIO_PORTS_PER_BANK	4
+
+/*
+ * Realtek GPIO driver data
+ * Because the interrupt mask register (IMR) combines the function of
+ * IRQ type selection and masking, two extra values are stored.
+ * intr_mask is used to mask/unmask the interrupts for certain GPIO,
+ * and intr_type is used to store the selected interrupt types. The
+ * logical AND of these values is written to IMR on changes.
+ *
+ * @gc Associated gpio_chip instance
+ * @base Base address of the register block
+ * @lock Lock for accessing the IRQ registers and values
+ * @intr_mask Mask for GPIO interrupts
+ * @intr_type GPIO interrupt type selection
+ */
+struct realtek_gpio_ctrl {
+
[PATCH v3 1/2] dt-bindings: gpio: Binding for Realtek Otto GPIO
Add a binding description for Realtek's GPIO controller found on several of their MIPS-based SoCs (codenamed Otto), such as the RTL838x and RTL839x series of switch SoCs. A fallback binding 'realtek,otto-gpio' is provided for cases where the actual port ordering is not known yet, and enabling the interrupt controller may result in uncaught interrupts.

Signed-off-by: Sander Vanheule
Reviewed-by: Linus Walleij
---
 .../bindings/gpio/realtek,otto-gpio.yaml | 78 +++
 1 file changed, 78 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml

diff --git a/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
new file mode 100644
index ..100f20cebd76
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/realtek,otto-gpio.yaml
@@ -0,0 +1,78 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/gpio/realtek,otto-gpio.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Realtek Otto GPIO controller
+
+maintainers:
+  - Sander Vanheule
+  - Bert Vermeulen
+
+description: |
+  Realtek's GPIO controller on their MIPS switch SoCs (Otto platform) consists
+  of two banks of 32 GPIOs. These GPIOs can generate edge-triggered interrupts.
+  Each bank's interrupts are cascaded into one interrupt line on the parent
+  interrupt controller, if provided.
+  This binding allows defining a single bank in the devicetree. The interrupt
+  controller is not supported on the fallback compatible name, which only
+  allows for GPIO port use.
+
+properties:
+  $nodename:
+    pattern: "^gpio@[0-9a-f]+$"
+
+  compatible:
+    items:
+      - enum:
+          - realtek,rtl8380-gpio
+          - realtek,rtl8390-gpio
+      - const: realtek,otto-gpio
+
+  reg:
+    maxItems: 1
+
+  "#gpio-cells":
+    const: 2
+
+  gpio-controller: true
+
+  ngpios:
+    minimum: 1
+    maximum: 32
+
+  interrupt-controller: true
+
+  "#interrupt-cells":
+    const: 2
+
+  interrupts:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - "#gpio-cells"
+  - gpio-controller
+
+additionalProperties: false
+
+dependencies:
+  interrupt-controller: [ interrupts ]
+
+examples:
+  - |
+    gpio@3500 {
+      compatible = "realtek,rtl8380-gpio", "realtek,otto-gpio";
+      reg = <0x3500 0x1c>;
+      gpio-controller;
+      #gpio-cells = <2>;
+      ngpios = <24>;
+      interrupt-controller;
+      #interrupt-cells = <2>;
+      interrupt-parent = <&rtlintc>;
+      interrupts = <23>;
+    };
+
+...
-- 
2.30.2
Re: [PATCH v2 2/2] gpio: Add Realtek Otto GPIO support
On Fri, 2021-03-19 at 23:24 +0200, Andy Shevchenko wrote:
> On Fri, Mar 19, 2021 at 11:20 PM Sander Vanheule <san...@svanheule.net> wrote:
> > On Fri, 2021-03-19 at 19:57 +0200, Andy Shevchenko wrote:
> > > On Fri, Mar 19, 2021 at 5:51 PM Sander Vanheule wrote:
> > > > On Wed, 2021-03-17 at 15:08 +0200, Andy Shevchenko wrote:
> > > > > On Mon, Mar 15, 2021 at 11:11 PM Sander Vanheule <san...@svanheule.net> wrote:
> > ...
> >
> > > > > > +	return swab32(readl(ctrl->base + REALTEK_GPIO_REG_ISR));
> > > > >
> > > > > Why swab?! How is this supposed to work on BE CPUs?
> > > > > Ditto for all swabXX() usage.
> > > >
> > > > My use of swab32/swahw32 has little to do with the CPU being BE or LE,
> > > > but more with the register packing in the GPIO peripheral.
> > > >
> > > > The supported SoCs have port layout A-B-C-D in the registers, where
> > > > firmware built with Realtek's SDK always denotes A0 as the first GPIO
> > > > line. So bit 24 in a register has the value for A0 (with the exception
> > > > of the IMR register).
> > > >
> > > > I wrote these wrapper functions to be able to use the BIT() macro with
> > > > the GPIO line number, similar to how gpio-mmio uses ioread32be() when
> > > > the BGPIOF_BIG_ENDIAN_BYTE_ORDER flag is used.
> > > >
> > > > For the IMR register, port A again comes first, but is now 16 bits wide
> > > > instead of 8, with A0 at bits 16:17. That's why swahw32 is used for
> > > > this register.
> > > >
> > > > On the currently unsupported RTL9300-series, the port layout is
> > > > reversed: D-C-B-A. GPIO line A0 is then at bit 0, so the swapping
> > > > functions won't be required. When support for this alternate port
> > > > layout is added, some code will need to be added to differentiate
> > > > between the two cases.
> > >
> > > Yes, you have different endianess on the hardware level, why not to
> > > use the proper accessors (with or without utilization of the above
> > > mentioned BGPIOF_BIG_ENDIAN_BYTE_ORDER)?
> >
> > The point I was trying to make, is that it isn't an endianess issue. I
> > shouldn't have used a register with single byte values to try to
> > illustrate that.
> >
> > Consider instead the interrupt masking registers. To write the IMR bits
> > for port A (GPIO 0-7), a 16-bit value must be written. This value (e.g.
> > u16 port_a_imr) is always BE, independent of the packing order of the
> > ports in the registers:
> >
> > // On RTL8380: port A is in the upper word
> > writew(port_a_imr, base + OFFSET_IMR_AB);
> >
> > // On RTL9300: port A is in the lower word
> > writew(port_a_imr, base + OFFSET_IMR_AB + 2);
> >
> > I want the low GPIO lines to be in the lower half-word, so I can
> > manipulate GPIO lines 0-15 with simple mask and shift operations.
> >
> > It just so happens, that all registers needed by bgpio_init contain
> > single-byte values. With BGPIO_BIG_ENDIAN_BYTE_ORDER the port order is
> > reversed as required, but it's a bit of a misnomer here.
>
> How many registers (per GPIO / port) do you have?
> Can you list them and show endianess of the data for each of them and
> for old and new hardware (something like a 3 column table)?

Each GPIO bank, with 32 GPIO lines, consists of four 8-line ports. There are seven registers per port, but only five are used:

        |        | Data    | RTL8380    | RTL9300
Reg     | Offset | type    | byte order | byte order
--------+--------+---------+------------+-----------
DIR     | 0x08   | 4 * u8  | A-B-C-D    | D-C-B-A
DATA    | 0x0C   | 4 * u8  | A-B-C-D    | D-C-B-A
ISR     | 0x10   | 4 * u8  | A-B-C-D    | D-C-B-A
IMR_AB  | 0x14   | 2 * u16 | A-A-B-B    | B-B-A-A
IMR_CD  | 0x18   | 2 * u16 | C-C-D-D    | D-D-C-C

The unused other registers are all 4*u8.

A-B-C-D means: (A << 24) | (B << 16) | (C << 8) | D
A-A-B-B means: (A << 16) | B

-- 
Best,
Sander
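The table translates directly into bit positions. The standalone C model below (not driver code) computes where GPIO line `line` (0..31, port = line / 8, pin = line % 8) lands in the 4 * u8 registers and in the two-bit IMR fields for both port orders. It also checks that a byte swap (the effect of the kernel's swab32()) or a half-word swap (the effect of swahw32()) turns the RTL8380 layout into plain line order. The port_order enum and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Port order inside a register, as listed in the table. */
enum port_order {
	ORDER_ABCD, /* RTL8380: port A in the most significant byte/half-word */
	ORDER_DCBA, /* RTL9300: port A in the least significant byte/half-word */
};

/* Bit index of GPIO line (0..31) in DIR/DATA/ISR (4 * u8). */
static unsigned int u8_reg_bit(enum port_order order, unsigned int line)
{
	unsigned int port = line / 8, pin = line % 8;

	assert(line < 32);
	if (order == ORDER_ABCD)
		return 8 * (3 - port) + pin;
	return 8 * port + pin;
}

/* Shift of a line's two-bit field in IMR_AB (lines 0..15) or IMR_CD (16..31). */
static unsigned int imr_field_shift(enum port_order order, unsigned int line)
{
	unsigned int half = (line % 16) / 8, pin = line % 8;

	assert(line < 32);
	if (order == ORDER_ABCD)
		return 16 * (1 - half) + 2 * pin;
	return 16 * half + 2 * pin;
}

/* Byte reversal, same effect as the kernel's swab32(). */
static uint32_t swap_bytes32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Half-word swap, same effect as the kernel's swahw32(). */
static uint32_t swap_halfwords32(uint32_t x)
{
	return (x >> 16) | (x << 16);
}
```

For example, with the RTL8380 layout, line 9 (B1) sits at bit 17 of DIR/DATA/ISR and at bits 3:2 of IMR_AB. After the swaps it is at bit 9 and bits 19:18, which is what lets the driver use masks derived directly from the line number.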
Re: [PATCH v2 2/2] gpio: Add Realtek Otto GPIO support
On Fri, 2021-03-19 at 19:57 +0200, Andy Shevchenko wrote:
> On Fri, Mar 19, 2021 at 5:51 PM Sander Vanheule wrote:
> > On Wed, 2021-03-17 at 15:08 +0200, Andy Shevchenko wrote:
> > > On Mon, Mar 15, 2021 at 11:11 PM Sander Vanheule <san...@svanheule.net> wrote:
> > ...
> >
> > > > +#include
> > >
> > > Not sure why you need this? See below.
> > >
> > > > +	return swab32(readl(ctrl->base + REALTEK_GPIO_REG_ISR));
> > >
> > > Why swab?! How is this supposed to work on BE CPUs?
> > > Ditto for all swabXX() usage.
> >
> > My use of swab32/swahw32 has little to do with the CPU being BE or LE,
> > but more with the register packing in the GPIO peripheral.
> >
> > The supported SoCs have port layout A-B-C-D in the registers, where
> > firmware built with Realtek's SDK always denotes A0 as the first GPIO
> > line. So bit 24 in a register has the value for A0 (with the exception
> > of the IMR register).
> >
> > I wrote these wrapper functions to be able to use the BIT() macro with
> > the GPIO line number, similar to how gpio-mmio uses ioread32be() when
> > the BGPIOF_BIG_ENDIAN_BYTE_ORDER flag is used.
> >
> > For the IMR register, port A again comes first, but is now 16 bits wide
> > instead of 8, with A0 at bits 16:17. That's why swahw32 is used for
> > this register.
> >
> > On the currently unsupported RTL9300-series, the port layout is
> > reversed: D-C-B-A. GPIO line A0 is then at bit 0, so the swapping
> > functions won't be required. When support for this alternate port
> > layout is added, some code will need to be added to differentiate
> > between the two cases.
>
> Yes, you have different endianess on the hardware level, why not to
> use the proper accessors (with or without utilization of the above
> mentioned BGPIOF_BIG_ENDIAN_BYTE_ORDER)?

The point I was trying to make, is that it isn't an endianess issue. I shouldn't have used a register with single byte values to try to illustrate that.

Consider instead the interrupt masking registers. To write the IMR bits for port A (GPIO 0-7), a 16-bit value must be written. This value (e.g. u16 port_a_imr) is always BE, independent of the packing order of the ports in the registers:

// On RTL8380: port A is in the upper word
writew(port_a_imr, base + OFFSET_IMR_AB);

// On RTL9300: port A is in the lower word
writew(port_a_imr, base + OFFSET_IMR_AB + 2);

I want the low GPIO lines to be in the lower half-word, so I can manipulate GPIO lines 0-15 with simple mask and shift operations.

It just so happens that all registers needed by bgpio_init contain single-byte values. With BGPIOF_BIG_ENDIAN_BYTE_ORDER the port order is reversed as required, but it's a bit of a misnomer here.

Best,
Sander
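The writew() example above is about register addresses rather than bit positions. The sketch below takes the offsets shown there as given: port A's IMR half-word is at IMR_AB + 0 on RTL8380 and at IMR_AB + 2 on RTL9300. It then extends the same idea to the byte-wide registers, assuming port A at byte offset 0 for A-B-C-D (as the driver's read_u8_reg(reg, port) accessor does) and at offset 3 for D-C-B-A. The enum, u8_port_addr() and imr_port_addr() are hypothetical names.

```c
#include <assert.h>

#define REALTEK_GPIO_REG_DIR    0x08
#define REALTEK_GPIO_REG_IMR_AB 0x14

enum port_order { ORDER_ABCD, ORDER_DCBA };

/* Byte address of port @port (0 = A .. 3 = D) in a 4 * u8 register. */
static unsigned int u8_port_addr(enum port_order order, unsigned int reg,
				 unsigned int port)
{
	assert(port < 4);
	return reg + (order == ORDER_ABCD ? port : 3 - port);
}

/* Address of the u16 IMR half-word for @port: A/B in IMR_AB, C/D in IMR_CD. */
static unsigned int imr_port_addr(enum port_order order, unsigned int port)
{
	unsigned int reg, idx;

	assert(port < 4);
	reg = REALTEK_GPIO_REG_IMR_AB + 4 * (port / 2); /* 0x14 or 0x18 */
	idx = port % 2;                                 /* first or second port of the pair */
	return reg + 2 * (order == ORDER_ABCD ? idx : 1 - idx);
}
```

With these helpers, port A's IMR half-word resolves to 0x14 on the A-B-C-D layout and 0x16 on the D-C-B-A layout, matching the two writew() calls.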
Re: [PATCH v2 2/2] gpio: Add Realtek Otto GPIO support
Hi Andy,

Thanks for the review. I'll address the style comments in a v3. Some further comments and discussion below.

On Wed, 2021-03-17 at 15:08 +0200, Andy Shevchenko wrote:
> On Mon, Mar 15, 2021 at 11:11 PM Sander Vanheule <san...@svanheule.net> wrote:
> > +	depends on OF_GPIO
>
> Don't see how it's used.

It isn't, so I'll remove it.

> > +#include
>
> Why?
> Perhaps what you need is property.h and mod_devicetable.h. See below.

With your suggestions, I was able to drop most explicit OF references. Only of_device_id remains, for which I'll include mod_devicetable.h.

> > +#include
>
> Not sure why you need this? See below.

[snip]

> > +
> > +static inline u32 realtek_gpio_isr_read(struct realtek_gpio_ctrl *ctrl)
> > +{
> > +	return swab32(readl(ctrl->base + REALTEK_GPIO_REG_ISR));
>
> Why swab?! How is this supposed to work on BE CPUs?
> Ditto for all swabXX() usage.

My use of swab32/swahw32 has little to do with the CPU being BE or LE, but more with the register packing in the GPIO peripheral.

The supported SoCs have port layout A-B-C-D in the registers, where firmware built with Realtek's SDK always denotes A0 as the first GPIO line. So bit 24 in a register has the value for A0 (with the exception of the IMR register).

I wrote these wrapper functions to be able to use the BIT() macro with the GPIO line number, similar to how gpio-mmio uses ioread32be() when the BGPIOF_BIG_ENDIAN_BYTE_ORDER flag is used.

For the IMR register, port A again comes first, but is now 16 bits wide instead of 8, with A0 at bits 16:17. That's why swahw32 is used for this register.

On the currently unsupported RTL9300-series, the port layout is reversed: D-C-B-A. GPIO line A0 is then at bit 0, so the swapping functions won't be required. When support for this alternate port layout is added, some code will need to be added to differentiate between the two cases.

> > +}
> > +
> > +static inline void realtek_gpio_isr_clear(struct realtek_gpio_ctrl *ctrl,
> > +	unsigned int pin_mask)
> > +{
> > +	writel(swab32(pin_mask), ctrl->base + REALTEK_GPIO_REG_ISR);
> > +}
> > +
> > +static inline void realtek_gpio_update_imr(struct realtek_gpio_ctrl *ctrl,
> > +	unsigned int imr_offset, u32 type, u32 mask)
> > +{
> > +	unsigned int reg;
> > +
> > +	if (imr_offset == 0)
> > +		reg = REALTEK_GPIO_REG_IMR_AB;
> > +	else
> > +		reg = REALTEK_GPIO_REG_IMR_CD;
> > +	writel(swahw32(type & mask), ctrl->base + reg);
> > +}

[snip]

> > +	switch (flow_type & IRQ_TYPE_SENSE_MASK) {
>
> > +	case IRQ_TYPE_NONE:
> > +		type = 0;
> > +		handler = handle_bad_irq;
> > +		break;
>
> Why is it here? Make it default like many other GPIO drivers do.
>
> > +	case IRQ_TYPE_EDGE_FALLING:
> > +		type = REALTEK_GPIO_IRQ_EDGE_FALLING;
> > +		handler = handle_edge_irq;
> > +		break;
> > +	case IRQ_TYPE_EDGE_RISING:
> > +		type = REALTEK_GPIO_IRQ_EDGE_RISING;
> > +		handler = handle_edge_irq;
> > +		break;
> > +	case IRQ_TYPE_EDGE_BOTH:
> > +		type = REALTEK_GPIO_IRQ_EDGE_BOTH;
> > +		handler = handle_edge_irq;
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	irq_set_handler_locked(data, handler);
>
> handler is always the same. Use it directly here.

I'll drop the IRQ_TYPE_NONE case. Do I understand it correctly, that IRQ_TYPE_NONE should never be used as the new value, but only as the default initial value?

Best,
Sander
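With Andy's suggestion applied, the type translation only has to handle the three edge types and reject everything else. The v3 changelog records this as "Don't accept IRQ_TYPE_NONE as a valid interrupt type". The standalone sketch below shows that shape. The SK_IRQ_TYPE_* macros are local stand-ins carrying the same values as the kernel's IRQ_TYPE_* flags, and otto_irq_type_to_imr() is a hypothetical helper, not the driver's set_type callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local stand-ins for the kernel's IRQ_TYPE_* flags (same values as linux/irq.h). */
#define SK_IRQ_TYPE_NONE         0x0u
#define SK_IRQ_TYPE_EDGE_RISING  0x1u
#define SK_IRQ_TYPE_EDGE_FALLING 0x2u
#define SK_IRQ_TYPE_EDGE_BOTH    0x3u
#define SK_IRQ_TYPE_LEVEL_HIGH   0x4u
#define SK_IRQ_TYPE_SENSE_MASK   0xfu

/* Hardware IMR encodings, from the driver's defines. */
#define REALTEK_GPIO_IRQ_EDGE_FALLING 1
#define REALTEK_GPIO_IRQ_EDGE_RISING  2
#define REALTEK_GPIO_IRQ_EDGE_BOTH    3

/*
 * Translate a requested trigger type into the two-bit IMR encoding.
 * IRQ_TYPE_NONE and the level types fall through to -EINVAL, so the
 * only flow handler the caller ever needs is the edge handler.
 */
static int otto_irq_type_to_imr(unsigned int flow_type, uint32_t *imr_type)
{
	switch (flow_type & SK_IRQ_TYPE_SENSE_MASK) {
	case SK_IRQ_TYPE_EDGE_FALLING:
		*imr_type = REALTEK_GPIO_IRQ_EDGE_FALLING;
		return 0;
	case SK_IRQ_TYPE_EDGE_RISING:
		*imr_type = REALTEK_GPIO_IRQ_EDGE_RISING;
		return 0;
	case SK_IRQ_TYPE_EDGE_BOTH:
		*imr_type = REALTEK_GPIO_IRQ_EDGE_BOTH;
		return 0;
	default:
		return -EINVAL;
	}
}
```

The hardware encoding differs from the kernel's: IRQ_TYPE_EDGE_RISING is 1, but REALTEK_GPIO_IRQ_EDGE_RISING is 2. A lookup like this is therefore needed instead of copying the flag value through. With only edge types accepted, the flow handler is always handle_edge_irq, as Andy pointed out.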
Re: [PATCH 2/2] gpio: Add Realtek Otto GPIO support
On Mon, 2021-03-15 at 16:10 +0100, Linus Walleij wrote:
> On Mon, Mar 15, 2021 at 9:26 AM Sander Vanheule wrote:
>
> > Realtek MIPS SoCs (platform name Otto) have GPIO controllers with up to
> > 64 GPIOs, divided over two banks. Each bank has a set of registers for
> > 32 GPIOs, with support for edge-triggered interrupts.
> >
> > Each GPIO bank consists of four 8-bit GPIO ports (ABCD and EFGH). Most
> > registers pack one bit per GPIO, except for the IMR register, which
> > packs two bits per GPIO (AB-CD).
> >
> > Although the byte order is currently assumed to have port A..D at offset
> > 0x0..0x3, this has been observed to be reversed on other, Lexra-based,
> > SoCs (e.g. RTL8196E/97D/97F).
> >
> > Interrupt support is disabled for the fallback devicetree-compatible
> > 'realtek,otto-gpio'. This allows for quick support of GPIO banks in
> > which the byte order would be unknown. In this case, the port ordering
> > in the IMR registers may not match the reversed order in the other
> > registers (DCBA, and BA-DC or DC-BA).
> >
> > Signed-off-by: Sander Vanheule
>
> Overall this is a beautiful driver and it makes use of all the generic
> frameworks I can think of. I don't see any reason not to merge
> it so:
> Reviewed-by: Linus Walleij

Thanks for the review and the kind comments!

>
> The following is some notes and nitpicks, nothing blocking any
> merge, more like discussion.
>
> > +enum realtek_gpio_flags {
> > +	GPIO_INTERRUPTS = BIT(0),
> > +};
>
> I suppose this looks like this because more flags will be introduced
> when you add more functionality to the driver. Otherwise it seems
> like overkill so a bool would suffice.
>
> I would add a comment /* TODO: this will be expanded */
>

That's correct, I would like this to be extendable. Like the commit message noted, some other SoCs appear to have port order D-C-B-A.
The current driver only supports the A-B-C-D port order, so a flag could be added to differentiate between A-first and D-first.

Another flag that will be added in the future is one to indicate that the GPIO block has extra interrupt control registers, located after the second GPIO bank. For example, the rtl9300-series appears to have both the reversed port order and an extra "interrupt enable" register. This is not yet implemented, since I don't currently have a device with this type of SoC.

> > +static inline u32 realtek_gpio_imr_bits(unsigned int pin, u32 value)
> > +{
> > +	return ((value & 0x3) << 2*(pin % 16));
> > +}
>
> I would explain a bit about this, obviouslt it is two bit per
> line, but it took me some time to parse, so a comment
> about the bit layout would be nice.
>
> > +	unsigned int offset = pin/16;
>
> Here that number appears again.
>

I've updated the patch (and added your Reviewed-by tags) for a v2. Hopefully this is now more obvious from the code and comments.

Best,
Sander

> The use of GPIO_GENERIC and GPIO irqchip is flawless
> and first class.
>
> Thanks!
> Linus Walleij
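Linus asked for a comment on the bit layout of realtek_gpio_imr_bits() and on the repeated 16. Below is one way a commented variant could look. It assumes the line-ordered view (after swahw32 on RTL838x/RTL839x) and the v3 register offsets. The named constant REALTEK_GPIO_LINES_PER_IMR and the realtek_gpio_imr_reg() helper are hypothetical additions for illustration, not code from the thread.

```c
#include <assert.h>
#include <stdint.h>

#define REALTEK_GPIO_REG_IMR_AB    0x14
#define REALTEK_GPIO_REG_IMR_CD    0x18
#define REALTEK_GPIO_LINES_PER_IMR 16 /* 16 lines x 2 bits = one 32-bit register */

/* Register holding the IMR field of @line: ports A/B (0..15) or C/D (16..31). */
static unsigned int realtek_gpio_imr_reg(unsigned int line)
{
	assert(line < 32);
	return line / REALTEK_GPIO_LINES_PER_IMR ?
	       REALTEK_GPIO_REG_IMR_CD : REALTEK_GPIO_REG_IMR_AB;
}

/*
 * Place the two-bit IMR value for @line in a line-ordered 32-bit word:
 * bits 1:0 belong to the first line of the register, bits 3:2 to the
 * second, ..., bits 31:30 to the sixteenth.
 */
static uint32_t realtek_gpio_imr_bits(unsigned int line, uint32_t value)
{
	return (value & 0x3u) << (2 * (line % REALTEK_GPIO_LINES_PER_IMR));
}
```

Naming the 16 once keeps the register selection (division) and the field position (remainder) visibly tied to the same layout fact.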
[PATCH v2 0/2] Add Realtek Otto GPIO support
Add support for the GPIO controller employed by Realtek in multiple series of MIPS SoCs. These include the supported RTL838x and RTL839x. The register layout also matches the one found in the GPIO controller of other (Lexra-based) SoCs such as RTL8196E, RTL8197D, and RTL8197F.

For the platform name 'otto', I am not aware of any official resources as to what hardware this specifically applies to. However, in all of the GPL archives we've received, from vendors using compatible SoCs in their design, the platform under the MIPS architecture is referred to by this name.

The GPIO ports have been tested on a Zyxel GS1900-8 (RTL8380), and Zyxel GS1900-48 (RTL8393). Furthermore, the GPIO ports and interrupt controller have been tested on a Netgear GS110TPPv1 (RTL8381).

Changes in v2:
- Clarify structure and usage of IMR registers

Sander Vanheule (2):
  dt-bindings: gpio: Binding for Realtek Otto GPIO
  gpio: Add Realtek Otto GPIO support

 .../bindings/gpio/gpio-realtek-otto.yaml |  80 +
 drivers/gpio/Kconfig                     |  12 +
 drivers/gpio/Makefile                    |   1 +
 drivers/gpio/gpio-realtek-otto.c         | 331 ++
 4 files changed, 424 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

-- 
2.30.2
[PATCH v2 2/2] gpio: Add Realtek Otto GPIO support
Realtek MIPS SoCs (platform name Otto) have GPIO controllers with up to 64 GPIOs, divided over two banks. Each bank has a set of registers for 32 GPIOs, with support for edge-triggered interrupts.

Each GPIO bank consists of four 8-bit GPIO ports (ABCD and EFGH). Most registers pack one bit per GPIO, except for the IMR register, which packs two bits per GPIO (AB-CD).

Although the byte order is currently assumed to have port A..D at offset 0x0..0x3, this has been observed to be reversed on other, Lexra-based, SoCs (e.g. RTL8196E/97D/97F).

Interrupt support is disabled for the fallback devicetree-compatible 'realtek,otto-gpio'. This allows for quick support of GPIO banks in which the byte order would be unknown. In this case, the port ordering in the IMR registers may not match the reversed order in the other registers (DCBA, and BA-DC or DC-BA).

Signed-off-by: Sander Vanheule
Reviewed-by: Linus Walleij
---
 drivers/gpio/Kconfig             |  12 ++
 drivers/gpio/Makefile            |   1 +
 drivers/gpio/gpio-realtek-otto.c | 331 +++
 3 files changed, 344 insertions(+)
 create mode 100644 drivers/gpio/gpio-realtek-otto.c

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index e3607ec4c2e8..fedf1e49469e 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -502,6 +502,18 @@ config GPIO_RDA
 	help
 	  Say Y here to support RDA Micro GPIO controller.
 
+config GPIO_REALTEK_OTTO
+	bool "Realtek Otto GPIO support"
+	depends on MACH_REALTEK_RTL
+	depends on OF_GPIO
+	default MACH_REALTEK_RTL
+	select GPIO_GENERIC
+	select GPIOLIB_IRQCHIP
+	help
+	  The GPIO controller on the Otto MIPS platform supports up to two
+	  banks of 32 GPIOs, with edge triggered interrupts. The 32 GPIOs
+	  are grouped in four 8-bit wide ports.
+
 config GPIO_REG
 	bool
 	help
diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
index c58a90a3c3b1..8ace5934e3c3 100644
--- a/drivers/gpio/Makefile
+++ b/drivers/gpio/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_GPIO_RC5T583)	+= gpio-rc5t583.o
 obj-$(CONFIG_GPIO_RCAR)	+= gpio-rcar.o
 obj-$(CONFIG_GPIO_RDA)	+= gpio-rda.o
 obj-$(CONFIG_GPIO_RDC321X)	+= gpio-rdc321x.o
+obj-$(CONFIG_GPIO_REALTEK_OTTO)	+= gpio-realtek-otto.o
 obj-$(CONFIG_GPIO_REG)	+= gpio-reg.o
 obj-$(CONFIG_ARCH_SA1100)	+= gpio-sa1100.o
 obj-$(CONFIG_GPIO_SAMA5D2_PIOBU)	+= gpio-sama5d2-piobu.o
diff --git a/drivers/gpio/gpio-realtek-otto.c b/drivers/gpio/gpio-realtek-otto.c
new file mode 100644
index ..818412687346
--- /dev/null
+++ b/drivers/gpio/gpio-realtek-otto.c
@@ -0,0 +1,331 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+/*
+ * Total register block size is 0x1C for four ports.
+ * On the RTL8380/RTL8390 platforms port A, B, and C are implemented.
+ * RTL8389 and RTL8328 implement a second bank with ports E, F, G, and H.
+ *
+ * Port information is stored with the first port at offset 0, followed by the
+ * second, etc. Most registers store one bit per GPIO and should be read out in
+ * reversed endian order. The two interrupt mask registers store two bits per
+ * GPIO, and should be manipulated with swahw32, if required.
+ */
+
+/*
+ * Pin select: (0) "normal", (1) "dedicate peripheral"
+ * Not used on RTL8380/RTL8390, peripheral selection is managed by control bits
+ * in the peripheral registers.
+ */
+#define REALTEK_GPIO_REG_CNR		0x00
+/* Clear bit (0) for input, set bit (1) for output */
+#define REALTEK_GPIO_REG_DIR		0x08
+#define REALTEK_GPIO_REG_DATA		0x0C
+/* Read bit for IRQ status, write 1 to clear IRQ */
+#define REALTEK_GPIO_REG_ISR		0x10
+/* Two bits per GPIO */
+#define REALTEK_GPIO_REG_IMR_AB		0x14
+#define REALTEK_GPIO_REG_IMR_CD		0x18
+#define REALTEK_GPIO_IMR_LINE_MASK	3
+#define REALTEK_GPIO_IRQ_EDGE_FALLING	1
+#define REALTEK_GPIO_IRQ_EDGE_RISING	2
+#define REALTEK_GPIO_IRQ_EDGE_BOTH	3
+
+#define REALTEK_GPIO_MAX		32
+
+/*
+ * Realtek GPIO driver data
+ * Because the interrupt mask register (IMR) combines the function of
+ * IRQ type selection and masking, two extra values are stored.
+ * intr_mask is used to mask/unmask the interrupts for certain GPIO,
+ * and intr_type is used to store the selected interrupt types. The
+ * logical AND of these values is written to IMR on changes.
+ *
+ * @dev Parent device
+ * @gc Associated gpio_chip instance
+ * @base Base address of the register block
+ * @lock Lock for accessing the IRQ registers and values
+ * @intr_mask Mask for GPIO interrupts
+ * @intr_type GPIO interrupt type selection
+ */
+struct realtek_gpio_ctrl {
+	struct device *dev;
+	struct gpio_chip gc;
+	void __iomem *base;
[PATCH v2 1/2] dt-bindings: gpio: Binding for Realtek Otto GPIO
Add a binding description for Realtek's GPIO controller found on several of their MIPS-based SoCs (codenamed Otto), such as the RTL838x and RTL839x series of switch SoCs. A fallback binding 'realtek,otto-gpio' is provided for cases where the actual port ordering is not known yet, and enabling the interrupt controller may result in uncaught interrupts.

Signed-off-by: Sander Vanheule
Reviewed-by: Linus Walleij
---
 .../bindings/gpio/gpio-realtek-otto.yaml | 80 +++
 1 file changed, 80 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml

diff --git a/Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml b/Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml
new file mode 100644
index ..3e8151e3a169
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml
@@ -0,0 +1,80 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/gpio/gpio-realtek-otto.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Realtek Otto GPIO controller
+
+maintainers:
+  - Sander Vanheule
+  - Bert Vermeulen
+
+description: |
+  Realtek's GPIO controller on their MIPS switch SoCs (Otto platform) consists
+  of two banks of 32 GPIOs. These GPIOs can generate edge-triggered interrupts.
+  Each bank's interrupts are cascaded into one interrupt line on the parent
+  interrupt controller, if provided.
+  This binding allows defining a single bank in the devicetree. The interrupt
+  controller is not supported on the fallback compatible name, which only
+  allows for GPIO port use.
+ +properties: + $nodename: +pattern: "^gpio@[0-9a-f]+$" + + compatible: +oneOf: + - items: + - enum: + - realtek,rtl8380-gpio + - realtek,rtl8390-gpio + - const: realtek,otto-gpio + - const: realtek,otto-gpio + + reg: +maxItems: 1 + + "#gpio-cells": +const: 2 + + gpio-controller: true + + ngpios: +minimum: 1 +maximum: 32 + + interrupt-controller: true + + "#interrupt-cells": +const: 2 + + interrupts: +maxItems: 1 + +required: + - compatible + - reg + - "#gpio-cells" + - gpio-controller + +additionalProperties: false + +dependencies: + interrupt-controller: [ interrupts ] + +examples: + - | + gpio@3500 { +compatible = "realtek,rtl8380-gpio", "realtek,otto-gpio"; +reg = <0x3500 0x1c>; +gpio-controller; +#gpio-cells = <2>; +ngpios = <24>; +interrupt-controller; +#interrupt-cells = <2>; +interrupt-parent = <&rtlintc>; +interrupts = <23>; + }; + +... -- 2.30.2
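The schema above accepts either an SoC-specific compatible followed by the 'realtek,otto-gpio' fallback, or the fallback alone; it limits `ngpios` to 32 and requires `interrupts` whenever `interrupt-controller` is present. The description adds that the interrupt controller is not supported with the fallback alone. As a rough illustration only, the C sketch below encodes those rules for a simplified, hypothetical node model. It is not dt-schema and not the driver's probe logic, and representing an absent `ngpios` property as 0 is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define OTTO_FALLBACK "realtek,otto-gpio"

/* Hypothetical, simplified view of a devicetree node for this binding. */
struct otto_gpio_node {
	const char *compatible[2]; /* most specific entry first */
	size_t n_compatible;
	bool interrupt_controller;
	bool has_interrupts;
	unsigned int ngpios;       /* 0 stands for "property absent" */
};

static bool is_soc_compatible(const char *c)
{
	return strcmp(c, "realtek,rtl8380-gpio") == 0 ||
	       strcmp(c, "realtek,rtl8390-gpio") == 0;
}

/* Mirrors the "compatible", "ngpios" and "dependencies" rules of the schema. */
static bool otto_node_matches_schema(const struct otto_gpio_node *n)
{
	assert(n != NULL);

	if (n->n_compatible == 1) {
		if (strcmp(n->compatible[0], OTTO_FALLBACK) != 0)
			return false;
	} else if (n->n_compatible == 2) {
		if (!is_soc_compatible(n->compatible[0]) ||
		    strcmp(n->compatible[1], OTTO_FALLBACK) != 0)
			return false;
	} else {
		return false;
	}

	if (n->ngpios > 32)
		return false;

	/* dependencies: interrupt-controller requires interrupts */
	if (n->interrupt_controller && !n->has_interrupts)
		return false;

	return true;
}

/* Per the binding description: no interrupt controller on the fallback alone. */
static bool otto_irq_supported(const struct otto_gpio_node *n)
{
	return otto_node_matches_schema(n) && n->n_compatible == 2 &&
	       n->interrupt_controller;
}
```

With the binding's own example node (rtl8380 plus fallback, `ngpios = <24>`, interrupt controller with a parent interrupt) both checks pass, while the same node carrying only 'realtek,otto-gpio' is schema-valid but would be used for GPIO only.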
[PATCH 1/2] dt-bindings: gpio: Binding for Realtek Otto GPIO
Add a binding description for Realtek's GPIO controller found on several of their MIPS-based SoCs (codenamed Otto), such as the RTL838x and RTL839x series of switch SoCs. A fallback binding 'realtek,otto-gpio' is provided for cases where the actual port ordering is not known yet, and enabling the interrupt controller may result in uncaught interrupts. Signed-off-by: Sander Vanheule --- .../bindings/gpio/gpio-realtek-otto.yaml | 80 +++ 1 file changed, 80 insertions(+) create mode 100644 Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml diff --git a/Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml b/Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml new file mode 100644 index ..3e8151e3a169 --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml @@ -0,0 +1,80 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/gpio/gpio-realtek-otto.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Realtek Otto GPIO controller + +maintainers: + - Sander Vanheule + - Bert Vermeulen + +description: | + Realtek's GPIO controller on their MIPS switch SoCs (Otto platform) consists + of two banks of 32 GPIOs. These GPIOs can generate edge-triggered interrupts. + Each bank's interrupts are cascaded into one interrupt line on the parent + interrupt controller, if provided. + This binding allows defining a single bank in the devicetree. The interrupt + controller is not supported on the fallback compatible name, which only + allows for GPIO port use.
+ +properties: + $nodename: +pattern: "^gpio@[0-9a-f]+$" + + compatible: +oneOf: + - items: + - enum: + - realtek,rtl8380-gpio + - realtek,rtl8390-gpio + - const: realtek,otto-gpio + - const: realtek,otto-gpio + + reg: +maxItems: 1 + + "#gpio-cells": +const: 2 + + gpio-controller: true + + ngpios: +minimum: 1 +maximum: 32 + + interrupt-controller: true + + "#interrupt-cells": +const: 2 + + interrupts: +maxItems: 1 + +required: + - compatible + - reg + - "#gpio-cells" + - gpio-controller + +additionalProperties: false + +dependencies: + interrupt-controller: [ interrupts ] + +examples: + - | + gpio@3500 { +compatible = "realtek,rtl8380-gpio", "realtek,otto-gpio"; +reg = <0x3500 0x1c>; +gpio-controller; +#gpio-cells = <2>; +ngpios = <24>; +interrupt-controller; +#interrupt-cells = <2>; +interrupt-parent = <&rtlintc>; +interrupts = <23>; + }; + +... -- 2.30.2
[PATCH 0/2] Add Realtek Otto GPIO support
Add support for the GPIO controller employed by Realtek in multiple series of MIPS SoCs. These include the supported RTL838x and RTL839x series. The register layout also matches the one found in GPIO controllers of other (Lexra-based) SoCs such as RTL8196E, RTL8197D, and RTL8197F. For the platform name 'otto', I am not aware of any official resources as to what hardware this specifically applies to. However, in all of the GPL archives we've received, from vendors using compatible SoCs in their design, the platform under the MIPS architecture is referred to by this name. The GPIO ports have been tested on a Zyxel GS1900-8 (RTL8380M), and Zyxel GS1900-48 (RTL8393M). Furthermore, the GPIO ports and interrupt controller have been tested on a Netgear GS110TPPv1 (RTL8381M). Sander Vanheule (2): dt-bindings: gpio: Binding for Realtek Otto GPIO gpio: Add Realtek Otto GPIO support .../bindings/gpio/gpio-realtek-otto.yaml | 80 + drivers/gpio/Kconfig | 12 + drivers/gpio/Makefile | 1 + drivers/gpio/gpio-realtek-otto.c | 320 ++ 4 files changed, 413 insertions(+) create mode 100644 Documentation/devicetree/bindings/gpio/gpio-realtek-otto.yaml create mode 100644 drivers/gpio/gpio-realtek-otto.c -- 2.30.2
[PATCH 2/2] gpio: Add Realtek Otto GPIO support
Realtek MIPS SoCs (platform name Otto) have GPIO controllers with up to 64 GPIOs, divided over two banks. Each bank has a set of registers for 32 GPIOs, with support for edge-triggered interrupts. Each GPIO bank consists of four 8-bit GPIO ports (ABCD and EFGH). Most registers pack one bit per GPIO, except for the IMR register, which packs two bits per GPIO (AB-CD). Although the byte order is currently assumed to have port A..D at offset 0x0..0x3, this has been observed to be reversed on other, Lexra-based, SoCs (e.g. RTL8196E/97D/97F). Interrupt support is disabled for the fallback devicetree-compatible 'realtek,otto-gpio'. This allows for quick support of GPIO banks in which the byte order would be unknown. In this case, the port ordering in the IMR registers may not match the reversed order in the other registers (DCBA, and BA-DC or DC-BA). Signed-off-by: Sander Vanheule --- drivers/gpio/Kconfig | 12 ++ drivers/gpio/Makefile | 1 + drivers/gpio/gpio-realtek-otto.c | 320 +++ 3 files changed, 333 insertions(+) create mode 100644 drivers/gpio/gpio-realtek-otto.c diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig index e3607ec4c2e8..fedf1e49469e 100644 --- a/drivers/gpio/Kconfig +++ b/drivers/gpio/Kconfig @@ -502,6 +502,18 @@ config GPIO_RDA help Say Y here to support RDA Micro GPIO controller. +config GPIO_REALTEK_OTTO + bool "Realtek Otto GPIO support" + depends on MACH_REALTEK_RTL + depends on OF_GPIO + default MACH_REALTEK_RTL + select GPIO_GENERIC + select GPIOLIB_IRQCHIP + help + The GPIO controller on the Otto MIPS platform supports up to two + banks of 32 GPIOs, with edge triggered interrupts. The 32 GPIOs + are grouped in four 8-bit wide ports.
+ config GPIO_REG bool help diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile index c58a90a3c3b1..8ace5934e3c3 100644 --- a/drivers/gpio/Makefile +++ b/drivers/gpio/Makefile @@ -124,6 +124,7 @@ obj-$(CONFIG_GPIO_RC5T583) += gpio-rc5t583.o obj-$(CONFIG_GPIO_RCAR) += gpio-rcar.o obj-$(CONFIG_GPIO_RDA) += gpio-rda.o obj-$(CONFIG_GPIO_RDC321X) += gpio-rdc321x.o +obj-$(CONFIG_GPIO_REALTEK_OTTO) += gpio-realtek-otto.o obj-$(CONFIG_GPIO_REG) += gpio-reg.o obj-$(CONFIG_ARCH_SA1100) += gpio-sa1100.o obj-$(CONFIG_GPIO_SAMA5D2_PIOBU) += gpio-sama5d2-piobu.o diff --git a/drivers/gpio/gpio-realtek-otto.c b/drivers/gpio/gpio-realtek-otto.c new file mode 100644 index ..04c11b2085f8 --- /dev/null +++ b/drivers/gpio/gpio-realtek-otto.c @@ -0,0 +1,320 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include +#include +#include +#include +#include +#include + +/* + * Total register block size is 0x1C for four ports. + * On the RTL8380/RTL8390 platforms port A, B, and C are implemented. + * RTL8389 and RTL8328 implement a second bank with ports E, F, G, and H. + * + * Port information is stored with the first port at offset 0, followed by the + * second, etc. Most registers store one bit per GPIO and should be read out in + * reversed endian order. The two interrupt mask registers store two bits per + * GPIO, and should be manipulated with swahw32, if required. + */ + +/* + * Pin select: (0) "normal", (1) "dedicate peripheral" + * Not used on RTL8380/RTL8390, peripheral selection is managed by control bits + * in the peripheral registers.
+ */ +#define REALTEK_GPIO_REG_CNR 0x00 +/* Clear bit (0) for input, set bit (1) for output */ +#define REALTEK_GPIO_REG_DIR 0x08 +#define REALTEK_GPIO_REG_DATA 0x0C +/* Read bit for IRQ status, write 1 to clear IRQ */ +#define REALTEK_GPIO_REG_ISR 0x10 +/* Two bits per GPIO */ +#define REALTEK_GPIO_REG_IMR_AB 0x14 +#define REALTEK_GPIO_REG_IMR_CD 0x18 +#define REALTEK_GPIO_IRQ_EDGE_FALLING 1 +#define REALTEK_GPIO_IRQ_EDGE_RISING 2 +#define REALTEK_GPIO_IRQ_EDGE_BOTH 3 + +#define REALTEK_GPIO_MAX 32 + +/* + * Realtek GPIO driver data + * Because the interrupt mask register (IMR) combines the function of + * IRQ type selection and masking, two extra values are stored. + * intr_mask is used to mask/unmask the interrupts for certain GPIO, + * and intr_type is used to store the selected interrupt types. The + * logical AND of these values is written to IMR on changes. + * + * @dev Parent device + * @gc Associated gpio_chip instance + * @base Base address of the register block + * @lock Lock for accessing the IRQ registers and values + * @intr_mask Mask for GPIO interrupts + * @intr_type GPIO interrupt type selection + */ +struct realtek_gpio_ctrl { + struct device *dev; + struct gpio_chip gc; + void __iomem *base; + raw_spinlock_t lock; + u32 intr_mask[2]; + u32 intr
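The comment at the top of the driver says that port A sits at byte offset 0, that the one-bit-per-GPIO registers "should be read out in reversed endian order", and that the IMR registers may need swahw32. The C sketch below illustrates one reading of that comment, assuming a big-endian 32-bit load (these are big-endian MIPS SoCs) and the numbering line = 8 * port + pin. The helper names are hypothetical; the local swap helpers only reproduce what the kernel's swab32() and swahw32() compute, so the sketch can run in userspace.

```c
#include <assert.h>
#include <stdint.h>

/* Line numbering: 8 lines per port, port A = 0, B = 1, C = 2, D = 3. */
static unsigned int line_port(unsigned int line)
{
	assert(line < 32);
	return line / 8;
}

static unsigned int line_pin(unsigned int line)
{
	return line % 8;
}

/* Same result as the kernel's swab32(): reverse the four bytes. */
static uint32_t swab32_local(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Same result as the kernel's swahw32(): swap the two 16-bit halves. */
static uint32_t swahw32_local(uint32_t x)
{
	return (x << 16) | (x >> 16);
}

/* What a big-endian 32-bit load returns for the bytes at offsets 0..3. */
static uint32_t load_be32(const uint8_t b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* One-bit registers (DIR, DATA, ISR): bitmap in which bit n == line n. */
static uint32_t one_bit_reg_to_lines(const uint8_t raw[4])
{
	return swab32_local(load_be32(raw));
}
```

For the IMR registers the same comment points to a halfword swap instead, since each port's 2-bit fields span 16 bits. Whether any swap is needed at all depends on the SoC, which is why the commit message keeps interrupts disabled for the fallback compatible.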
[PATCH] MIPS: ralink: manage low reset lines
Reset lines with indices smaller than 8 are currently considered invalid by the rt2880-reset reset controller. The MT7621 SoC uses a number of these low reset lines. The DTS defines reset lines "hsdma", "fe", and "mcm" with respective values 5, 6, and 2. As a result of the above restriction, these resets cannot be asserted or de-asserted by the reset controller. In cases where the bootloader does not de-assert these lines, this results in e.g. the MT7621's internal switch staying in reset. Change the reset controller to only ignore the system reset, so all reset lines with index greater than 0 are considered valid. Signed-off-by: Sander Vanheule --- This patch was tested on a TP-Link EAP235-Wall, with an MT7621DA SoC. The bootloader on this device would leave reset line 2 ("mcm") asserted, which caused the internal switch to be unresponsive on an uninterrupted boot from flash. When tftpboot was used in the bootloader to load an initramfs, it did initialise the internal switch, and cleared the mcm reset line. In this case the switch could be used from the OS. With this patch applied, the switch works both in an initramfs, and when (cold) booting from flash. arch/mips/ralink/reset.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/mips/ralink/reset.c b/arch/mips/ralink/reset.c index 8126f1260407..274d33078c5e 100644 --- a/arch/mips/ralink/reset.c +++ b/arch/mips/ralink/reset.c @@ -27,7 +27,7 @@ static int ralink_assert_device(struct reset_controller_dev *rcdev, { u32 val; - if (id < 8) + if (id == 0) return -1; val = rt_sysc_r32(SYSC_REG_RESET_CTRL); @@ -42,7 +42,7 @@ static int ralink_deassert_device(struct reset_controller_dev *rcdev, { u32 val; - if (id < 8) + if (id == 0) return -1; val = rt_sysc_r32(SYSC_REG_RESET_CTRL); -- 2.29.2
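The patch only relaxes the ID check; the register update itself is unchanged. The C sketch below models that check in userspace, with a plain variable standing in for SYSC_REG_RESET_CTRL. It assumes that asserting line `id` sets bit `id` and de-asserting clears it, as a read-modify-write of that register implies (the hunk context above elides that part). The function names are hypothetical; the real driver goes through rt_sysc_r32() and rt_sysc_w32().

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the SYSC_REG_RESET_CTRL register. */
static uint32_t reset_ctrl;

/* Old rule: lines 0..7 rejected, so "mcm" (2), "hsdma" (5), "fe" (6) were unusable. */
static int id_valid_old(unsigned long id) { return id >= 8; }

/* New rule: only line 0, the system reset, is rejected. */
static int id_valid_new(unsigned long id) { return id != 0; }

static int sketch_assert(unsigned long id)
{
	assert(id < 32); /* keep the shift below well defined */
	if (!id_valid_new(id))
		return -1;
	reset_ctrl |= (uint32_t)1 << id;
	return 0;
}

static int sketch_deassert(unsigned long id)
{
	assert(id < 32);
	if (!id_valid_new(id))
		return -1;
	reset_ctrl &= ~((uint32_t)1 << id);
	return 0;
}
```

Under the old rule a request for line 2 ("mcm") never reached the register, so a line left asserted by the bootloader could not be released by the OS.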
Re: mtd raw nand denali.c broken for Intel/Altera Cyclone V
Hi On Wednesday, 11 September 2019, 04:37:46 CEST, Masahiro Yamada wrote: > Hi Dinh, > > On Wed, Sep 11, 2019 at 12:22 AM Dinh Nguyen wrote: > > On 9/10/19 8:48 AM, Tim Sander wrote: > > > Hi > > > > > > I have noticed that my SPF records where not in place after moving the > > > server, so it seems the mail didn't go to the mailing list. Hopefully > > > that's fixed now. > > > > > On Tuesday, 10 September 2019, 09:16:37 CEST, Masahiro Yamada wrote: > > >> On Fri, Sep 6, 2019 at 9:39 PM Tim Sander wrote: > > >>> Hi > > >>> > > >>> I have noticed that there multiple breakages piling up for the denali > > >>> nand > > >>> driver on the Intel/Altera Cyclone V. Unfortunately i had no time to > > >>> track > > >>> the mainline kernel closely. So the breakage seems to pile up. I am a > > >>> little disapointed that Intel is not on the lookout that the kernel > > >>> works > > >>> on the chips they are selling. I was really happy about the state of > > >>> the > > >>> platform before concerning mainline support. > > >>> > > >>> The failure starts with kernel 4.19 or stable kernel release 4.18.19. > > >>> The > > >>> commit is ba4a1b62a2d742df9e9c607ac53b3bf33496508f. > > >> > > >> Just for clarification, this corresponds to > > >> 0d55c668b218a1db68b5044bce4de74e1bd0f0c8 upstream. > > >> > > >>> The problem here is that > > >>> our platform works with a zero in the SPARE_AREA_SKIP_BYTES register. > > >> > > >> Please clarify the scope of "our platform". > > >> (Only you, or your company, or every individual using this chip?) > > > > > > The company i work for uses this chip as a base for multiple products. > > > > > >> First, SPARE_AREA_SKIP_BYTES is not the property of the hardware. > > >> Rather, it is about the OOB layout, in other words, this parameter > > >> is defined by software. > > >> > > >> For example, U-Boot supports the Denali NAND driver.
> > >> The SPARE_AREA_SKIP_BYTES is a user-configurable parameter: > > >> https://github.com/u-boot/u-boot/blob/v2019.10-rc3/drivers/mtd/nand/raw/Kconfig#L112 I am using barebox for booting. I looked at the code and found a comment in denali_hw_init: * tell driver how many bit controller will skip before * writing ECC code in OOB, this register may be already * set by firmware. So we read this value out. * if this value is 0, just let it be. I have checked the barebox code: the Denali register SPARE_AREA_SKIP_BYTES (offset 0x230) is read only once during boot, and I have not found any place where barebox writes it. Since the value is zero in my case, I would conclude that the boot ROM does not seem to set it. The barebox code was mostly imported from Linux in 2015, before the reorganization that later happened on the Linux side. > > >> > > >> > > >> Your platform works with a zero in the SPARE_AREA_SKIP_BYTES register > > >> because the NAND chip on the board was initialized with a zero > > >> set to the SPARE_AREA_SKIP_BYTES register. > > >> > > >> If the NAND chip had been initialized with 8 > > >> set to the SPARE_AREA_SKIP_BYTES register, it would have > > >> been working with 8 to the SPARE_AREA_SKIP_BYTES. > > >> > > >> The Boot ROM is the only (semi-)software that is unconfigurable by > > >> users, > > >> so the value of SPARE_AREA_SKIP_BYTES should be aligned with > > >> the boot ROM. > > >> I recommend you to check the spec of the boot ROM. > > > > > > We boot from NOR flash. That's why i didn't see a problem booting > > > probably. > > > > > >> (The maintainer of the platform, Dihn is CC'ed, > > >> so I hope he will jump in) > > > > > > Yes i hope so too. > > > > I don't have access to a NAND device at the moment. I'll try to find one > > and debug. I have hardware available, so I would be happy to test any ideas or guesses. > Dinh, > Do you have answers for the following questions?
> > > - Does the SOCFPGA boot ROM support the NAND boot mode? > > - If so, which value does it use for SPARE_AREA_SKIP_BYTES? Best regards Tim
Re: mtd raw nand denali.c broken for Intel/Altera Cyclone V
Hi On Wednesday, 11 September 2019, 04:37:46 CEST, Masahiro Yamada wrote: > - Does the SOCFPGA boot ROM support the NAND boot mode? The Cyclone V HPS TRM, section "A3 Booting and Configuration", lists QSPI, SD/MMC and NAND as boot sources. > - If so, which value does it use for SPARE_AREA_SKIP_BYTES? I have no idea about this one. Tim
Re: mtd raw nand denali.c broken for Intel/Altera Cyclone V
Hi I have noticed that my SPF records were not in place after moving the server, so it seems the mail didn't go to the mailing list. Hopefully that's fixed now. On Tuesday, 10 September 2019, 09:16:37 CEST, Masahiro Yamada wrote: > On Fri, Sep 6, 2019 at 9:39 PM Tim Sander wrote: > > Hi > > > > I have noticed that there multiple breakages piling up for the denali nand > > driver on the Intel/Altera Cyclone V. Unfortunately i had no time to track > > the mainline kernel closely. So the breakage seems to pile up. I am a > > little disapointed that Intel is not on the lookout that the kernel works > > on the chips they are selling. I was really happy about the state of the > > platform before concerning mainline support. > > > > The failure starts with kernel 4.19 or stable kernel release 4.18.19. The > > commit is ba4a1b62a2d742df9e9c607ac53b3bf33496508f. > > Just for clarification, this corresponds to > 0d55c668b218a1db68b5044bce4de74e1bd0f0c8 upstream. > > > The problem here is that > > our platform works with a zero in the SPARE_AREA_SKIP_BYTES register. > > Please clarify the scope of "our platform". > (Only you, or your company, or every individual using this chip?) The company I work for uses this chip as a base for multiple products. > First, SPARE_AREA_SKIP_BYTES is not the property of the hardware. > Rather, it is about the OOB layout, in other words, this parameter > is defined by software. > > For example, U-Boot supports the Denali NAND driver. > The SPARE_AREA_SKIP_BYTES is a user-configurable parameter: > https://github.com/u-boot/u-boot/blob/v2019.10-rc3/drivers/mtd/nand/raw/Kconfig#L112 > > > Your platform works with a zero in the SPARE_AREA_SKIP_BYTES register > because the NAND chip on the board was initialized with a zero > set to the SPARE_AREA_SKIP_BYTES register. > > If the NAND chip had been initialized with 8 > set to the SPARE_AREA_SKIP_BYTES register, it would have > been working with 8 to the SPARE_AREA_SKIP_BYTES.
> > The Boot ROM is the only (semi-)software that is unconfigurable by users, > so the value of SPARE_AREA_SKIP_BYTES should be aligned with > the boot ROM. > I recommend you to check the spec of the boot ROM. We boot from NOR flash. That is probably why I didn't see a problem booting. > (The maintainer of the platform, Dihn is CC'ed, > so I hope he will jump in) Yes, I hope so too. > Second, I doubt 0 is a good value for SPARE_AREA_SKIP_BYTES. > > As explained in commit log, SPARE_AREA_SKIP_BYTES==0 means > the OOB is used for ECC without any offset. > So, the BBM marked in the factory will be destroyed. Oh my! That's bad news. > > But in > > this case the patch assumes the default value 8 which is straight out > > wrong on this variant. Without this patch reverted all blocks of the nand > > flash are beeing marked bad :-(. > > > > When reverting the patch ba4a1b62a2d742df9e9c607ac53b3bf33496508f i can > > boot 4.19.10 again. > > > > With 5.0 the it goes further down the drain and i didn't manage to boot it > > even with the above patch reverted. > > > > I also tried 5.3-rc7 with the above patch reverted and the variable t_x > > dirty hacked to the value 0x1388 as i got the impression that the timing > > calculation is off too. I still get an > > interrupt error and boot failure: > git-bisect is a general solution to pin point the problem. > > BTW, if you end up with hacking the clock frequency, something is already > wrong. This was just a dirty hack to verify that this is the problem. > denali->clk_rate, denali->clk_x_rate should be 50MHz, 200MHz, respectively. > > If not, please check the clock driver and your DT. We include the device tree file for this chip directly from the kernel sources.
This means that we are using the settings from the kernel tree in linux-5.3-rc8/arch/arm/boot/dts/socfpga.dtsi. The DTS entries, taken verbatim from that file, are: nand0: nand@ff90 { #address-cells = <0x1>; #size-cells = <0x1>; compatible = "altr,socfpga-denali-nand"; reg = <0xff90 0x10>, <0xffb8 0x1>; reg-names = "nand_data", "denali_reg"; interrupts = <0x0 0x90 0x4>; clocks = <&nand_clk>, <&nand_x_clk>, <&nand_ecc_clk>; clock-names = "nand", "nand_x", "ecc"; resets = <&rst NAND_RESET>;
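Masahiro's point about SPARE_AREA_SKIP_BYTES can be checked with simple interval arithmetic. The C sketch below uses a deliberately simplified model that is an assumption, not the Denali controller's real ECC layout: the ECC bytes occupy one contiguous run in the OOB area starting at the skip offset, and the factory bad block marker (BBM) is a single byte at OOB offset 0, the usual position on large-page NAND such as the 2048-byte-page Micron part in the boot log of this thread. It also models the fallback discussed here, where a register value of 0 is replaced by a default of 8. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define BBM_OFFSET 0u          /* factory bad block marker position (large page) */
#define BBM_LEN 1u             /* treated as a single byte in this model */
#define DEFAULT_SKIP_BYTES 8u  /* default value discussed in this thread */

/* A register value of 0 is treated as "unset" and replaced by the default. */
static unsigned int effective_skip_bytes(unsigned int reg_value)
{
	return reg_value ? reg_value : DEFAULT_SKIP_BYTES;
}

/* True if a contiguous ECC run starting at skip_bytes covers the BBM byte. */
static bool ecc_overwrites_bbm(unsigned int skip_bytes, unsigned int ecc_bytes,
			       unsigned int oob_size)
{
	unsigned int ecc_end = skip_bytes + ecc_bytes; /* exclusive */

	assert(ecc_end <= oob_size);
	return ecc_bytes > 0 && skip_bytes < BBM_OFFSET + BBM_LEN &&
	       BBM_OFFSET < ecc_end;
}
```

In this model, data written with one skip value and read back with another is misinterpreted, which matches Masahiro's argument that the value has to follow whatever initialized the flash (ideally the boot ROM's convention).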
mtd raw nand denali.c broken for Intel/Altera Cyclone V
Hi I have noticed that there are multiple breakages piling up for the Denali NAND driver on the Intel/Altera Cyclone V. Unfortunately I had no time to track the mainline kernel closely, so the breakages have piled up. I am a little disappointed that Intel is not keeping an eye on whether the kernel works on the chips they are selling. Before this, I was really happy with the state of mainline support for the platform. The failure starts with kernel 4.19 (or stable kernel release 4.18.19); the offending commit is ba4a1b62a2d742df9e9c607ac53b3bf33496508f. The problem here is that our platform works with a zero in the SPARE_AREA_SKIP_BYTES register, but the patch assumes the default value 8, which is simply wrong for this variant. Unless this patch is reverted, all blocks of the NAND flash are marked bad :-(. With commit ba4a1b62a2d742df9e9c607ac53b3bf33496508f reverted, I can boot 4.19.10 again. With 5.0 it goes further down the drain, and I didn't manage to boot it even with the above patch reverted. I also tried 5.3-rc7 with the above patch reverted and the variable t_x dirty-hacked to the value 0x1388, as I got the impression that the timing calculation is off too. I still get an interrupt error and boot failure: [0.817588] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda [0.823946] nand: Micron MT29F2G08ABAEAWP [0.827965] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64 [1.887052] denali-nand-dt ff90.nand: timeout while waiting for irq 0x1000 [2.911056] denali-nand-dt ff90.nand: timeout while waiting for irq 0x1000 I have seen this thread, https://lore.kernel.org/patchwork/patch/983055/, which might fix at least the 4.19 boot problem. I would be really grateful for hints on how to get the Intel Cyclone V working again. Best regards Tim
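The hard-coded t_x value 0x1388 is 5000 in decimal. Assuming t_x holds the clk_x period in picoseconds (an assumption based on how the driver's timing setup appears to use it, not something stated in this thread), 5000 ps is exactly one period of the 200 MHz clk_x rate that Masahiro Yamada gives as the expected value in the reply above, so the hack effectively pins the calculation to the nominal clock. The arithmetic as a small C sketch (function name hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define PS_PER_SECOND 1000000000000ULL

/* Clock period in picoseconds, rounded down; rate_hz must be non-zero. */
static uint64_t clk_period_ps(uint64_t rate_hz)
{
	assert(rate_hz != 0);
	return PS_PER_SECOND / rate_hz;
}
```

If the clock driver reported a different clk_x rate, the computed period and every timing derived from it would shift accordingly, which is why checking the clock driver and the DT is suggested instead of patching t_x.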
Re: 5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0
On 17/08/2019 18:35, Eric Dumazet wrote: > > > On 8/17/19 10:24 AM, Sander Eikelenboom wrote: >> On 12/08/2019 19:56, Eric Dumazet wrote: >>> >>> >>> On 8/12/19 2:50 PM, Sander Eikelenboom wrote: >>>> L.S., >>>> >>>> While testing a somewhere-after-5.3-rc3 kernel (which included the latest >>>> net merge (33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9), >>>> one of my Xen VM's (which gets quite some network load) crashed. >>>> See below for the stacktrace. >>>> >>>> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to >>>> be an option at the moment. >>>> I haven't encountered this on 5.2, so it seems to be an regression against >>>> 5.2. >>>> >>>> Any ideas ? >>>> >>>> -- >>>> Sander >>>> >>>> >>>> [16930.653595] general protection fault: [#1] SMP NOPTI >>>> [16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted >>>> 5.3.0-rc3-20190809-doflr+ #1 >>>> [16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0 >>>> [16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 >>>> fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 >>>> <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8 >>>> [16930.653741] RSP: :c9003ad8 EFLAGS: 00010286 >>>> [16930.653762] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: >>>> 801b >>> >>> crash in " mov0x20(%rax),%eax" and RAX=fffe888005bf62c0 (not a valid >>> kernel address) >>> >>> Look like one bit corruption maybe. >>> >>> Nothing comes to mind really between 5.2 and 53 that could explain this. 
>>> >>>> [16930.653791] RDX: 05a0 RSI: 8880115fb800 RDI: >>>> 888016b00880 >>>> [16930.653819] RBP: 888016b00880 R08: 0001 R09: >>>> >>>> [16930.653848] R10: 88800ae00800 R11: bfe632e6 R12: >>>> 05a0 >>>> [16930.653875] R13: 0001 R14: bfe62d46 R15: >>>> 0004 >>>> [16930.653913] FS: 7fe71fe2cb80() GS:88801f20() >>>> knlGS: >>>> [16930.653943] CS: 0010 DS: ES: CR0: 80050033 >>>> [16930.653965] CR2: 55de0f3e7000 CR3: 11f32000 CR4: >>>> 06f0 >>>> [16930.653993] Call Trace: >>>> [16930.654005] >>>> [16930.654018] tcp_ack+0xbb0/0x1230 >>>> [16930.654033] tcp_rcv_established+0x2e8/0x630 >>>> [16930.654053] tcp_v4_do_rcv+0x129/0x1d0 >>>> [16930.654070] tcp_v4_rcv+0xac9/0xcb0 >>>> [16930.654088] ip_protocol_deliver_rcu+0x27/0x1b0 >>>> [16930.654109] ip_local_deliver_finish+0x3f/0x50 >>>> [16930.654128] ip_local_deliver+0x4d/0xe0 >>>> [16930.654145] ? ip_protocol_deliver_rcu+0x1b0/0x1b0 >>>> [16930.654163] ip_rcv+0x4c/0xd0 >>>> [16930.654179] __netif_receive_skb_one_core+0x79/0x90 >>>> [16930.654200] netif_receive_skb_internal+0x2a/0xa0 >>>> [16930.654219] napi_gro_receive+0xe7/0x140 >>>> [16930.654237] xennet_poll+0x9be/0xae0 >>>> [16930.654254] net_rx_action+0x136/0x340 >>>> [16930.654271] __do_softirq+0xdd/0x2cf >>>> [16930.654287] irq_exit+0x7a/0xa0 >>>> [16930.654304] xen_evtchn_do_upcall+0x27/0x40 >>>> [16930.654320] xen_hvm_callback_vector+0xf/0x20 >>>> [16930.654339] >>>> [16930.654349] RIP: 0033:0x55de0d87db99 >>>> [16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 >>>> f4 eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a >>>> <75> 25 44 38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6 >>>> [16930.654432] RSP: 002b:7ffd5531eec8 EFLAGS: 0a87 ORIG_RAX: >>>> ff0c >>>> [16930.655004] RAX: 0002 RBX: 55de0f3e8e50 RCX: >>>> 007f >>>> [16930.655034] RDX: 55de0f3dc2d2 RSI: 3492 RDI: >>>> 0002 >>>> [16930.655062] RBP: 7fff R08: 80ea R09: >>>> 01f0 >>>> [16930.655089] R10: 55de0f3d8e40 R11: 0094 R12: >>>> 55de0f3e0f2a >>>>
Re: 5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0
On 12/08/2019 19:56, Eric Dumazet wrote: > > > On 8/12/19 2:50 PM, Sander Eikelenboom wrote: >> L.S., >> >> While testing a somewhere-after-5.3-rc3 kernel (which included the latest >> net merge (33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9), >> one of my Xen VM's (which gets quite some network load) crashed. >> See below for the stacktrace. >> >> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to be >> an option at the moment. >> I haven't encountered this on 5.2, so it seems to be an regression against >> 5.2. >> >> Any ideas ? >> >> -- >> Sander >> >> >> [16930.653595] general protection fault: [#1] SMP NOPTI >> [16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted >> 5.3.0-rc3-20190809-doflr+ #1 >> [16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0 >> [16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 >> fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> >> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8 >> [16930.653741] RSP: :c9003ad8 EFLAGS: 00010286 >> [16930.653762] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: >> 801b > > crash in " mov0x20(%rax),%eax" and RAX=fffe888005bf62c0 (not a valid > kernel address) > > Look like one bit corruption maybe. > > Nothing comes to mind really between 5.2 and 53 that could explain this. 
> >> [16930.653791] RDX: 05a0 RSI: 8880115fb800 RDI: >> 888016b00880 >> [16930.653819] RBP: 888016b00880 R08: 0001 R09: >> >> [16930.653848] R10: 88800ae00800 R11: bfe632e6 R12: >> 05a0 >> [16930.653875] R13: 0001 R14: bfe62d46 R15: >> 0004 >> [16930.653913] FS: 7fe71fe2cb80() GS:88801f20() >> knlGS: >> [16930.653943] CS: 0010 DS: ES: CR0: 80050033 >> [16930.653965] CR2: 55de0f3e7000 CR3: 11f32000 CR4: >> 06f0 >> [16930.653993] Call Trace: >> [16930.654005] >> [16930.654018] tcp_ack+0xbb0/0x1230 >> [16930.654033] tcp_rcv_established+0x2e8/0x630 >> [16930.654053] tcp_v4_do_rcv+0x129/0x1d0 >> [16930.654070] tcp_v4_rcv+0xac9/0xcb0 >> [16930.654088] ip_protocol_deliver_rcu+0x27/0x1b0 >> [16930.654109] ip_local_deliver_finish+0x3f/0x50 >> [16930.654128] ip_local_deliver+0x4d/0xe0 >> [16930.654145] ? ip_protocol_deliver_rcu+0x1b0/0x1b0 >> [16930.654163] ip_rcv+0x4c/0xd0 >> [16930.654179] __netif_receive_skb_one_core+0x79/0x90 >> [16930.654200] netif_receive_skb_internal+0x2a/0xa0 >> [16930.654219] napi_gro_receive+0xe7/0x140 >> [16930.654237] xennet_poll+0x9be/0xae0 >> [16930.654254] net_rx_action+0x136/0x340 >> [16930.654271] __do_softirq+0xdd/0x2cf >> [16930.654287] irq_exit+0x7a/0xa0 >> [16930.654304] xen_evtchn_do_upcall+0x27/0x40 >> [16930.654320] xen_hvm_callback_vector+0xf/0x20 >> [16930.654339] >> [16930.654349] RIP: 0033:0x55de0d87db99 >> [16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 >> f4 eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a <75> >> 25 44 38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6 >> [16930.654432] RSP: 002b:7ffd5531eec8 EFLAGS: 0a87 ORIG_RAX: >> ff0c >> [16930.655004] RAX: 0002 RBX: 55de0f3e8e50 RCX: >> 007f >> [16930.655034] RDX: 55de0f3dc2d2 RSI: 3492 RDI: >> 0002 >> [16930.655062] RBP: 7fff R08: 80ea R09: >> 01f0 >> [16930.655089] R10: 55de0f3d8e40 R11: 0094 R12: >> 55de0f3e0f2a >> [16930.655116] R13: 0010 R14: 7f16 R15: >> 0080 >> [16930.655144] Modules linked in: >> [16930.655200] 
---[ end trace 533367c95501b645 ]--- >> [16930.655223] RIP: 0010:tcp_trim_head+0x20/0xe0 >> [16930.655243] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 >> fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> >> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8 >> [16930.655312] RSP: :c9003ad8 EFLAGS: 00010286 >> [16930.655331] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: >> 801b >> [16930.655360] RDX: 05a0 RSI: 8880115fb800 RDI: >> 888016b00880 >> [16930.655387] RBP: 888016b00880 R08
Re: 5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0
On 12/08/2019 19:56, Eric Dumazet wrote: > > > On 8/12/19 2:50 PM, Sander Eikelenboom wrote: >> L.S., >> >> While testing a somewhere-after-5.3-rc3 kernel (which included the latest >> net merge (33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9), >> one of my Xen VM's (which gets quite some network load) crashed. >> See below for the stacktrace. >> >> Unfortunately I haven't got a clear trigger, so bisection doesn't seem to be >> an option at the moment. >> I haven't encountered this on 5.2, so it seems to be an regression against >> 5.2. >> >> Any ideas ? >> >> -- >> Sander >> >> >> [16930.653595] general protection fault: [#1] SMP NOPTI >> [16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted >> 5.3.0-rc3-20190809-doflr+ #1 >> [16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0 >> [16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 >> fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> >> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8 >> [16930.653741] RSP: :c9003ad8 EFLAGS: 00010286 >> [16930.653762] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: >> 801b > > crash in " mov0x20(%rax),%eax" and RAX=fffe888005bf62c0 (not a valid > kernel address) > > Look like one bit corruption maybe. > > Nothing comes to mind really between 5.2 and 53 that could explain this. Hi Eric, Hmm, it could be a rare coincidence, and it just never occurred on pre-5.3 kernels by chance. Let's wait and see if it recurs; I will report back if it does. Thanks for your explanation.
-- Sander >> [16930.653791] RDX: 05a0 RSI: 8880115fb800 RDI: >> 888016b00880 >> [16930.653819] RBP: 888016b00880 R08: 0001 R09: >> >> [16930.653848] R10: 88800ae00800 R11: bfe632e6 R12: >> 05a0 >> [16930.653875] R13: 0001 R14: bfe62d46 R15: >> 0004 >> [16930.653913] FS: 7fe71fe2cb80() GS:88801f20() >> knlGS: >> [16930.653943] CS: 0010 DS: ES: CR0: 80050033 >> [16930.653965] CR2: 55de0f3e7000 CR3: 11f32000 CR4: >> 06f0 >> [16930.653993] Call Trace: >> [16930.654005] >> [16930.654018] tcp_ack+0xbb0/0x1230 >> [16930.654033] tcp_rcv_established+0x2e8/0x630 >> [16930.654053] tcp_v4_do_rcv+0x129/0x1d0 >> [16930.654070] tcp_v4_rcv+0xac9/0xcb0 >> [16930.654088] ip_protocol_deliver_rcu+0x27/0x1b0 >> [16930.654109] ip_local_deliver_finish+0x3f/0x50 >> [16930.654128] ip_local_deliver+0x4d/0xe0 >> [16930.654145] ? ip_protocol_deliver_rcu+0x1b0/0x1b0 >> [16930.654163] ip_rcv+0x4c/0xd0 >> [16930.654179] __netif_receive_skb_one_core+0x79/0x90 >> [16930.654200] netif_receive_skb_internal+0x2a/0xa0 >> [16930.654219] napi_gro_receive+0xe7/0x140 >> [16930.654237] xennet_poll+0x9be/0xae0 >> [16930.654254] net_rx_action+0x136/0x340 >> [16930.654271] __do_softirq+0xdd/0x2cf >> [16930.654287] irq_exit+0x7a/0xa0 >> [16930.654304] xen_evtchn_do_upcall+0x27/0x40 >> [16930.654320] xen_hvm_callback_vector+0xf/0x20 >> [16930.654339] >> [16930.654349] RIP: 0033:0x55de0d87db99 >> [16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 >> f4 eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a <75> >> 25 44 38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6 >> [16930.654432] RSP: 002b:7ffd5531eec8 EFLAGS: 0a87 ORIG_RAX: >> ff0c >> [16930.655004] RAX: 0002 RBX: 55de0f3e8e50 RCX: >> 007f >> [16930.655034] RDX: 55de0f3dc2d2 RSI: 3492 RDI: >> 0002 >> [16930.655062] RBP: 7fff R08: 80ea R09: >> 01f0 >> [16930.655089] R10: 55de0f3d8e40 R11: 0094 R12: >> 55de0f3e0f2a >> [16930.655116] R13: 0010 R14: 7f16 R15: >> 0080 >> [16930.655144] Modules linked in: >> 
[16930.655200] ---[ end trace 533367c95501b645 ]--- >> [16930.655223] RIP: 0010:tcp_trim_head+0x20/0xe0 >> [16930.655243] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 >> fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> >> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8 >> [16930.655312] RSP: :c9003ad8 EFLAGS: 00010286 >> [16930.655331] RAX: f
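Eric's "one-bit corruption" reading of RAX can be checked with a few lines of arithmetic. The sketch below assumes the intended value was ffff888005bf62c0, i.e. a pointer into the x86-64 direct mapping, which starts at ffff888000000000 with 4-level paging; the helper names are hypothetical. It XORs the two values, counts the differing bits, and also tests canonicality, since an access through a non-canonical address raises a general protection fault rather than a page fault, which matches the first line of the oops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of bit positions in which a and b differ (popcount of a ^ b). */
static int bits_differ(uint64_t a, uint64_t b)
{
	uint64_t x = a ^ b;
	int n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Index of the lowest differing bit, or -1 if a == b. */
static int lowest_diff_bit(uint64_t a, uint64_t b)
{
	uint64_t x = a ^ b;

	for (int i = 0; i < 64; i++)
		if (x & ((uint64_t)1 << i))
			return i;
	return -1;
}

/*
 * With 4-level paging (48-bit virtual addresses), an address is canonical
 * only if bits 63..47 are all equal; dereferencing a non-canonical address
 * raises #GP instead of a page fault.
 */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 bits 63..47 */

	return top == 0 || top == 0x1ffff;
}
```

For 0xfffe888005bf62c0 against 0xffff888005bf62c0, bits_differ() gives 1 and lowest_diff_bit() gives 48; is_canonical_48() rejects the observed value (and RAX+0x20, the address actually dereferenced) while accepting the assumed one. This only shows the value is consistent with a single flipped bit, not that one actually occurred.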
5.3-rc3-ish VM crash: RIP: 0010:tcp_trim_head+0x20/0xe0
L.S., While testing a somewhere-after-5.3-rc3 kernel (which included the latest net merge, 33920f1ec5bf47c5c0a1d2113989bdd9dfb3fae9), one of my Xen VMs (which gets quite a lot of network load) crashed. See below for the stacktrace. Unfortunately I haven't got a clear trigger, so bisection doesn't seem to be an option at the moment. I haven't encountered this on 5.2, so it seems to be a regression against 5.2. Any ideas? -- Sander [16930.653595] general protection fault: [#1] SMP NOPTI [16930.653624] CPU: 0 PID: 3275 Comm: rsync Not tainted 5.3.0-rc3-20190809-doflr+ #1 [16930.653657] RIP: 0010:tcp_trim_head+0x20/0xe0 [16930.653677] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8 [16930.653741] RSP: :c9003ad8 EFLAGS: 00010286 [16930.653762] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: 801b [16930.653791] RDX: 05a0 RSI: 8880115fb800 RDI: 888016b00880 [16930.653819] RBP: 888016b00880 R08: 0001 R09: [16930.653848] R10: 88800ae00800 R11: bfe632e6 R12: 05a0 [16930.653875] R13: 0001 R14: bfe62d46 R15: 0004 [16930.653913] FS: 7fe71fe2cb80() GS:88801f20() knlGS: [16930.653943] CS: 0010 DS: ES: CR0: 80050033 [16930.653965] CR2: 55de0f3e7000 CR3: 11f32000 CR4: 06f0 [16930.653993] Call Trace: [16930.654005] [16930.654018] tcp_ack+0xbb0/0x1230 [16930.654033] tcp_rcv_established+0x2e8/0x630 [16930.654053] tcp_v4_do_rcv+0x129/0x1d0 [16930.654070] tcp_v4_rcv+0xac9/0xcb0 [16930.654088] ip_protocol_deliver_rcu+0x27/0x1b0 [16930.654109] ip_local_deliver_finish+0x3f/0x50 [16930.654128] ip_local_deliver+0x4d/0xe0 [16930.654145] ? 
ip_protocol_deliver_rcu+0x1b0/0x1b0 [16930.654163] ip_rcv+0x4c/0xd0 [16930.654179] __netif_receive_skb_one_core+0x79/0x90 [16930.654200] netif_receive_skb_internal+0x2a/0xa0 [16930.654219] napi_gro_receive+0xe7/0x140 [16930.654237] xennet_poll+0x9be/0xae0 [16930.654254] net_rx_action+0x136/0x340 [16930.654271] __do_softirq+0xdd/0x2cf [16930.654287] irq_exit+0x7a/0xa0 [16930.654304] xen_evtchn_do_upcall+0x27/0x40 [16930.654320] xen_hvm_callback_vector+0xf/0x20 [16930.654339] [16930.654349] RIP: 0033:0x55de0d87db99 [16930.654364] Code: 00 00 48 89 7c 24 f8 45 39 fe 45 0f 42 fe 44 89 7c 24 f4 eb 09 0f 1f 40 00 83 e9 01 74 3e 89 f2 48 63 f8 4c 01 d2 44 38 1c 3a <75> 25 44 38 6c 3a ff 75 1e 41 0f b6 3c 24 40 38 3a 75 14 41 0f b6 [16930.654432] RSP: 002b:7ffd5531eec8 EFLAGS: 0a87 ORIG_RAX: ff0c [16930.655004] RAX: 0002 RBX: 55de0f3e8e50 RCX: 007f [16930.655034] RDX: 55de0f3dc2d2 RSI: 3492 RDI: 0002 [16930.655062] RBP: 7fff R08: 80ea R09: 01f0 [16930.655089] R10: 55de0f3d8e40 R11: 0094 R12: 55de0f3e0f2a [16930.655116] R13: 0010 R14: 7f16 R15: 0080 [16930.655144] Modules linked in: [16930.655200] ---[ end trace 533367c95501b645 ]--- [16930.655223] RIP: 0010:tcp_trim_head+0x20/0xe0 [16930.655243] Code: 2e 0f 1f 84 00 00 00 00 00 90 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 f6 46 7e 01 74 2f 8b 86 bc 00 00 00 48 03 86 c0 00 00 00 <8b> 40 20 66 83 f8 01 74 19 31 d2 31 f6 b9 20 0a 00 00 48 89 df e8 [16930.655312] RSP: :c9003ad8 EFLAGS: 00010286 [16930.655331] RAX: fffe888005bf62c0 RBX: 8880115fb800 RCX: 801b [16930.655360] RDX: 05a0 RSI: 8880115fb800 RDI: 888016b00880 [16930.655387] RBP: 888016b00880 R08: 0001 R09: [16930.655414] R10: 88800ae00800 R11: bfe632e6 R12: 05a0 [16930.655441] R13: 0001 R14: bfe62d46 R15: 0004 [16930.655475] FS: 7fe71fe2cb80() GS:88801f20() knlGS: [16930.655502] CS: 0010 DS: ES: CR0: 80050033 [16930.655525] CR2: 55de0f3e7000 CR3: 11f32000 CR4: 06f0 [16930.63] Kernel panic - not syncing: Fatal exception in interrupt [16930.655789] Kernel Offset: disabled
Re: RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0
On 08/08/2019 12:21, Paolo Valente wrote: > > >> On 8 Aug 2019, at 12:21, Sander Eikelenboom >> wrote: >> >> On 08/08/2019 11:10, Paolo Valente wrote: >>> >>> >>>> On 8 Aug 2019, at 11:05, Sander Eikelenboom >>>> wrote: >>>> >>>> L.S., >>>> >>>> While testing a Linux 5.3-rc3 kernel on my Xen server I came across the >>>> splat below when trying to shut down all the VMs. >>>> This is after the server has run for a few days without any problem. It >>>> seems to happen consistently. >>>> >>>> It seems it's in the same area as >>>> dbc3117d4ca9e17819ac73501e914b8422686750, but rc3 already incorporates >>>> that patch. >>>> >>>> Any ideas? >>>> >>> >>> Could you try these fixes I proposed yesterday: >>> https://lkml.org/lkml/2019/8/7/536 >>> or, on patchwork: >>> https://patchwork.kernel.org/patch/11082247/ >>> https://patchwork.kernel.org/patch/11082249/ >> >> Hi Paolo, >> >> These two above seem to fix the issue! >> So thanks for the swift reply (and the patchwork links for easily >> downloading the patches). >> >> I will test the third unrelated patch as well, but if you don't hear >> back, it's all good. >> > > Great! Thank you for offering to also test the other patch. Tested-by tags are > welcome too :) Hi, I haven't seen any problems with the patch so far, but I haven't tested it with constrained memory, so I don't think a Tested-by is justified in this case. -- Sander > Thanks, > Paolo > >> Thanks again! >> >> -- >> Sander >> >>> I posted a further fix too, which should be unrelated. 
But, just in case: >>> https://lkml.org/lkml/2019/8/7/715 >>> or, on patchwork: >>> https://patchwork.kernel.org/patch/11082521/ >>> >>> Crossing my fingers (and thank you for reporting this), >>> Paolo >>> >>>> -- >>>> Sander >>>> >>>> >>>> [80915.716048] BUG: unable to handle page fault for address: >>>> 1008 >>>> [80915.724188] #PF: supervisor write access in kernel mode >>>> [80915.733182] #PF: error_code(0x0002) - not-present page >>>> [80915.741455] PGD 0 P4D 0 >>>> [80915.750538] Oops: 0002 [#1] SMP NOPTI >>>> [80915.758425] CPU: 4 PID: 11407 Comm: 17.hda-2 Tainted: GW >>>> 5.3.0-rc3-20190807-doflr+ #1 >>>> [80915.766137] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS >>>> V1.8B1 09/13/2010 >>>> [80915.773737] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0 >>>> [80915.781294] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 >>>> f0 01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 >>>> <48> 89 4e 08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00 >>>> [80915.796792] RSP: e02b:c9000473be28 EFLAGS: 00010006 >>>> [80915.804419] RAX: 888070393200 RBX: 888076c4a800 RCX: >>>> 888076c4a9f8 >>>> [80915.810254] device vif17.0 left promiscuous mode >>>> [80915.811906] RDX: 1000 RSI: 1000 RDI: >>>> >>>> [80915.811908] RBP: 888077efc398 R08: 0004 R09: >>>> 81106800 >>>> [80915.811909] R10: 88807804ca40 R11: c9000473be31 R12: >>>> 888005256bf0 >>>> [80915.811909] R13: R14: 888005256800 R15: >>>> 82a6a3c0 >>>> [80915.811919] FS: 7f1c30a8dbc0() GS:88807d50() >>>> knlGS: >>>> [80915.819456] xen_bridge: port 18(vif17.0) entered disabled state >>>> [80915.826569] CS: 1e030 DS: ES: CR0: 80050033 >>>> [80915.826571] CR2: 1008 CR3: 5d9d CR4: >>>> 0660 >>>> [80915.826575] Call Trace: >>>> [80915.826592] bfq_exit_icq+0xe/0x20 >>>> [80915.826595] put_io_context_active+0x52/0x80 >>>> [80915.826599] do_exit+0x774/0xac0 >>>> [80915.906037] ? xen_blkif_be_int+0x30/0x30 >>>> [80915.913311] kthread+0xda/0x130 >>>> [80915.920398] ? 
kthread_park+0x80/0x80 >>>> [80915.927524] ret_from_fork+0x22/0x40 >>>> [80915.934512] Modules linked in: >>>> [8
Re: RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0
On 08/08/2019 11:10, Paolo Valente wrote: > > >> On 8 Aug 2019, at 11:05, Sander Eikelenboom >> wrote: >> >> L.S., >> >> While testing a Linux 5.3-rc3 kernel on my Xen server I came across the >> splat below when trying to shut down all the VMs. >> This is after the server has run for a few days without any problem. It >> seems to happen consistently. >> >> It seems it's in the same area as dbc3117d4ca9e17819ac73501e914b8422686750, >> but rc3 already incorporates that patch. >> >> Any ideas? >> > > Could you try these fixes I proposed yesterday: > https://lkml.org/lkml/2019/8/7/536 > or, on patchwork: > https://patchwork.kernel.org/patch/11082247/ > https://patchwork.kernel.org/patch/11082249/ Hi Paolo, These two above seem to fix the issue! So thanks for the swift reply (and the patchwork links for easily downloading the patches). I will test the third unrelated patch as well, but if you don't hear back, it's all good. Thanks again! -- Sander > I posted a further fix too, which should be unrelated. 
But, just in case: > https://lkml.org/lkml/2019/8/7/715 > or, on patchwork: > https://patchwork.kernel.org/patch/11082521/ > > Crossing my fingers (and thank you for reporting this), > Paolo > >> -- >> Sander >> >> >> [80915.716048] BUG: unable to handle page fault for address: 1008 >> [80915.724188] #PF: supervisor write access in kernel mode >> [80915.733182] #PF: error_code(0x0002) - not-present page >> [80915.741455] PGD 0 P4D 0 >> [80915.750538] Oops: 0002 [#1] SMP NOPTI >> [80915.758425] CPU: 4 PID: 11407 Comm: 17.hda-2 Tainted: GW >> 5.3.0-rc3-20190807-doflr+ #1 >> [80915.766137] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS >> V1.8B1 09/13/2010 >> [80915.773737] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0 >> [80915.781294] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 >> f0 01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 <48> >> 89 4e 08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00 >> [80915.796792] RSP: e02b:c9000473be28 EFLAGS: 00010006 >> [80915.804419] RAX: 888070393200 RBX: 888076c4a800 RCX: >> 888076c4a9f8 >> [80915.810254] device vif17.0 left promiscuous mode >> [80915.811906] RDX: 1000 RSI: 1000 RDI: >> >> [80915.811908] RBP: 888077efc398 R08: 0004 R09: >> 81106800 >> [80915.811909] R10: 88807804ca40 R11: c9000473be31 R12: >> 888005256bf0 >> [80915.811909] R13: R14: 888005256800 R15: >> 82a6a3c0 >> [80915.811919] FS: 7f1c30a8dbc0() GS:88807d50() >> knlGS: >> [80915.819456] xen_bridge: port 18(vif17.0) entered disabled state >> [80915.826569] CS: 1e030 DS: ES: CR0: 80050033 >> [80915.826571] CR2: 1008 CR3: 5d9d CR4: >> 0660 >> [80915.826575] Call Trace: >> [80915.826592] bfq_exit_icq+0xe/0x20 >> [80915.826595] put_io_context_active+0x52/0x80 >> [80915.826599] do_exit+0x774/0xac0 >> [80915.906037] ? xen_blkif_be_int+0x30/0x30 >> [80915.913311] kthread+0xda/0x130 >> [80915.920398] ? 
kthread_park+0x80/0x80 >> [80915.927524] ret_from_fork+0x22/0x40 >> [80915.934512] Modules linked in: >> [80915.941412] CR2: 1008 >> [80915.948221] ---[ end trace 61315493e0f8ef40 ]--- >> [80915.954984] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0 >> [80915.961850] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 >> f0 01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 <48> >> 89 4e 08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00 >> [80915.976124] RSP: e02b:c9000473be28 EFLAGS: 00010006 >> [80915.983205] RAX: 888070393200 RBX: 888076c4a800 RCX: >> 888076c4a9f8 >> [80915.990321] RDX: 1000 RSI: 1000 RDI: >> >> [80915.997319] RBP: 888077efc398 R08: 0004 R09: >> 81106800 >> [80916.004427] R10: 88807804ca40 R11: c9000473be31 R12: >> 888005256bf0 >> [80916.011525] R13: R14: 888005256800 R15: >> 82a6a3c0 >> [80916.018679] FS: 7f1c30a8dbc0() GS:88807d50() >> knlGS: >> [80916.025897] CS: 1e030 DS: ES: CR0: 80050033 >> [80916.033116] CR2: 1008 CR3: 5d9d CR4: >> 0660 >> [80916.040348] Fixing recursive fault but reboot is needed! >
RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0
L.S., While testing a Linux 5.3-rc3 kernel on my Xen server I came across the splat below when trying to shut down all the VMs. This is after the server has run for a few days without any problem. It seems to happen consistently. It seems it's in the same area as dbc3117d4ca9e17819ac73501e914b8422686750, but rc3 already incorporates that patch. Any ideas? -- Sander [80915.716048] BUG: unable to handle page fault for address: 1008 [80915.724188] #PF: supervisor write access in kernel mode [80915.733182] #PF: error_code(0x0002) - not-present page [80915.741455] PGD 0 P4D 0 [80915.750538] Oops: 0002 [#1] SMP NOPTI [80915.758425] CPU: 4 PID: 11407 Comm: 17.hda-2 Tainted: GW 5.3.0-rc3-20190807-doflr+ #1 [80915.766137] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010 [80915.773737] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0 [80915.781294] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 f0 01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 <48> 89 4e 08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00 [80915.796792] RSP: e02b:c9000473be28 EFLAGS: 00010006 [80915.804419] RAX: 888070393200 RBX: 888076c4a800 RCX: 888076c4a9f8 [80915.810254] device vif17.0 left promiscuous mode [80915.811906] RDX: 1000 RSI: 1000 RDI: [80915.811908] RBP: 888077efc398 R08: 0004 R09: 81106800 [80915.811909] R10: 88807804ca40 R11: c9000473be31 R12: 888005256bf0 [80915.811909] R13: R14: 888005256800 R15: 82a6a3c0 [80915.811919] FS: 7f1c30a8dbc0() GS:88807d50() knlGS: [80915.819456] xen_bridge: port 18(vif17.0) entered disabled state [80915.826569] CS: 1e030 DS: ES: CR0: 80050033 [80915.826571] CR2: 1008 CR3: 5d9d CR4: 0660 [80915.826575] Call Trace: [80915.826592] bfq_exit_icq+0xe/0x20 [80915.826595] put_io_context_active+0x52/0x80 [80915.826599] do_exit+0x774/0xac0 [80915.906037] ? xen_blkif_be_int+0x30/0x30 [80915.913311] kthread+0xda/0x130 [80915.920398] ? 
kthread_park+0x80/0x80 [80915.927524] ret_from_fork+0x22/0x40 [80915.934512] Modules linked in: [80915.941412] CR2: 1008 [80915.948221] ---[ end trace 61315493e0f8ef40 ]--- [80915.954984] RIP: e030:bfq_exit_icq_bfqq+0x147/0x1c0 [80915.961850] Code: 00 00 00 00 00 00 48 0f ba b0 20 01 00 00 0c 48 8b 88 f0 01 00 00 48 85 c9 74 29 48 8b b0 e8 01 00 00 48 89 31 48 85 f6 74 04 <48> 89 4e 08 48 c7 80 e8 01 00 00 00 00 00 00 48 c7 80 f0 01 00 00 [80915.976124] RSP: e02b:c9000473be28 EFLAGS: 00010006 [80915.983205] RAX: 888070393200 RBX: 888076c4a800 RCX: 888076c4a9f8 [80915.990321] RDX: 1000 RSI: 1000 RDI: [80915.997319] RBP: 888077efc398 R08: 0004 R09: 81106800 [80916.004427] R10: 88807804ca40 R11: c9000473be31 R12: 888005256bf0 [80916.011525] R13: R14: 888005256800 R15: 82a6a3c0 [80916.018679] FS: 7f1c30a8dbc0() GS:88807d50() knlGS: [80916.025897] CS: 1e030 DS: ES: CR0: 80050033 [80916.033116] CR2: 1008 CR3: 5d9d CR4: 0660 [80916.040348] Fixing recursive fault but reboot is needed!
Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
On 10/02/2019 12:44, Heiner Kallweit wrote: > On 10.02.2019 10:16, Sander Eikelenboom wrote: >> On 09/02/2019 12:50, Heiner Kallweit wrote: >>> On 09.02.2019 11:07, Sander Eikelenboom wrote: >>>> On 09/02/2019 10:59, Heiner Kallweit wrote: >>>>> On 09.02.2019 10:34, Sander Eikelenboom wrote: >>>>>> On 09/02/2019 10:02, Heiner Kallweit wrote: >>>>>>> On 09.02.2019 00:09, Eric Dumazet wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote: >>>>>>>>> On 08.02.2019 22:45, Sander Eikelenboom wrote: >>>>>>>>>> On 08/02/2019 22:22, Heiner Kallweit wrote: >>>>>>>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote: >>>>>>>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote: >>>>>>>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote: >>>>>>>>>>>>>> L.S., >>>>>>>>>>>>>> >>>>>>>>>>>>>> While testing a Linux 5.0-rc5 kernel (with some patches on top >>>>>>>>>>>>>> but they don't seem related) under Xen I hit the nasty splat below, >>>>>>>>>>>>>> which I haven't encountered with Linux 4.20.x. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Unfortunately I haven't got a clear reproducer for this and >>>>>>>>>>>>>> bisecting could be nasty due to another (networking related) >>>>>>>>>>>>>> kernel bug. >>>>>>>>>>>>>> >>>>>>>>>>>>>> If you need more info, want me to run a debug patch etc., please >>>>>>>>>>>>>> feel free to ask. >>>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for the report. However I see no change in the r8169 >>>>>>>>>>>>> driver between >>>>>>>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root >>>>>>>>>>>>> cause could >>>>>>>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed. 
>>>>>>>>>>>>> >>>>>>>>>>>> Hmm, I did some digging and I think: >>>>>>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded >>>>>>>>>>>> mmiowb barriers >>>>>>>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of >>>>>>>>>>>> xmit_more and __netdev_sent_queue >>>>>>>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add >>>>>>>>>>>> __netdev_sent_queue as variant of __netdev_tx_sent_queue >>>>>>>>>>>> >>>>>>>>>>> You're right. Thought this was added in 4.20 already. >>>>>>>>>>> The BQL code pattern I copied from the mlx4 driver and so far I >>>>>>>>>>> haven't heard about >>>>>>>>>>> this issue from any user of physical hw. And due to the fact that a >>>>>>>>>>> lot of mainboards >>>>>>>>>>> have onboard Realtek network I have quite a few testers out there. >>>>>>>>>>> Does the issue occur under specific circumstances like very high >>>>>>>>>>> load? >>>>>>>>>> >>>>>>>>>> Yep, the box is already quite contended with the Xen VMs and if I >>>>>>>>>> remember correctly it occurred while kernel compiling >>>>>>>>>> on the host. >>>>>>>>>> >>>>>>>>>>> If indeed the xmit_more patch causes the issue, I think we have to >>>>>>>>>>> involve Eric Dumazet >>>>>>>>>>> as author of the underlying changes. >>>>>>>>>> >>>>>>>>>> It could also be the barriers weren't that unneeded as assumed.
Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
On 09/02/2019 12:50, Heiner Kallweit wrote: > On 09.02.2019 11:07, Sander Eikelenboom wrote: >> On 09/02/2019 10:59, Heiner Kallweit wrote: >>> On 09.02.2019 10:34, Sander Eikelenboom wrote: >>>> On 09/02/2019 10:02, Heiner Kallweit wrote: >>>>> On 09.02.2019 00:09, Eric Dumazet wrote: >>>>>> >>>>>> >>>>>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote: >>>>>>> On 08.02.2019 22:45, Sander Eikelenboom wrote: >>>>>>>> On 08/02/2019 22:22, Heiner Kallweit wrote: >>>>>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote: >>>>>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote: >>>>>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote: >>>>>>>>>>>> L.S., >>>>>>>>>>>> >>>>>>>>>>>> While testing a Linux 5.0-rc5 kernel (with some patches on top but >>>>>>>>>>>> they don't seem related) under Xen I hit the nasty splat below, >>>>>>>>>>>> which I haven't encountered with Linux 4.20.x. >>>>>>>>>>>> >>>>>>>>>>>> Unfortunately I haven't got a clear reproducer for this and >>>>>>>>>>>> bisecting could be nasty due to another (networking related) >>>>>>>>>>>> kernel bug. >>>>>>>>>>>> >>>>>>>>>>>> If you need more info, want me to run a debug patch etc., please >>>>>>>>>>>> feel free to ask. >>>>>>>>>>>> >>>>>>>>>>> Thanks for the report. However I see no change in the r8169 driver >>>>>>>>>>> between >>>>>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root >>>>>>>>>>> cause could >>>>>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed. >>>>>>>>>> >>>>>>>>>> Hmm, I did some digging and I think: >>>>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded >>>>>>>>>> mmiowb barriers >>>>>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of >>>>>>>>>> xmit_more and __netdev_sent_queue >>>>>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add >>>>>>>>>> __netdev_sent_queue as variant of __netdev_tx_sent_queue >>>>>>>>>> >>>>>>>>> You're right. Thought this was added in 4.20 already. 
>>>>>>>>> The BQL code pattern I copied from the mlx4 driver and so far I >>>>>>>>> haven't heard about >>>>>>>>> this issue from any user of physical hw. And due to the fact that a >>>>>>>>> lot of mainboards >>>>>>>>> have onboard Realtek network I have quite a few testers out there. >>>>>>>>> Does the issue occur under specific circumstances like very high load? >>>>>>>> >>>>>>>> Yep, the box is already quite contended with the Xen VMs and if I >>>>>>>> remember correctly it occurred while kernel compiling >>>>>>>> on the host. >>>>>>>> >>>>>>>>> If indeed the xmit_more patch causes the issue, I think we have to >>>>>>>>> involve Eric Dumazet >>>>>>>>> as author of the underlying changes. >>>>>>>> >>>>>>>> It could also be the barriers weren't that unneeded as assumed. >>>>>>> >>>>>>> The barriers were removed after adding xmit_more handling. Therefore it >>>>>>> would be good to >>>>>>> also test with only >>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb >>>>>>> barriers >>>>>>> removed. >>>>>>> >>>>>>>> Since we are almost at RC6, I took the liberty to CC Eric now. >>>>>>>> >>>>>>> Sure, thanks. >>>>>>>
Re: Linux 5.0 regression: BUG: unable to handle kernel paging request at ffff888023e26778 RIP: e030:move_page_tables+0x7c1/0xae0
On 09/02/2019 19:48, Juergen Gross wrote: > On 09/02/2019 19:45, Sander Eikelenboom wrote: >> On 09/02/2019 09:26, Sander Eikelenboom wrote: >>> L.S., >>> >>> >>> While testing a Linux 5.0-rc5-ish kernel (pull of yesterday) with some >>> additional patches for >>> other, already reported issues, I came across the issue below, which I haven't >>> seen with 4.20.x. >>> >>> I haven't got a reproducer, so it might be hard to hit it again. >>> The system is AMD, and this is from the host kernel running under >>> the Xen hypervisor, should it matter. >> >>> -- >>> >>> Sander >> >> Hi Boris / Juergen, >> >> The commit causing this is: >> 2c91bd4a4e2e530582d6fd643ea7b86b27907151 mm: speed up mremap by 20x on large >> regions >> >> Since it seems there haven't been any other reports about this, >> could it be this doesn't specifically work well with a Xen PVH dom0? > > PVH? Not PV? Ah, sorry, indeed PV! > > Juergen >
Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
On 09/02/2019 10:59, Heiner Kallweit wrote: > On 09.02.2019 10:34, Sander Eikelenboom wrote: >> On 09/02/2019 10:02, Heiner Kallweit wrote: >>> On 09.02.2019 00:09, Eric Dumazet wrote: >>>> >>>> >>>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote: >>>>> On 08.02.2019 22:45, Sander Eikelenboom wrote: >>>>>> On 08/02/2019 22:22, Heiner Kallweit wrote: >>>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote: >>>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote: >>>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote: >>>>>>>>>> L.S., >>>>>>>>>> >>>>>>>>>> While testing a Linux 5.0-rc5 kernel (with some patches on top but >>>>>>>>>> they don't seem related) under Xen I hit the nasty splat below, >>>>>>>>>> which I haven't encountered with Linux 4.20.x. >>>>>>>>>> >>>>>>>>>> Unfortunately I haven't got a clear reproducer for this and >>>>>>>>>> bisecting could be nasty due to another (networking related) kernel >>>>>>>>>> bug. >>>>>>>>>> >>>>>>>>>> If you need more info, want me to run a debug patch etc., please >>>>>>>>>> feel free to ask. >>>>>>>>>> >>>>>>>>> Thanks for the report. However I see no change in the r8169 driver >>>>>>>>> between >>>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause >>>>>>>>> could >>>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed. >>>>>>>> >>>>>>>> Hmm, I did some digging and I think: >>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb >>>>>>>> barriers >>>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more >>>>>>>> and __netdev_sent_queue >>>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add >>>>>>>> __netdev_sent_queue as variant of __netdev_tx_sent_queue >>>>>>>> >>>>>>> You're right. Thought this was added in 4.20 already. >>>>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't >>>>>>> heard about >>>>>>> this issue from any user of physical hw. 
And due to the fact that a lot >>>>>>> of mainboards >>>>>>> have onboard Realtek network I have quite a few testers out there. >>>>>>> Does the issue occur under specific circumstances like very high load? >>>>>> >>>>>> Yep, the box is already quite contended with the Xen VMs and if I >>>>>> remember correctly it occurred while kernel compiling >>>>>> on the host. >>>>>> >>>>>>> If indeed the xmit_more patch causes the issue, I think we have to >>>>>>> involve Eric Dumazet >>>>>>> as author of the underlying changes. >>>>>> >>>>>> It could also be the barriers weren't that unneeded as assumed. >>>>> >>>>> The barriers were removed after adding xmit_more handling. Therefore it >>>>> would be good to >>>>> also test with only >>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb >>>>> barriers >>>>> removed. >>>>> >>>>>> Since we are almost at RC6, I took the liberty to CC Eric now. >>>>>> >>>>> Sure, thanks. >>>>> >>>>>> BTW, am I correct that these patches are merely optimizations? >>>>> >>>>> Yes >>>>> >>>>>> If so, and given that they revert cleanly, perhaps it should be >>>>>> considered at this point in the RCs >>>>>> to revert them for 5.0 and try again for 5.1? >>>>>> >>>>> Before removing both it would be good to test with only the >>>>> barrier-removal removed. >>>>> >>>> >>>> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169
Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
On 09/02/2019 10:02, Heiner Kallweit wrote: > On 09.02.2019 00:09, Eric Dumazet wrote: >> >> >> On 02/08/2019 01:50 PM, Heiner Kallweit wrote: >>> On 08.02.2019 22:45, Sander Eikelenboom wrote: >>>> On 08/02/2019 22:22, Heiner Kallweit wrote: >>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote: >>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote: >>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote: >>>>>>>> L.S., >>>>>>>> >>>>>>>> While testing a Linux 5.0-rc5 kernel (with some patches on top but >>>>>>>> they don't seem related) under Xen I hit the nasty splat below, >>>>>>>> which I haven't encountered with Linux 4.20.x. >>>>>>>> >>>>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting >>>>>>>> could be nasty due to another (networking related) kernel bug. >>>>>>>> >>>>>>>> If you need more info, want me to run a debug patch etc., please feel >>>>>>>> free to ask. >>>>>>>> >>>>>>> Thanks for the report. However I see no change in the r8169 driver >>>>>>> between >>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause >>>>>>> could >>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed. >>>>>> >>>>>> Hmm, I did some digging and I think: >>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb >>>>>> barriers >>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more >>>>>> and __netdev_sent_queue >>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add >>>>>> __netdev_sent_queue as variant of __netdev_tx_sent_queue >>>>>> >>>>> You're right. Thought this was added in 4.20 already. >>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't >>>>> heard about >>>>> this issue from any user of physical hw. And due to the fact that a lot >>>>> of mainboards >>>>> have onboard Realtek network I have quite a few testers out there. >>>>> Does the issue occur under specific circumstances like very high load? 
>>>> >>>> Yep, the box is already quite contended with the Xen VMs and if I >>>> remember correctly it occurred while kernel compiling >>>> on the host. >>>> >>>>> If indeed the xmit_more patch causes the issue, I think we have to >>>>> involve Eric Dumazet >>>>> as author of the underlying changes. >>>> >>>> It could also be the barriers weren't that unneeded as assumed. >>> >>> The barriers were removed after adding xmit_more handling. Therefore it >>> would be good to >>> also test with only >>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb >>> barriers >>> removed. >>> >>>> Since we are almost at RC6, I took the liberty to CC Eric now. >>>> >>> Sure, thanks. >>> >>>> BTW, am I correct that these patches are merely optimizations? >>> >>> Yes >>> >>>> If so, and given that they revert cleanly, perhaps it should be considered >>>> at this point in the RCs >>>> to revert them for 5.0 and try again for 5.1? >>>> >>> Before removing both it would be good to test with only the barrier-removal >>> removed. >>> >> >> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more >> and __netdev_sent_queue >> looks buggy to me, since the skb might have been freed already on another >> CPU when you call >> >> You could try: >> >> diff --git a/drivers/net/ethernet/realtek/r8169.c >> b/drivers/net/ethernet/realtek/r8169.c >> index >> 3624e67aef72c92ed6e908e2c99ac2d381210126..f907d484165d9fd775e81bf2bfb9aa4ddedb1c93 >> 100644 >> --- a/drivers/net/ethernet/realtek/r8169.c >> +++ b/drivers/net/ethernet/realtek/r8169.c >> @@ -6070,6 +6070,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff >> *skb, >> dma_addr_t mapping; >> u32 opts[2], len; >> bool stop_queue; >> +
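The BUG in the subject line is raised by the byte queue limits (BQL) bookkeeping in lib/dynamic_queue_limits.c, which, as far as I can tell, refuses to complete more bytes than the driver previously reported as queued (a BUG_ON() in dql_completed()). The sketch below is a user-space toy model of that invariant only; struct toy_bql and its helpers are hypothetical names, not kernel code. It shows why a mismatch between the length reported on the transmit side (for example via __netdev_sent_queue()) and the length later reported on completion (netdev_completed_queue()) is enough to trip the check, which is the kind of failure a length read from an already-freed skb could produce.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (hypothetical, not kernel code) of the per-queue byte
 * accounting BQL keeps: bytes reported as queued on the transmit path
 * and bytes reported as completed from the TX completion path.
 */
struct toy_bql {
	unsigned int num_queued;	/* total bytes handed to the NIC */
	unsigned int num_completed;	/* total bytes reported as sent */
};

/* Transmit path: the driver reports how many bytes it queued. */
static void toy_sent(struct toy_bql *q, unsigned int bytes)
{
	q->num_queued += bytes;
}

/*
 * Completion path: the driver reports how many bytes the NIC finished.
 * Completing more than is currently in flight is the condition the real
 * code treats as a bug; this model returns false instead of crashing
 * and leaves the counters untouched.
 */
static bool toy_completed(struct toy_bql *q, unsigned int bytes)
{
	if (bytes > q->num_queued - q->num_completed)
		return false;
	q->num_completed += bytes;
	return true;
}
```

In a consistent run, queuing 1514 bytes and completing 1514 bytes is accepted. If the transmit side then reports only 60 bytes for the next packet while the completion side reports 1514, toy_completed() rejects the completion, where the kernel would hit the BUG instead.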
Linux 5.0 regression: BUG: unable to handle kernel paging request at ffff888023e26778
L.S.,

While testing a Linux 5.0-rc5-ish kernel (pull of yesterday) with some additional patches for other, already reported issues, I came across the issue below, which I haven't seen with 4.20.x. I haven't got a reproducer, so it might be hard to hit again. The system is AMD, and this is from the host kernel running under the Xen hypervisor, in case that matters.

--
Sander

[17035.016433] BUG: unable to handle kernel paging request at 888023e26778
[17035.025887] #PF error: [PROT] [WRITE]
[17035.035146] PGD 2a2a067 P4D 2a2a067 PUD 2a2b067 PMD 7fe01067 PTE 801023e26065
[17035.044371] Oops: 0003 [#1] SMP NOPTI
[17035.053720] CPU: 3 PID: 28310 Comm: apt-get Not tainted 5.0.0-rc5-20190208-thp-net-florian-rtl8169-eric-doflr+ #1
[17035.063440] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
[17035.072635] RIP: e030:move_page_tables+0x7c1/0xae0
[17035.081585] Code: ce 00 48 8b 03 31 ff 48 89 44 24 20 e8 9e 72 e4 ff 66 90 48 89 c6 48 89 df e8 8b 89 e4 ff 66 90 48 8b 44 24 20 b9 0c 00 00 00 <48> 89 45 00 41 f6 46 52 40 0f 85 3f 02 00 00 49 8b 7e 40 45 31 c0
[17035.100225] RSP: e02b:c9f2bd40 EFLAGS: 00010282
[17035.109208] RAX: 000475e42067 RBX: 888023e267e0 RCX: 000c
[17035.118332] RDX: RSI: RDI: 0201
[17035.127378] RBP: 888023e26778 R08: R09: 00051c1d9000
[17035.136310] R10: deadbeefdeadf00d R11: 88807fc17000 R12: 7fc59fa0
[17035.145433] R13: ea8f89a8 R14: 88801c2286c0 R15: 7fc59f80
[17035.154171] FS: 7fc5a5591100() GS:88807d4c() knlGS:
[17035.162730] CS: e030 DS: ES: CR0: 80050033
[17035.171180] CR2: 888023e26778 CR3: 1c3f6000 CR4: 0660
[17035.179545] Call Trace:
[17035.187736] move_vma.isra.3+0xd1/0x2d0
[17035.195837] __se_sys_mremap+0x3c6/0x5b0
[17035.203986] do_syscall_64+0x49/0x100
[17035.212109] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[17035.219971] RIP: 0033:0x7fc5a453527a
[17035.227558] Code: 73 01 c3 48 8b 0d 1e fc 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 19 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ee fb 2a 00 f7 d8 64 89 01 48
[17035.243255] RSP: 002b:7ffda22d96f8 EFLAGS: 0246 ORIG_RAX: 0019
[17035.251121] RAX: ffda RBX: 557d40923a30 RCX: 7fc5a453527a
[17035.258986] RDX: 01a0 RSI: 0190 RDI: 7fc59f7ff000
[17035.267127] RBP: 01a0 R08: 0020 R09: 0040
[17035.275259] R10: 0001 R11: 0246 R12: 7fc59f7ff060
[17035.282681] R13: 7fc59f7ff000 R14: 557d40923a30 R15: 557d40829aa0
[17035.290322] Modules linked in:
[17035.297875] CR2: 888023e26778
[17035.305405] ---[ end trace 6ff49f09286816b6 ]---
[17035.313131] RIP: e030:move_page_tables+0x7c1/0xae0
[17035.320326] Code: ce 00 48 8b 03 31 ff 48 89 44 24 20 e8 9e 72 e4 ff 66 90 48 89 c6 48 89 df e8 8b 89 e4 ff 66 90 48 8b 44 24 20 b9 0c 00 00 00 <48> 89 45 00 41 f6 46 52 40 0f 85 3f 02 00 00 49 8b 7e 40 45 31 c0
[17035.334851] RSP: e02b:c9f2bd40 EFLAGS: 00010282
[17035.341727] RAX: 000475e42067 RBX: 888023e267e0 RCX: 000c
[17035.348838] RDX: RSI: RDI: 0201
[17035.356000] RBP: 888023e26778 R08: R09: 00051c1d9000
[17035.363623] R10: deadbeefdeadf00d R11: 88807fc17000 R12: 7fc59fa0
[17035.371454] R13: ea8f89a8 R14: 88801c2286c0 R15: 7fc59f80
[17035.378958] FS: 7fc5a5591100() GS:88807d4c() knlGS:
[17035.386585] CS: e030 DS: ES: CR0: 80050033
[17035.393797] CR2: 888023e26778 CR3: 1c3f6000 CR4: 0660
Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
On 08/02/2019 22:50, Heiner Kallweit wrote: > On 08.02.2019 22:45, Sander Eikelenboom wrote: >> On 08/02/2019 22:22, Heiner Kallweit wrote: >>> On 08.02.2019 21:55, Sander Eikelenboom wrote: >>>> On 08/02/2019 19:52, Heiner Kallweit wrote: >>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote: >>>>>> L.S., >>>>>> >>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but they >>>>>> don't seem related) under Xen i the nasty splat below, >>>>>> that I haven encountered with Linux 4.20.x. >>>>>> >>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting >>>>>> could be nasty due to another (networking related) kernel bug. >>>>>> >>>>>> If you need more info, want me to run a debug patch etc., please feel >>>>>> free to ask. >>>>>> >>>>> Thanks for the report. However I see no change in the r8169 driver between >>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause >>>>> could >>>>> be somewhere else. Therefore I'm afraid a bisect will be needed. >>>> >>>> Hmm i did some diging and i think: >>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb >>>> barriers >>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and >>>> __netdev_sent_queue >>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add >>>> __netdev_sent_queue as variant of __netdev_tx_sent_queue >>>> >>> You're right. Thought this was added in 4.20 already. >>> The BQL code pattern I copied from the mlx4 driver and so far I haven't >>> heard about >>> this issue from any user of physical hw. And due to the fact that a lot of >>> mainboards >>> have onboard Realtek network I have quite a few testers out there. >>> Does the issue occur under specific circumstances like very high load? >> >> Yep, the box is already quite contented with the Xen VM's and if I remember >> correctly it occurred while kernel compiling >> on the host. 
>> >>> If indeed the xmit_more patch causes the issue, I think we have to involve >>> Eric Dumazet >>> as author of the underlying changes. >> >> It could also be the barriers weren't that unneeded as assumed. > > The barriers were removed after adding xmit_more handling. Therefore it would > be good to > test also with only > bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb > barriers > removed.

*arghh* *grmbl* With both bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 and 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 reverted, I get yet another splat:

[ 3769.246083] ld: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
[ 3769.246095] CPU: 2 PID: 3201 Comm: ld Not tainted 5.0.0-rc5-20190208-thp-net-florian-rtl8169-doflr+ #1
[ 3769.246096] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
[ 3769.246098] Call Trace:
[ 3769.246104]
[ 3769.246114] dump_stack+0x5c/0x7b
[ 3769.246120] warn_alloc+0x103/0x190
[ 3769.246122] __alloc_pages_nodemask+0xe3d/0xe80
[ 3769.246128] ? inet_gro_receive+0x232/0x2c0
[ 3769.246130] page_frag_alloc+0x117/0x150
[ 3769.246132] __napi_alloc_skb+0x83/0xd0
[ 3769.246137] rtl8169_poll+0x210/0x640
[ 3769.246140] net_rx_action+0x23d/0x370
[ 3769.246145] __do_softirq+0xed/0x229
[ 3769.246149] irq_exit+0xb7/0xc0
[ 3769.246152] xen_evtchn_do_upcall+0x27/0x40
[ 3769.246154] xen_do_hypervisor_callback+0x29/0x40
[ 3769.246155]
[ 3769.246161] RIP: e030:__pv_queued_spin_lock_slowpath+0xda/0x280
[ 3769.246163] Code: 14 41 bc 01 00 00 00 41 bd 00 01 00 00 3c 02 0f 94 c0 0f b6 c0 48 89 04 24 c6 45 14 00 ba 00 80 00 00 c6 43 01 01 eb 0b f3 90 <83> ea 01 0f 84 49 01 00 00 0f b6 03 84 c0 75 ee 44 89 e8 f0 66 44
[ 3769.246164] RSP: e02b:c90005b0f780 EFLAGS: 0202
[ 3769.246166] RAX: 0001 RBX: 8880047c9200 RCX: 0001
[ 3769.246167] RDX: 7d75 RSI: RDI: 8880047c9200
[ 3769.246167] RBP: 88807d4a1a80 R08: c90005b0f978 R09: c90005b0f978
[ 3769.246168] R10: c90005b0f9d0 R11: 88807fc17000 R12: 0001
[ 3769.246169] R13: 0100 R14: R15: 000c
[ 3769.246173] _raw_spin_lock+0x16/0x20
[ 3769.246176] list_lru_add+0x59/0x170
[ 3769.246179] inode_lru_list_add+0x1b/0x40
[ 3769.246182] iput+0x18b/0x1a0
[ 3769.246184] __dentry_kill+0xc5/0x170
[
Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
On 08/02/2019 22:22, Heiner Kallweit wrote: > On 08.02.2019 21:55, Sander Eikelenboom wrote: >> On 08/02/2019 19:52, Heiner Kallweit wrote: >>> On 08.02.2019 19:29, Sander Eikelenboom wrote: >>>> L.S., >>>> >>>> While testing a linux 5.0-rc5 kernel (with some patches on top but they >>>> don't seem related) under Xen i the nasty splat below, >>>> that I haven encountered with Linux 4.20.x. >>>> >>>> Unfortunately I haven't got a clear reproducer for this and bisecting >>>> could be nasty due to another (networking related) kernel bug. >>>> >>>> If you need more info, want me to run a debug patch etc., please feel free >>>> to ask. >>>> >>> Thanks for the report. However I see no change in the r8169 driver between >>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could >>> be somewhere else. Therefore I'm afraid a bisect will be needed. >> >> Hmm i did some diging and i think: >> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb >> barriers >> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and >> __netdev_sent_queue >> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue >> as variant of __netdev_tx_sent_queue >> > You're right. Thought this was added in 4.20 already. > The BQL code pattern I copied from the mlx4 driver and so far I haven't heard > about > this issue from any user of physical hw. And due to the fact that a lot of > mainboards > have onboard Realtek network I have quite a few testers out there. > Does the issue occur under specific circumstances like very high load? Yep, the box is already quite contended with the Xen VMs, and if I remember correctly it occurred while compiling a kernel on the host. > If indeed the xmit_more patch causes the issue, I think we have to involve > Eric Dumazet > as author of the underlying changes. It could also be that the barriers weren't as unneeded as assumed. Since we are almost at RC6, I took the liberty of CCing Eric now.
BTW, am I correct that these patches are merely optimizations? If so, and provided they revert cleanly, perhaps reverting them for 5.0 should be considered at this point in the RC cycle, trying again for 5.1? -- Sander > >> would be candidates, which were merged in 5.0. >> >> I have reverted the first two, see how that works out. >> >> -- >> Sander >> > Heiner > >> >>>> -- >>>> Sander >>>> >>> Heiner >>> >>>> >>>> [ 6466.554866] kernel BUG at lib/dynamic_queue_limits.c:27! >>>> [ 6466.571425] invalid opcode: [#1] SMP NOPTI >>>> [ 6466.585890] CPU: 3 PID: 7057 Comm: as Not tainted >>>> 5.0.0-rc5-20190208-thp-net-florian-doflr+ #1 >>>> [ 6466.598693] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS >>>> V1.8B1 09/13/2010 >>>> [ 6466.611579] RIP: e030:dql_completed+0x126/0x140 >>>> [ 6466.624339] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 >>>> 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff >>>> <0f> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90 >>>> [ 6466.648130] RSP: e02b:88807d4c3e78 EFLAGS: 00010297 >>>> [ 6466.659616] RAX: 0042 RBX: 8880049cf800 RCX: >>>> >>>> [ 6466.672835] RDX: 0001 RSI: 0042 RDI: >>>> 8880049cf8c0 >>>> [ 6466.684521] RBP: 888077df7260 R08: 0001 R09: >>>> >>>> [ 6466.696824] R10: 387c2336 R11: 387c2336 R12: >>>> 1000 >>>> [ 6466.709953] R13: 888077df6898 R14: 888077df75c0 R15: >>>> 00454677 >>>> [ 6466.722165] FS: 7fd869147200() GS:88807d4c() >>>> knlGS: >>>> [ 6466.733228] CS: e030 DS: ES: CR0: 80050033 >>>> [ 6466.746581] CR2: 7fd867dfd000 CR3: 74884000 CR4: >>>> 0660 >>>> [ 6466.758366] Call Trace: >>>> [ 6466.768118] >>>> [ 6466.778214] rtl8169_poll+0x4f4/0x640 >>>> [ 6466.789198] net_rx_action+0x23d/0x370 >>>> [ 6466.798467] __do_softirq+0xed/0x229 >>>> [ 6466.807039] irq_exit+0xb7/0xc0 >>>> [ 6466.815471] xen_evtchn_do_upcall+0x27/0x40 >>>> [ 6466.826647] xen_
Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
On 08/02/2019 19:52, Heiner Kallweit wrote: > On 08.02.2019 19:29, Sander Eikelenboom wrote: >> L.S., >> >> While testing a linux 5.0-rc5 kernel (with some patches on top but they >> don't seem related) under Xen i the nasty splat below, >> that I haven encountered with Linux 4.20.x. >> >> Unfortunately I haven't got a clear reproducer for this and bisecting could >> be nasty due to another (networking related) kernel bug. >> >> If you need more info, want me to run a debug patch etc., please feel free >> to ask. >> > Thanks for the report. However I see no change in the r8169 driver between > 4.20 and 5.0 with regard to BQL code. Having said that the root cause could > be somewhere else. Therefore I'm afraid a bisect will be needed. Hmm, I did some digging, and I think bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue would be candidates; they were merged in 5.0. I have reverted the first two; let's see how that works out. -- Sander >> -- >> Sander >> > Heiner > >> >> [ 6466.554866] kernel BUG at lib/dynamic_queue_limits.c:27!
>> [ 6466.571425] invalid opcode: [#1] SMP NOPTI >> [ 6466.585890] CPU: 3 PID: 7057 Comm: as Not tainted >> 5.0.0-rc5-20190208-thp-net-florian-doflr+ #1 >> [ 6466.598693] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS >> V1.8B1 09/13/2010 >> [ 6466.611579] RIP: e030:dql_completed+0x126/0x140 >> [ 6466.624339] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 >> 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> >> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90 >> [ 6466.648130] RSP: e02b:88807d4c3e78 EFLAGS: 00010297 >> [ 6466.659616] RAX: 0042 RBX: 8880049cf800 RCX: >> >> [ 6466.672835] RDX: 0001 RSI: 0042 RDI: >> 8880049cf8c0 >> [ 6466.684521] RBP: 888077df7260 R08: 0001 R09: >> >> [ 6466.696824] R10: 387c2336 R11: 387c2336 R12: >> 1000 >> [ 6466.709953] R13: 888077df6898 R14: 888077df75c0 R15: >> 00454677 >> [ 6466.722165] FS: 7fd869147200() GS:88807d4c() >> knlGS: >> [ 6466.733228] CS: e030 DS: ES: CR0: 80050033 >> [ 6466.746581] CR2: 7fd867dfd000 CR3: 74884000 CR4: >> 0660 >> [ 6466.758366] Call Trace: >> [ 6466.768118] >> [ 6466.778214] rtl8169_poll+0x4f4/0x640 >> [ 6466.789198] net_rx_action+0x23d/0x370 >> [ 6466.798467] __do_softirq+0xed/0x229 >> [ 6466.807039] irq_exit+0xb7/0xc0 >> [ 6466.815471] xen_evtchn_do_upcall+0x27/0x40 >> [ 6466.826647] xen_do_hypervisor_callback+0x29/0x40 >> [ 6466.835902] >> [ 6466.845361] RIP: e030:xen_hypercall_mmu_update+0xa/0x20 >> [ 6466.853390] Code: 51 41 53 b8 00 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc >> cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 01 00 00 00 0f 05 <41> >> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc >> [ 6466.874031] RSP: e02b:c90003c0bdd0 EFLAGS: 0246 >> [ 6466.883452] RAX: RBX: 00041f83bfe8 RCX: >> 8100102a >> [ 6466.891986] RDX: deadbeefdeadf00d RSI: deadbeefdeadf00d RDI: >> deadbeefdeadf00d >> [ 6466.903402] RBP: 0fe8 R08: 000b R09: >> >> [ 6466.911201] R10: deadbeefdeadf00d R11: 0246 R12: >> 80050c346067 >> [ 
6466.918491] R13: 8880607c4fe8 R14: 888005082800 R15: >> >> [ 6466.926647] ? xen_hypercall_mmu_update+0xa/0x20 >> [ 6466.938195] ? xen_set_pte_at+0x78/0xe0 >> [ 6466.947046] ? __handle_mm_fault+0xc43/0x1060 >> [ 6466.955772] ? do_mmap+0x44b/0x5b0 >> [ 6466.964410] ? handle_mm_fault+0xf8/0x200 >> [ 6466.973290] ? __do_page_fault+0x231/0x4a0 >> [ 6466.981973] ? page_fault+0x8/0x30 >> [ 6466.990904] ? page_fault+0x1e/0x30 >> [ 6466.999585] Modules linked in: >> [ 6467.007533] ---[ end trace 94bec01608fe4061 ]--- >> [ 6467.016751] RIP: e030:dql_completed+0x126/0x140 >> [ 6467.024271] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 >> 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> >> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90
Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
L.S.,

While testing a Linux 5.0-rc5 kernel (with some patches on top, but they don't seem related) under Xen, I hit the nasty splat below, which I haven't encountered with Linux 4.20.x.

Unfortunately I haven't got a clear reproducer for this, and bisecting could be nasty due to another (networking-related) kernel bug.

If you need more info, or want me to run a debug patch etc., please feel free to ask.

--
Sander

[ 6466.554866] kernel BUG at lib/dynamic_queue_limits.c:27!
[ 6466.571425] invalid opcode: [#1] SMP NOPTI
[ 6466.585890] CPU: 3 PID: 7057 Comm: as Not tainted 5.0.0-rc5-20190208-thp-net-florian-doflr+ #1
[ 6466.598693] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
[ 6466.611579] RIP: e030:dql_completed+0x126/0x140
[ 6466.624339] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90
[ 6466.648130] RSP: e02b:88807d4c3e78 EFLAGS: 00010297
[ 6466.659616] RAX: 0042 RBX: 8880049cf800 RCX:
[ 6466.672835] RDX: 0001 RSI: 0042 RDI: 8880049cf8c0
[ 6466.684521] RBP: 888077df7260 R08: 0001 R09:
[ 6466.696824] R10: 387c2336 R11: 387c2336 R12: 1000
[ 6466.709953] R13: 888077df6898 R14: 888077df75c0 R15: 00454677
[ 6466.722165] FS: 7fd869147200() GS:88807d4c() knlGS:
[ 6466.733228] CS: e030 DS: ES: CR0: 80050033
[ 6466.746581] CR2: 7fd867dfd000 CR3: 74884000 CR4: 0660
[ 6466.758366] Call Trace:
[ 6466.768118]
[ 6466.778214] rtl8169_poll+0x4f4/0x640
[ 6466.789198] net_rx_action+0x23d/0x370
[ 6466.798467] __do_softirq+0xed/0x229
[ 6466.807039] irq_exit+0xb7/0xc0
[ 6466.815471] xen_evtchn_do_upcall+0x27/0x40
[ 6466.826647] xen_do_hypervisor_callback+0x29/0x40
[ 6466.835902]
[ 6466.845361] RIP: e030:xen_hypercall_mmu_update+0xa/0x20
[ 6466.853390] Code: 51 41 53 b8 00 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 01 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[ 6466.874031] RSP: e02b:c90003c0bdd0 EFLAGS: 0246
[ 6466.883452] RAX: RBX: 00041f83bfe8 RCX: 8100102a
[ 6466.891986] RDX: deadbeefdeadf00d RSI: deadbeefdeadf00d RDI: deadbeefdeadf00d
[ 6466.903402] RBP: 0fe8 R08: 000b R09:
[ 6466.911201] R10: deadbeefdeadf00d R11: 0246 R12: 80050c346067
[ 6466.918491] R13: 8880607c4fe8 R14: 888005082800 R15:
[ 6466.926647] ? xen_hypercall_mmu_update+0xa/0x20
[ 6466.938195] ? xen_set_pte_at+0x78/0xe0
[ 6466.947046] ? __handle_mm_fault+0xc43/0x1060
[ 6466.955772] ? do_mmap+0x44b/0x5b0
[ 6466.964410] ? handle_mm_fault+0xf8/0x200
[ 6466.973290] ? __do_page_fault+0x231/0x4a0
[ 6466.981973] ? page_fault+0x8/0x30
[ 6466.990904] ? page_fault+0x1e/0x30
[ 6466.999585] Modules linked in:
[ 6467.007533] ---[ end trace 94bec01608fe4061 ]---
[ 6467.016751] RIP: e030:dql_completed+0x126/0x140
[ 6467.024271] Code: 2b 47 54 ba 00 00 00 00 c7 47 54 ff ff ff ff 0f 48 c2 48 8b 15 7b 39 4a 01 48 89 57 58 e9 48 ff ff ff 44 89 c0 e9 40 ff ff ff <0f> 0b 8b 47 50 29 e8 41 0f 48 c3 eb 9f 90 90 90 90 90 90 90 90 90
[ 6467.039726] RSP: e02b:88807d4c3e78 EFLAGS: 00010297
[ 6467.047243] RAX: 0042 RBX: 8880049cf800 RCX:
[ 6467.054202] RDX: 0001 RSI: 0042 RDI: 8880049cf8c0
[ 6467.062000] RBP: 888077df7260 R08: 0001 R09:
[ 6467.069664] R10: 387c2336 R11: 387c2336 R12: 1000
[ 6467.077715] R13: 888077df6898 R14: 888077df75c0 R15: 00454677
[ 6467.084916] FS: 7fd869147200() GS:88807d4c() knlGS:
[ 6467.093352] CS: e030 DS: ES: CR0: 80050033
[ 6467.101492] CR2: 7fd867dfd000 CR3: 74884000 CR4: 0660
[ 6467.110542] Kernel panic - not syncing: Fatal exception in interrupt
[ 6467.118166] Kernel Offset: disabled
(XEN) [2019-02-08 18:04:48.854] Hardware Dom0 crashed: rebooting machine in 5 seconds.
Re: Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt
On 08/02/2019 12:54, Florian Westphal wrote: > Florian Westphal wrote: >> Sander Eikelenboom wrote: >>> L.S., >>> >>> While trying out a 5.0-RC5 kernel I seem to have stumbled over a regression >>> with NAT. >>> (using an nftables firewall with NAT and connection tracking). >>> >>> Unfortunately it isn't too obvious since no errors are logged, but on >>> clients it >>> causes symptoms like firefox intermittently not being able to load pages >>> with: >>> Network Protocol Error >>> An error occurred during a connection to www.example.com >>> The page you are trying to view cannot be shown because an error in the >>> network protocol was detected. >>> Please contact the website owners to inform them of this problem. >>> >>> But it's only intermittently, so i can still visit some webpages with >>> clients, >>> could be that packet size and or fragments are at play ? >>> >>> So I tried testing with >>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git with >>> e8c32c32b48c2e889704d8ca0872f92eb027838e as last commit, to be sure to have >>> the latest netdev has to offer, >>> but to no avail. >>> >>> After that I tried to git bisect and ended up with: >>> >>> faec18dbb0405c7d4dda025054511dc3a6696918 is the first bad commit >>> commit faec18dbb0405c7d4dda025054511dc3a6696918 >>> Author: Florian Westphal >>> Date: Thu Dec 13 16:01:33 2018 +0100 >>> >>> netfilter: nat: remove l4proto->manip_pkt >> >> Thanks, this is immensely helpful. >> >> I think I see the bug, we can't use target->dst.protonum in >> nf_nat_l4proto_manip_pkt(), it will be TCP in case we're dealing >> with a related icmp packet. >> >> I will send a patch in a few hours when I get back. > > Sander, does this patch fix things for you? Hi Florian, Feel free to add Reported-by/Tested-by tags if you like. Thanks for the swift fix! -- Sander > > Thanks!
> > diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c > b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c > --- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c > +++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c > @@ -215,6 +215,7 @@ int nf_nat_icmp_reply_translation(struct sk_buff *skb, > > /* Change outer to look like the reply to an incoming packet */ > nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple); > + target.dst.protonum = IPPROTO_ICMP; > if (!nf_nat_ipv4_manip_pkt(skb, 0, &target, manip)) > return 0; > > diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c > b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c > --- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c > +++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c > @@ -226,6 +226,7 @@ int nf_nat_icmpv6_reply_translation(struct sk_buff *skb, > } > > nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple); > + target.dst.protonum = IPPROTO_ICMPV6; > if (!nf_nat_ipv6_manip_pkt(skb, 0, &target, manip)) > return 0; > >
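Florian's diagnosis, quoted above, is that nf_nat_l4proto_manip_pkt() chooses the layer-4 mangling from target->dst.protonum. For an ICMP error RELATED to a TCP connection, the inverted conntrack tuple still says TCP, so the outer ICMP header would be handled as if it were TCP. The patch overrides the protocol number before the outer header is mangled. The following is a minimal user-space model of that dispatch, not the netfilter code. `struct tuple_model`, `select_manip()` and `outer_manip()` are hypothetical names, and the returned strings only label which routine would be chosen.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* IANA protocol numbers (the same values as the Linux IPPROTO_* constants). */
enum { PROTO_ICMP = 1, PROTO_TCP = 6, PROTO_UDP = 17, PROTO_ICMPV6 = 58 };

/* Hypothetical, trimmed-down stand-in for a conntrack tuple. */
struct tuple_model {
    uint8_t dst_protonum;
};

/* Picks an L4 mangling routine from the tuple's protocol number, loosely
 * mirroring Florian's description of the helper; names are illustrative. */
static const char *select_manip(const struct tuple_model *t)
{
    assert(t != NULL);
    switch (t->dst_protonum) {
    case PROTO_TCP:    return "tcp";
    case PROTO_UDP:    return "udp";
    case PROTO_ICMP:   return "icmp";
    case PROTO_ICMPV6: return "icmpv6";
    default:           return "none";
    }
}

/* Outer-header translation for an IPv4 ICMP error RELATED to a connection
 * of protocol conn_proto. The inverted conntrack tuple still carries the
 * connection's protocol (e.g. TCP). */
const char *outer_manip(uint8_t conn_proto, int apply_fix)
{
    struct tuple_model target;

    memset(&target, 0, sizeof(target));
    target.dst_protonum = conn_proto;     /* copied from the connection's tuple */
    if (apply_fix)
        target.dst_protonum = PROTO_ICMP; /* the outer header is ICMP, not TCP */
    return select_manip(&target);
}
```

Here `outer_manip(PROTO_TCP, 0)` selects "tcp", which is the pre-fix behaviour for a TCP-related ICMP error, while `outer_manip(PROTO_TCP, 1)` selects "icmp". The IPv4 hunk of the patch achieves the same effect with `target.dst.protonum = IPPROTO_ICMP`, and the IPv6 hunk uses IPPROTO_ICMPV6.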
Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt
L.S.,

While trying out a 5.0-rc5 kernel, I seem to have stumbled over a regression with NAT (using an nftables firewall with NAT and connection tracking).

Unfortunately it isn't very obvious, since no errors are logged, but on clients it causes symptoms such as Firefox intermittently failing to load pages with:

  Network Protocol Error
  An error occurred during a connection to www.example.com
  The page you are trying to view cannot be shown because an error in the network protocol was detected.
  Please contact the website owners to inform them of this problem.

But it only happens intermittently, so I can still visit some web pages from clients. Could packet size and/or fragments be at play?

So I tried testing with git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git, with e8c32c32b48c2e889704d8ca0872f92eb027838e as the last commit, to be sure I had the latest netdev has to offer, but to no avail.

After that I ran git bisect and ended up with:

faec18dbb0405c7d4dda025054511dc3a6696918 is the first bad commit
commit faec18dbb0405c7d4dda025054511dc3a6696918
Author: Florian Westphal
Date: Thu Dec 13 16:01:33 2018 +0100

    netfilter: nat: remove l4proto->manip_pkt

    This removes the last l4proto indirection, the two callers, the l3proto packet mangling helpers for ipv4 and ipv6, now call the nf_nat_l4proto_manip_pkt() helper.

    nf_nat_proto_{dccp,tcp,sctp,gre,icmp,icmpv6} are left behind, even though they contain no functionality anymore to not clutter this patch.

    Next patch will remove the empty files and the nf_nat_l4proto struct.

    nf_nat_proto_udp.c is renamed to nf_nat_proto.c, as it now contains the other nat manip functionality as well, not just udp and udplite.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

:04 04 22d8706921e03cbd6d78a6ebcc5f253ccfd2bf0c b6f8ab2779215b4495dfe641f50e798da73859ac M include
:04 04 af212a756f1acf00cbe45c3be5b71f38f01f1d34 165c440f9e6f2e05738628a19b51f7603f95752a M net

Any ideas or debugging hints?

--
Sander
Re: [ANNOUNCE] v4.18.12-rt7 stall
Hi,

I just tested this kernel and saw the stall output below. I think there is something fishy with the Ethernet driver. One time the device simply locked up on network traffic when I issued "ip a" on it via the serial port. All the problems I see seem to be related to network traffic through the socfpga-dwmac (stmicro/stmmac) driver. The platform is a fairly dated Intel/Altera Cortex-A9 socfpga. I think this problem has been there for a while, but since I had problems with the watchdog, I was not able to detect it.

Best regards
Tim

[ 251.440019] INFO: rcu_preempt self-detected stall on CPU
[ 251.440036] 1-...!: (21000 ticks this GP) idle=5ae/1/1073741826 softirq=0/0 fqs=0
[ 251.440039] (t=21000 jiffies g=7702 c=7701 q=346)
[ 251.440053] rcu_preempt kthread starved for 21000 jiffies! g7702 c7701 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
[ 251.440055] RCU grace-period kthread stack dump:
[ 251.440059] rcu_preempt I011 2 0x
[ 251.440066] Backtrace:
[ 251.440086] [<8062d4b0>] (__schedule) from [<8062da30>] (schedule+0x68/0x128)
[ 251.440096] r10:80a1569e r9:87d9a680 r8:80a04100 r7:80055eec r6:87d9a680 r5:80034600
[ 251.440100] r4:80054000
[ 251.440111] [<8062d9c8>] (schedule) from [<806306dc>] (schedule_timeout+0x1cc/0x368)
[ 251.440116] r5:80a06488 r4:fffef04c
[ 251.440128] [<80630510>] (schedule_timeout) from [<80184fdc>] (rcu_gp_kthread+0x750/0xac0)
[ 251.440137] r10:80a1569e r9:80a04100 r8:0001 r7:0003 r6:80a15690 r5:80a1569c
[ 251.440140] r4:80a154c0
[ 251.440150] [<8018488c>] (rcu_gp_kthread) from [<801461a8>] (kthread+0x138/0x168)
[ 251.440153] r7:80a154c0
[ 251.440163] [<80146070>] (kthread) from [<801010bc>] (ret_from_fork+0x14/0x38)
[ 251.440168] Exception stack(0x80055fb0 to 0x80055ff8)
[ 251.440174] 5fa0:
[ 251.440183] 5fc0:
[ 251.440189] 5fe0: 0013
[ 251.440198] r10: r9: r8: r7: r6: r5:80146070
[ 251.440202] r4:8150fac0 r3:80054000
[ 251.440215] NMI backtrace for cpu 1
[ 251.440226] CPU: 1 PID: 157 Comm: RawMeasThread Tainted: GW O 4.18.12-rt7 #1
[ 251.440229] Hardware name: Altera SOCFPGA
[ 251.440231] Backtrace:
[ 251.440243] [<8010dda4>] (dump_backtrace) from [<8010e09c>] (show_stack+0x20/0x24)
[ 251.440250] r7:80a573f8 r6: r5:600d0193 r4:80a573f8
[ 251.440264] [<8010e07c>] (show_stack) from [<80616120>] (dump_stack+0xb0/0xdc)
[ 251.440278] [<80616070>] (dump_stack) from [<8061cb74>] (nmi_cpu_backtrace+0xc0/0xc4)
[ 251.440286] r9:800d0193 r8:0180 r7:807017c4 r6:0001 r5: r4:0001
[ 251.440296] [<8061cab4>] (nmi_cpu_backtrace) from [<8061ccdc>] (nmi_trigger_cpumask_backtrace+0x164/0x1b0)
[ 251.440301] r5:80a0906c r4:8010fa94
[ 251.440312] [<8061cb78>] (nmi_trigger_cpumask_backtrace) from [<8011077c>] (arch_trigger_cpumask_backtrace+0x20/0x24)
[ 251.440318] r7:80a154c0 r6:807017bc r5:80a06534 r4:80a154c0
[ 251.440328] [<8011075c>] (arch_trigger_cpumask_backtrace) from [<80187944>] (rcu_dump_cpu_stacks+0xac/0xdc)
[ 251.440337] [<80187898>] (rcu_dump_cpu_stacks) from [<801864b0>] (rcu_check_callbacks+0x9e8/0xb08)
[ 251.440346] r10:80a06574 r9:80a154c0 r8:80a06528 r7:80a154c0 r6:07439000 r5:87d9edc0
[ 251.440350] r4:80965dc0 r3:6c2a9c31
[ 251.440360] [<80185ac8>] (rcu_check_callbacks) from [<8018e834>] (update_process_times+0x40/0x6c)
[ 251.440368] r10:801a3024 r9:87d9b1a0 r8:87d9b000 r7:003a r6:8afdf535 r5:0001
[ 251.440372] r4:871baa00
[ 251.440383] [<8018e7f4>] (update_process_times) from [<801a30ac>] (tick_sched_timer+0x88/0xf4)
[ 251.440387] r5:867cffb0 r4:87d9b310
[ 251.440396] [<801a3024>] (tick_sched_timer) from [<8018fc54>] (__hrtimer_run_queues+0x194/0x3e8)
[ 251.440403] r7:80a064b0 r6:867ce000 r5:87d9b060 r4:87d9b310
[ 251.440411] [<8018fac0>] (__hrtimer_run_queues) from [<80190648>] (hrtimer_interrupt+0x138/0x2b0)
[ 251.440419] r10:87d9b00c r9:87d9b1a0 r8: r7:7fff r6:0003 r5:200d0193
[ 251.440422] r4:87d9b000
[ 251.440432] [<80190510>] (hrtimer_interrupt) from [<8011140c>] (twd_handler+0x40/0x50)
[ 251.440441] r10:765b03e0 r9:0010 r8:80a06d3c r7: r6:8001a500 r5:0010
[ 251.440444] r4:0001
[ 251.440454] [<801113cc>] (twd_handler) from [<80178510>] (handle_percpu_devid_irq+0x98/0x2dc)
[ 251.440459] r5:0010 r4:81503cc0
[ 251.440472] [<80178478>] (handle_percpu_devid_irq) from [<8017230c>] (generic_handle_irq+0x34/0x44)
[ 251.440480] r10:765b03e0 r9:90803100 r8:80009000 r7: r6: r5:0010
[ 251.440484] r4:80965208 r3:80178478
[ 251.440495] [<801722d8>] (generic_handle_irq) from [<801729e0>] (__handle_domain_irq+0x6c/0xc4)
[ 251.440505] [<80172974>] (__handle_domain_irq) from [<80102310>] (gic_handle_irq+0x5c/0xa0)
[
Re: [Xen-devel] [PATCH] xen/blkfront: When purging persistent grants, keep them in the buffer
On 27/09/18 23:48, Boris Ostrovsky wrote: > On 9/27/18 5:37 PM, Jens Axboe wrote: >> On 9/27/18 2:33 PM, Sander Eikelenboom wrote: >>> On 27/09/18 21:06, Boris Ostrovsky wrote: >>>> On 9/27/18 2:56 PM, Jens Axboe wrote: >>>>> On 9/27/18 12:52 PM, Sander Eikelenboom wrote: >>>>>> On 27/09/18 16:26, Jens Axboe wrote: >>>>>>> On 9/27/18 1:12 AM, Juergen Gross wrote: >>>>>>>> On 22/09/18 21:55, Boris Ostrovsky wrote: >>>>>>>>> Commit a46b53672b2c ("xen/blkfront: cleanup stale persistent grants") >>>>>>>>> added support for purging persistent grants when they are not in use. >>>>>>>>> As >>>>>>>>> part of the purge, the grants were removed from the grant buffer, This >>>>>>>>> eventually causes the buffer to become empty, with BUG_ON triggered in >>>>>>>>> get_free_grant(). This can be observed even on an idle system, within >>>>>>>>> 20-30 minutes. >>>>>>>>> >>>>>>>>> We should keep the grants in the buffer when purging, and only free >>>>>>>>> the >>>>>>>>> grant ref. >>>>>>>>> >>>>>>>>> Fixes: a46b53672b2c ("xen/blkfront: cleanup stale persistent grants") >>>>>>>>> Signed-off-by: Boris Ostrovsky >>>>>>>> Reviewed-by: Juergen Gross >>>>>>> Since Konrad is out, I'm going to queue this up for 4.19. >>>>>>> >>>>>> Hi Boris/Juergen. >>>>>> >>>>>> Last week i tested a linux-4.19-rc4 kernel with xen-next and this patch >>>>>> from Boris pulled on top. >>>>>> Unfortunately it made a VM hang (probably because it's rootFS is >>>>>> shuffled from under it's feet >>>> What do you mean by "rootFS is shuffled from under it's feet " ? >>> Assumption that block-front getting borked and either a kernel crash or >>> rootfs becoming mounted readonly. Didn't (try) to check though. 
>>> >>>>>> and it gave these in dom0 dmesg: >>>>>> >>>>>> [ 9251.696090] xen-blkback: requesting a grant already in use >>>>>> [ 9251.705861] xen-blkback: trying to add a gref that's already in the >>>>>> tree >>>>>> [ 9251.715781] xen-blkback: requesting a grant already in use >>>>>> [ 9251.725756] xen-blkback: trying to add a gref that's already in the >>>>>> tree >>>>>> [ 9251.735698] xen-blkback: requesting a grant already in use >>>>>> [ 9251.745573] xen-blkback: trying to add a gref that's already in the >>>>>> tree >>>>>> >>>>>> The VM was a HVM with 4 vcpu's and 2 phy disks: >>>>>> xen-blkback: backend/vbd/14/768: using 4 queues, protocol 1 (x86_64-abi) >>>>>> persistent grants >>>>>> xen-blkback: backend/vbd/14/832: using 4 queues, protocol 1 (x86_64-abi) >>>>>> persistent grants >>>>>> >>>>>> >>>>>> Currently i have been running 4.19-rc5 with xen-next on top and commit >>>>>> a46b53672b2c reverted, for a couple of days. That seems to run stable >>>>>> for me (since it's a small box so i'm not hit by what a46b53672b2c >>>>>> tried to fix. >>>>>> >>>>>> If you can come up with a debug patch i can give that a spin tomorrow >>>>>> evening or in the weekend, so we are hopefully still in time for the >>>>>> 4.19 release. >>>>> At this late in the game, might make more sense to simply revert the >>>>> buggy commit. Especially since what is currently out there doesn't fix >>>>> the issue for you. >>> Don't know if Boris or Juergen have a hunch about the issue, if not >>> perhaps a revert is the best. >> Anyone? Unless I hear otherwise, I'll revert the series tomorrow. > > Juergen may have something to say by tomorrow, but from my perspective, > given that we are coming up on rc6 --- yes. > > I looked at the patches again and didn't see anything obvious. > > -boris It could also be that what I hit is a latent bug that is not caused by these patches but was merely uncovered by them.
xl dmesg also shows quite a few of these:
(XEN) [2018-09-24 03:15:46.847] grant_table.c:1755:d14v0 Expanding d14 grant table from 19 to 20 frames
(XEN) [2018-09-24 03:15:46.849] grant_table.c:1755:d14v0 Expanding d14 grant table from 20 to 21 frames
(It has done that for ages on my box, without leading to any direct problems as far as I know.)
I don't know if these could be related, and whether something around the (persistent) grants for block devices could be leaking under some conditions?
-- Sander
Re: [Xen-devel] [PATCH] xen/blkfront: When purging persistent grants, keep them in the buffer
On 27/09/18 21:06, Boris Ostrovsky wrote: > On 9/27/18 2:56 PM, Jens Axboe wrote: >> On 9/27/18 12:52 PM, Sander Eikelenboom wrote: >>> On 27/09/18 16:26, Jens Axboe wrote: >>>> On 9/27/18 1:12 AM, Juergen Gross wrote: >>>>> On 22/09/18 21:55, Boris Ostrovsky wrote: >>>>>> Commit a46b53672b2c ("xen/blkfront: cleanup stale persistent grants") >>>>>> added support for purging persistent grants when they are not in use. As >>>>>> part of the purge, the grants were removed from the grant buffer, This >>>>>> eventually causes the buffer to become empty, with BUG_ON triggered in >>>>>> get_free_grant(). This can be observed even on an idle system, within >>>>>> 20-30 minutes. >>>>>> >>>>>> We should keep the grants in the buffer when purging, and only free the >>>>>> grant ref. >>>>>> >>>>>> Fixes: a46b53672b2c ("xen/blkfront: cleanup stale persistent grants") >>>>>> Signed-off-by: Boris Ostrovsky >>>>> Reviewed-by: Juergen Gross >>>> Since Konrad is out, I'm going to queue this up for 4.19. >>>> >>> Hi Boris/Juergen. >>> >>> Last week i tested a linux-4.19-rc4 kernel with xen-next and this patch >>> from Boris pulled on top. >>> Unfortunately it made a VM hang (probably because it's rootFS is shuffled >>> from under it's feet > > What do you mean by "rootFS is shuffled from under it's feet " ?
My assumption was that blkfront got borked, resulting in either a kernel crash or the rootfs being remounted read-only. I didn't (try to) check, though.
>>> and it gave these in dom0 dmesg: >>> >>> [ 9251.696090] xen-blkback: requesting a grant already in use >>> [ 9251.705861] xen-blkback: trying to add a gref that's already in the tree >>> [ 9251.715781] xen-blkback: requesting a grant already in use >>> [ 9251.725756] xen-blkback: trying to add a gref that's already in the tree >>> [ 9251.735698] xen-blkback: requesting a grant already in use >>> [ 9251.745573] xen-blkback: trying to add a gref that's already in the tree >>> >>> The VM was a HVM with 4 vcpu's and 2 phy disks: >>> xen-blkback: backend/vbd/14/768: using 4 queues, protocol 1 (x86_64-abi) >>> persistent grants >>> xen-blkback: backend/vbd/14/832: using 4 queues, protocol 1 (x86_64-abi) >>> persistent grants >>> >>> >>> Currently i have been running 4.19-rc5 with xen-next on top and commit >>> a46b53672b2c reverted, for a couple of days. That seems to run stable >>> for me (since it's a small box so i'm not hit by what a46b53672b2c >>> tried to fix. >>> >>> If you can come up with a debug patch i can give that a spin tomorrow >>> evening or in the weekend, so we are hopefully still in time for the >>> 4.19 release. >> At this late in the game, might make more sense to simply revert the >> buggy commit. Especially since what is currently out there doesn't fix >> the issue for you.
I don't know if Boris or Juergen have a hunch about the issue; if not, a revert is perhaps the best option.
> If decision is to revert then I think the whole series needs to be > reverted. > > -boris >
For Boris and Juergen: would it make sense to have a "xen-next" branch in the xen-tip tree that is:
- based on the previous stable kernel,
- has the for-linus branches for the upcoming kernel release on top,
- has the patches for net(-next) and block changes on top (these don't go through the Xen tree but only arrive as mailing-list patches, so they are scattered, difficult to track and hard to use for automated testing),
- and has dependency patches for the above, if necessary, to be able to build?
That way there would be one branch that can be used to test ALL pending Xen-related kernel patches, and which could be used in OSStest without as many potential false alarms as linux-next will have.
-- Sander
Re: [Xen-devel] [PATCH] xen/blkfront: When purging persistent grants, keep them in the buffer
On 27/09/18 16:26, Jens Axboe wrote: > On 9/27/18 1:12 AM, Juergen Gross wrote: >> On 22/09/18 21:55, Boris Ostrovsky wrote: >>> Commit a46b53672b2c ("xen/blkfront: cleanup stale persistent grants") >>> added support for purging persistent grants when they are not in use. As >>> part of the purge, the grants were removed from the grant buffer, This >>> eventually causes the buffer to become empty, with BUG_ON triggered in >>> get_free_grant(). This can be observed even on an idle system, within >>> 20-30 minutes. >>> >>> We should keep the grants in the buffer when purging, and only free the >>> grant ref. >>> >>> Fixes: a46b53672b2c ("xen/blkfront: cleanup stale persistent grants") >>> Signed-off-by: Boris Ostrovsky >> >> Reviewed-by: Juergen Gross > > Since Konrad is out, I'm going to queue this up for 4.19. >
Hi Boris/Juergen,

Last week I tested a linux-4.19-rc4 kernel with xen-next and this patch from Boris pulled on top. Unfortunately, it made a VM hang (probably because its rootFS is shuffled out from under its feet), and it gave these messages in the dom0 dmesg:

[ 9251.696090] xen-blkback: requesting a grant already in use
[ 9251.705861] xen-blkback: trying to add a gref that's already in the tree
[ 9251.715781] xen-blkback: requesting a grant already in use
[ 9251.725756] xen-blkback: trying to add a gref that's already in the tree
[ 9251.735698] xen-blkback: requesting a grant already in use
[ 9251.745573] xen-blkback: trying to add a gref that's already in the tree

The VM was an HVM guest with 4 vCPUs and 2 phy disks:
xen-blkback: backend/vbd/14/768: using 4 queues, protocol 1 (x86_64-abi) persistent grants
xen-blkback: backend/vbd/14/832: using 4 queues, protocol 1 (x86_64-abi) persistent grants

For a couple of days now I have been running 4.19-rc5 with xen-next on top and commit a46b53672b2c reverted. That seems to run stable for me (it's a small box, so I'm not hit by what a46b53672b2c tried to fix).

If you can come up with a debug patch, I can give it a spin tomorrow evening or at the weekend, so we are hopefully still in time for the 4.19 release.
-- Sander
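The failure mode in Boris's patch description (purging removes grants from the buffer until get_free_grant() hits BUG_ON) can be illustrated with a tiny userspace model. This is a hypothetical sketch, not blkfront code: `struct grant`, `struct grant_buffer`, `purge_remove()`, `purge_keep()` and the model `get_free_grant()` are illustrative names. Only the idea of keeping the entry in the buffer and releasing just its grant reference comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define NGRANTS 4
#define GREF_INVALID (-1)

/* Hypothetical model of one persistent-grant slot (not blkfront's struct). */
struct grant {
	int gref;       /* grant reference; GREF_INVALID once released */
	bool in_buffer; /* still owned by the grant buffer? */
};

struct grant_buffer {
	struct grant g[NGRANTS];
};

static void buffer_init(struct grant_buffer *b)
{
	for (int i = 0; i < NGRANTS; i++) {
		b->g[i].gref = 100 + i;
		b->g[i].in_buffer = true;
	}
}

/* Pre-fix purge: the slot is dropped from the buffer along with its gref. */
static void purge_remove(struct grant_buffer *b, int i)
{
	assert(i >= 0 && i < NGRANTS);
	b->g[i].gref = GREF_INVALID;
	b->g[i].in_buffer = false;
}

/* Post-fix purge: only the grant reference is released; the slot stays. */
static void purge_keep(struct grant_buffer *b, int i)
{
	assert(i >= 0 && i < NGRANTS);
	b->g[i].gref = GREF_INVALID;
}

/*
 * Stand-in for get_free_grant(): return the first slot still in the
 * buffer, granting a fresh reference if the old one was purged.
 * Returns -1 where the kernel would hit BUG_ON() on an empty buffer.
 */
static int get_free_grant(struct grant_buffer *b, int *next_gref)
{
	for (int i = 0; i < NGRANTS; i++) {
		if (!b->g[i].in_buffer)
			continue;
		if (b->g[i].gref == GREF_INVALID)
			b->g[i].gref = (*next_gref)++;
		return i;
	}
	return -1;
}
```

After purging every slot with `purge_remove()`, the model lookup returns -1 (the BUG_ON case); after purging with `purge_keep()`, slot 0 is reused and simply receives a new grant reference.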
Re: Linux 4.16-rc1: regression bisected, Debian kernel package tool make-kpkg stalls indefinitely during kernel build due to commit "kconfig: remove check_stdin()"
On 13/02/18 14:07, Ulf Magnusson wrote: > On Tue, Feb 13, 2018 at 1:35 PM, Ulf Magnusson wrote: >> On Tue, Feb 13, 2018 at 12:33:24PM +0100, Ulf Magnusson wrote: >>> On Tue, Feb 13, 2018 at 11:00:49AM +0100, Sander Eikelenboom wrote: >>>> On 13/02/18 05:09, Masahiro Yamada wrote: >>>>> 2018-02-13 12:00 GMT+09:00 Woody Suwalski : >>>>>> Sander Eikelenboom wrote: >>>>>>> >>>>>>> L.S., >>>>>>> >>>>>>> The Debian kernel-package tool make-kpkg for easy building of upstream >>>>>>> kernels on Debian fails with linux 4.16-rc1. >>>>>>> >>>>>>> The tool (perl script) while invoked with: >>>>>>> make-kpkg --initrd --append_to_version -20180212 kernel_image >>>>>>> >>>>>>> On a git tree with a .config from the previous kernel release, so new >>>>>>> KConfig questions have to be asked on new or changed options. >>>>>>> >>>>>>> The script stalls indefinitely while it seems to be excuting: >>>>>>> exec make kpkg_version=13.018+nmu1 -f >>>>>>> /usr/share/kernel-package/ruleset/minimal.mk debian >>>>>>> APPEND_TO_VERSION=-t440s-20180212 INITRD=YES >>>>>>> >>>>>>> After using ctrl-c to break out it, i get: >>>>>>> ^CFailed to create a ./debian directory: No such file or directory >>>>>>> at >>>>>>> /usr/bin/make-kpkg line 970. >>>>>>> >>>>>>> Bisection turned up as culprit: >>>>>>> commit d2a04648a5dbc3d1d043b35257364f0197d4d868 >>>>>>> kconfig: remove check_stdin() >>>>>>> Except silentoldconfig, valid_stdin is 1, so check_stdin() is >>>>>>> no-op. >>>>>>> oldconfig and silentoldconfig work almost in the same way >>>>>>> except >>>>>>> that >>>>>>> the latter generates additional files under include/. Both ask >>>>>>> users >>>>>>> for input for new symbols. >>>>>>> I do not know why only silentoldconfig requires stdio be tty. 
>>>>>>> $ rm -f .config; touch .config >>>>>>>$ yes "" | make oldconfig > stdout >>>>>>>$ rm -f .config; touch .config >>>>>>>$ yes "" | make silentoldconfig > stdout >>>>>>>make[1]: *** [silentoldconfig] Error 1 >>>>>>>make: *** [silentoldconfig] Error 2 >>>>>>>$ tail -n 4 stdout >>>>>>>Console input/output is redirected. Run 'make oldconfig' to >>>>>>> update >>>>>>> configuration. >>>>>>> scripts/kconfig/Makefile:40: recipe for target >>>>>>> 'silentoldconfig' failed >>>>>>>Makefile:507: recipe for target 'silentoldconfig' failed >>>>>>> Redirection is useful, for example, for testing where we want >>>>>>> to >>>>>>> give >>>>>>> particular key inputs from a test file, then check the result. >>>>>>> Signed-off-by: Masahiro Yamada >>>>>>> Reviewed-by: Ulf Magnusson >>>>>>> >>>>>>> Reverting this specific commit makes make-kpkg work again as usual. >>>>>>> >>>>>>> Version of the kernel-package used: >>>>>>> ii kernel-package >>>>>>> 13.018+nmu1 >>>>>>> >>>>>>> >>>>>>> I also cc'ed the Debian developer who maintains the kernel-package >>>>>>> package: Manoj Srivastava >>>>>>> >>>>>>> -- >>>>>>> Sander >>>>>>> >>>>>> I have noticed today the same - the kernel-build blockage was in (as I >>>>>> recall) >>>>>> srcipts/kconfig/conf -s --silentoldconfig Kbuild >>>>>> >>>>>> I have bypassed it by regenerating the .config "by hand"... >>>>> >>>>> >>>>> silentold
Re: Linux 4.16-rc1: regression bisected, Debian kernel package tool make-kpkg stalls indefinitely during kernel build due to commit "kconfig: remove check_stdin()"
On 13/02/18 05:09, Masahiro Yamada wrote: > 2018-02-13 12:00 GMT+09:00 Woody Suwalski : >> Sander Eikelenboom wrote: >>> >>> L.S., >>> >>> The Debian kernel-package tool make-kpkg for easy building of upstream >>> kernels on Debian fails with linux 4.16-rc1. >>> >>> The tool (perl script) while invoked with: >>> make-kpkg --initrd --append_to_version -20180212 kernel_image >>> >>> On a git tree with a .config from the previous kernel release, so new >>> KConfig questions have to be asked on new or changed options. >>> >>> The script stalls indefinitely while it seems to be excuting: >>> exec make kpkg_version=13.018+nmu1 -f >>> /usr/share/kernel-package/ruleset/minimal.mk debian >>> APPEND_TO_VERSION=-t440s-20180212 INITRD=YES >>> >>> After using ctrl-c to break out it, i get: >>> ^CFailed to create a ./debian directory: No such file or directory at >>> /usr/bin/make-kpkg line 970. >>> >>> Bisection turned up as culprit: >>> commit d2a04648a5dbc3d1d043b35257364f0197d4d868 >>> kconfig: remove check_stdin() >>> Except silentoldconfig, valid_stdin is 1, so check_stdin() is >>> no-op. >>> oldconfig and silentoldconfig work almost in the same way except >>> that >>> the latter generates additional files under include/. Both ask users >>> for input for new symbols. >>> I do not know why only silentoldconfig requires stdio be tty. >>> $ rm -f .config; touch .config >>>$ yes "" | make oldconfig > stdout >>>$ rm -f .config; touch .config >>>$ yes "" | make silentoldconfig > stdout >>>make[1]: *** [silentoldconfig] Error 1 >>>make: *** [silentoldconfig] Error 2 >>>$ tail -n 4 stdout >>>Console input/output is redirected. Run 'make oldconfig' to update >>> configuration. >>> scripts/kconfig/Makefile:40: recipe for target >>> 'silentoldconfig' failed >>>Makefile:507: recipe for target 'silentoldconfig' failed >>> Redirection is useful, for example, for testing where we want to >>> give >>> particular key inputs from a test file, then check the result. 
>>> Signed-off-by: Masahiro Yamada >>> Reviewed-by: Ulf Magnusson >>> >>> Reverting this specific commit makes make-kpkg work again as usual. >>> >>> Version of the kernel-package used: >>> ii kernel-package >>> 13.018+nmu1 >>> >>> >>> I also cc'ed the Debian developer who maintains the kernel-package >>> package: Manoj Srivastava >>> >>> -- >>> Sander >>> >> I have noticed today the same - the kernel-build blockage was in (as I >> recall) >> srcipts/kconfig/conf -s --silentoldconfig Kbuild >> >> I have bypassed it by regenerating the .config "by hand"... > > > silentoldconfig asks you values for new symbols. > So, you must answer questions to proceed.
I know, but it stalls before asking the questions.
> > How does 'make-kpkg' handle silentoldconfig? > > Re-direct stdio, then make it forcibly fail?
I don't know; it is a bunch of Perl and shell scripts that get invoked, which are not the easiest to follow if you are not familiar with them. I'm just a user of the tool, so I would have to defer that question to the Debian package maintainer; hopefully he will chime in.
-- Sander
Linux 4.16-rc1: regression bisected, Debian kernel package tool make-kpkg stalls indefinitely during kernel build due to commit "kconfig: remove check_stdin()"
L.S.,

The Debian kernel-package tool make-kpkg, used for easily building upstream kernels on Debian, fails with Linux 4.16-rc1.

The tool (a Perl script) is invoked with:
make-kpkg --initrd --append_to_version -20180212 kernel_image
on a git tree with a .config from the previous kernel release, so Kconfig questions have to be asked for new or changed options.

The script stalls indefinitely while it seems to be executing:
exec make kpkg_version=13.018+nmu1 -f /usr/share/kernel-package/ruleset/minimal.mk debian APPEND_TO_VERSION=-t440s-20180212 INITRD=YES

After using Ctrl-C to break out of it, I get:
^CFailed to create a ./debian directory: No such file or directory at /usr/bin/make-kpkg line 970.

Bisection turned up as the culprit:
commit d2a04648a5dbc3d1d043b35257364f0197d4d868
    kconfig: remove check_stdin()

    Except silentoldconfig, valid_stdin is 1, so check_stdin() is no-op.

    oldconfig and silentoldconfig work almost in the same way except that
    the latter generates additional files under include/. Both ask users
    for input for new symbols.

    I do not know why only silentoldconfig requires stdio be tty.

      $ rm -f .config; touch .config
      $ yes "" | make oldconfig > stdout
      $ rm -f .config; touch .config
      $ yes "" | make silentoldconfig > stdout
      make[1]: *** [silentoldconfig] Error 1
      make: *** [silentoldconfig] Error 2
      $ tail -n 4 stdout
      Console input/output is redirected. Run 'make oldconfig' to update configuration.
      scripts/kconfig/Makefile:40: recipe for target 'silentoldconfig' failed
      Makefile:507: recipe for target 'silentoldconfig' failed

    Redirection is useful, for example, for testing where we want to give
    particular key inputs from a test file, then check the result.

    Signed-off-by: Masahiro Yamada
    Reviewed-by: Ulf Magnusson

Reverting this specific commit makes make-kpkg work again as usual.

Version of kernel-package used:
ii  kernel-package  13.018+nmu1

I also cc'ed the Debian developer who maintains the kernel-package package: Manoj Srivastava
-- Sander
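For readers unfamiliar with the removed check: per the commit message, only silentoldconfig treated redirected stdio as invalid and bailed out with the "Console input/output is redirected" message. A minimal model of that guard follows. `enum conf_mode`, `stdio_is_valid()` and `check_stdin_model()` are hypothetical names, not kconfig's code, and the assumption that both stdin and stdout had to be terminals is inferred from the wording of the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical model; these names are illustrative, not kconfig's. */
enum conf_mode { MODE_OLDCONFIG, MODE_SILENTOLDCONFIG };

/*
 * "Except silentoldconfig, valid_stdin is 1": only silentoldconfig
 * requires a terminal (assumed here: both stdin and stdout).
 */
static bool stdio_is_valid(enum conf_mode mode, bool in_tty, bool out_tty)
{
	if (mode != MODE_SILENTOLDCONFIG)
		return true;
	return in_tty && out_tty;
}

/* Returns 1 (after printing the old message) where conf used to bail out. */
static int check_stdin_model(enum conf_mode mode)
{
	assert(mode == MODE_OLDCONFIG || mode == MODE_SILENTOLDCONFIG);
	if (!stdio_is_valid(mode, isatty(STDIN_FILENO), isatty(STDOUT_FILENO))) {
		printf("Console input/output is redirected. "
		       "Run 'make oldconfig' to update configuration.\n");
		return 1;
	}
	return 0;
}
```

Masahiro's question in the thread ("Re-direct stdio, then make it forcibly fail?") suggests one possibility: make-kpkg may have run with redirected stdio and relied on this early exit. The thread does not confirm that.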
Linux 4.14-rc6 bisected regression tun devices not working anymore in openvpn
L.S.,

While testing a Linux 4.14-rc6 kernel, I noticed that OpenVPN doesn't work anymore. My OpenVPN config uses tun devices and is pretty standard. The OpenVPN version is the one in current Debian stable: openvpn 2.4.0-6+deb9u2

From the OpenVPN logging:
Sat Oct 28 16:03:34 2017 us=175829 TUN/TAP device opened
Sat Oct 28 16:03:34 2017 us=183027 Note: Cannot set tx queue length on : No such device (errno=19)
Sat Oct 28 16:03:34 2017 us=183055 do_ifconfig, tt->did_ifconfig_ipv6_setup=0
Sat Oct 28 16:03:34 2017 us=183071 /sbin/ip link set dev up mtu 1500
Cannot find device ""
Sat Oct 28 16:03:34 2017 us=200445 Linux ip link set failed: external program exited with error status: 1
Sat Oct 28 16:03:34 2017 us=200482 Exiting due to fatal error
Sat Oct 28 16:38:17 2017 us=923381 TCP/UDP: Closing socket
Sat Oct 28 16:38:17 2017 us=925986 Closing TUN/TAP interface

The offending commit is:
0ad646c81b2182f7fa67ec0c8c825e0ee165696d "tun: call dev_get_valid_name() before register_netdevice()"

Reverting this commit fixes the issue for me. Unfortunately, the commit itself seems to fix another issue.
-- Sander
Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!
On 26/10/17 19:49, Craig Bergstrom wrote: > Sander, thanks for the details, they've been very useful. > > I suspect that your host system's mem=2048M parameter is causing the > problem. Any chance you can confirm by removing the parameter and > running the guest code path?
I removed it, but kept the hypervisor's 2048M limit on dom0 memory intact (set in GRUB via the Xen boot command "multiboot /xen-4.10.gz dom0_mem=2048M,max:2048M ..."). Unfortunately, that doesn't change anything: the guest still fails to start with the same errors.
> More specifically, since you're telling the kernel that it's high > memory address is at 2048M and your device is at 0xfe1fe000 (~4G), the > new mmap() limits are preventing you from mapping addresses that are > explicitly disallowed by the parameter. >
That would probably mean the current patch prevents hard-limiting dom0 memory to a value below 4G, at least in combination with PCI passthrough. The only option left would then be to have no hard memory restriction on dom0 and rely on auto-ballooning, but I'm not a great fan of that. I don't know how KVM handles memory limits for the host system, but perhaps it suffers from the same issue.
I also tried the patch from one of your last mails that makes the check "less strict", but I still get the same errors (when using the hard memory limits).
-- Sander > > On Thu, Oct 26, 2017 at 10:39 AM, Ingo Molnar wrote: >> >> * Craig Bergstrom wrote: >> >>> Yes, not much time left for 4.14, it might be reasonable to pull the >>> change out since it's causing problems. [...] >> >> Ok, I'll queue up a revert tomorrow morning and send it to Linus ASAP if >> there's >> no good fix by then. In hindsight I should have queued it for v4.15 ... >> >> Thanks, >> >> Ingo
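Craig's explanation can be checked with a little arithmetic. The sketch below is a simplified userspace model. It assumes, as his reply suggests, that the new check rejects any mapping whose end lies above the top of RAM as configured at boot. `mmap_range_ok()` and `page_align_up()` are hypothetical stand-ins, not the kernel's implementation. The inputs come from the QEMU debug output in this thread (MSI-X table BAR base 0xfe1fe000, table_off 0x1000, 8 entries of 0x10 bytes). That the mapped offset is base plus table_off is an assumption, consistent with the reported table_offset_adjust of 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 0x1000ULL

/* Round a length up to whole pages, as an mmap() of /dev/mem covers. */
static uint64_t page_align_up(uint64_t len)
{
	return (len + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
}

/*
 * Hypothetical model of the check: accept [addr, addr + len) only if it
 * ends at or below top_of_ram (e.g. the limit implied by mem=2048M).
 */
static bool mmap_range_ok(uint64_t addr, uint64_t len, uint64_t top_of_ram)
{
	assert(addr % MODEL_PAGE_SIZE == 0);
	return addr + page_align_up(len) <= top_of_ram;
}
```

With mem=2048M, the top of RAM is 0x80000000. The MSI-X table page at 0xfe1ff000 ends at 0xfe200000, far above that, so under this model the mapping is refused no matter how small the table is.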
Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!
On 26/10/17 10:12, Sander Eikelenboom wrote: > On 26/10/17 10:05, Sander Eikelenboom wrote: >> On 26/10/17 00:02, Craig Bergstrom wrote: >>> Thanks for the notification, my apologies for the breakage. I'll take a >>> close look and see if I can figure out what went wrong. >>> >>> Sander, any chance you can send /proc/iomem and the inputs to the mmap call >>> that fail on your affected system? >> >> Hi Craig, >> >> The output from /proc/iomem is simple to get and attached. >> The mmap call is probably issued by qemu and will require more digging. > > Ahh grepping qemu gave a pointer, it's probably the code in: > > http://xenbits.xen.org/gitweb/?p=qemu-xen.git;a=blob;f=hw/xen/xen_pt_msi.c;h=ff9a79f5d27ad7d74a1b22297be560feb455063c;hb=5cd7ce5dde3f228b3b669ed9ca432f588947bd40 > > around line 571, that would also explain why it's only this device that > has the problem, since it's the only one trying to use MSI(-X) > interrupts. Will see it i can add some logging to that function.
Attached is the QEMU debug output, with an extra line printing all the values used to calculate the arguments of the mmap() call.
-- Sander > -- > Sander > > >> >> I don't know if there is that much time left for 4.14, since we are at >> RC6 already.
>> >> -- >> Sander >> >> >>> >>> >>> On Wed, Oct 25, 2017 at 2:50 PM, Boris Ostrovsky >>> wrote: >>> >>>> On 10/23/2017 10:44 PM, Fengguang Wu wrote: >>>>> Greetings, >>>>> >>>>> 0day kernel testing robot got the below dmesg and the first bad commit is >>>>> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git >>>> master >>>>> >>>>> commit ce56a86e2ade45d052b3228cdfebe913a1ae7381 >>>>> Author: Craig Bergstrom >>>>> AuthorDate: Thu Oct 19 13:28:56 2017 -0600 >>>>> Commit: Ingo Molnar >>>>> CommitDate: Fri Oct 20 09:48:00 2017 +0200 >>>>> >>>>> x86/mm: Limit mmap() of /dev/mem to valid physical addresses >>>> >>>> Also note >>>> https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg02935.html >>>> >>>> -boris >>>> >>> >> > qemu-system-i386: -serial pty: char device redirected to /dev/pts/16 (label serial0) [00:05.0] xen_pt_realize: Assigning real physical device 08:00.0 to devfn 0x28 [00:05.0] xen_pt_register_regions: IO region 0 registered (size=0x2000 base_addr=0xfe1fe000 type: 0x4) [00:05.0] xen_pt_config_reg_init: Offset 0x000e mismatch! Emulated=0x0080, host=0x, syncing to 0x0080. [00:05.0] xen_pt_config_reg_init: Offset 0x0010 mismatch! Emulated=0x, host=0xfe1fe004, syncing to 0xfe1fe004. [00:05.0] xen_pt_config_reg_init: Offset 0x0052 mismatch! Emulated=0x, host=0x4803, syncing to 0x0003. [00:05.0] xen_pt_config_reg_init: Offset 0x0072 mismatch! Emulated=0x, host=0x0086, syncing to 0x0080. [00:05.0] xen_pt_config_reg_init: Offset 0x00a4 mismatch! Emulated=0x, host=0x8fc0, syncing to 0x8fc0. [00:05.0] xen_pt_config_reg_init: Offset 0x00b2 mismatch! Emulated=0x, host=0x1012, syncing to 0x1012. 
[00:05.0] xen_pt_msix_init: get MSI-X table BAR base 0xfe1fe000 [00:05.0] xen_pt_msix_init: table_off = 0x1000, total_entries = 8 [00:05.0] xen_pt_msix_init: table_off = 0x1000, total_entries = 8, PCI_MSIX_ENTRY_SIZE = 0x10, msix->table_offset_adjust = 0, msix->table_base = 0xfe1fe000 [00:05.0] xen_pt_msix_init: Error: Can't map physical MSI-X table: Invalid argument [00:05.0] xen_pt_msix_size_init: Error: Internal error: Invalid xen_pt_msix_init. Failed to initialize 12/15, type = 0x1, rc: -22 [00:05.0] xen_pt_msi_set_enable: disabling MSI. *** Error in `/usr/local/lib/xen/bin/qemu-system-i386': corrupted size vs. prev_size: 0x55ce13565570 *** === Backtrace: = /lib/x86_64-linux-gnu/libc.so.6(+0x70bcb)[0x7f700ab7ebcb] /lib/x86_64-linux-gnu/libc.so.6(+0x76f96)[0x7f700ab84f96] /lib/x86_64-linux-gnu/libc.so.6(+0x77388)[0x7f700ab85388] /lib/x86_64-linux-gnu/libc.so.6(+0x78dca)[0x7f700ab86dca] /lib/x86_64-linux-gnu/libc.so.6(__libc_calloc+0x27b)[0x7f700ab89b4b] /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_malloc0+0x21)[0x7f700bbbee61] /usr/local/lib/xen/bin/qemu-system-i386(+0x6d78ee)[0x55ce114298ee] /usr/local/lib/xen/bin/qemu-system-i386(+0x6d309e)[0x55ce1142509e] /usr/local/lib/xen/bin/qemu-system-i386(+0x6d316f)[0x55ce1142516f] /usr/local/lib/xen/bin/qemu-system-i386(+0x24d79b)[0x55ce10f9f79b] /usr/local/lib/xen/bin/qemu-system-i386(+0x6da8bf)[0x55ce1142c8bf] /usr/local/lib/xen/bin/qemu-system-i386(+0x70717c)[0x55ce1145917c] /usr/local/lib/xen/
Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!
On 26/10/17 00:02, Craig Bergstrom wrote: > Thanks for the notification, my apologies for the breakage. I'll take a > close look and see if I can figure out what went wrong. > > Sander, any chance you can send /proc/iomem and the inputs to the mmap call > that fail on your affected system? Hi Craig, The output from /proc/iomem is simple to get and attached. The mmap call is probably issued by qemu and will require more digging. I don't know if there is that much time left for 4.14, since we are at RC6 already. -- Sander > > > On Wed, Oct 25, 2017 at 2:50 PM, Boris Ostrovsky > wrote: > >> On 10/23/2017 10:44 PM, Fengguang Wu wrote: >>> Greetings, >>> >>> 0day kernel testing robot got the below dmesg and the first bad commit is >>> >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git >> master >>> >>> commit ce56a86e2ade45d052b3228cdfebe913a1ae7381 >>> Author: Craig Bergstrom >>> AuthorDate: Thu Oct 19 13:28:56 2017 -0600 >>> Commit: Ingo Molnar >>> CommitDate: Fri Oct 20 09:48:00 2017 +0200 >>> >>> x86/mm: Limit mmap() of /dev/mem to valid physical addresses >> >> Also note >> https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg02935.html >> >> -boris >> > -0fff : Reserved 1000-00095fff : System RAM 00096000-000963ff : RAM buffer 00096400-000f : Reserved 000a-000b : PCI Bus :00 000c-000cfdff : Video ROM 000d-000d : PCI Bus :00 000d4800-000d4bff : Adapter ROM 000f-000f : System ROM 0010-7fff : System RAM 0100-01d2a703 : Kernel code 01d2a704-025450ff : Kernel data 02b3f000-02cc1fff : Kernel bss c7f9-c7f9dfff : ACPI Tables c7f9e000-c7fd : ACPI Non-volatile Storage c7fe-c7ff : Reserved c800-dfff : PCI Bus :00 cfe0-cfef : PCI Bus :0c cfef8000-cfefbfff : :0c:00.0 cfef8000-cfefbfff : r8169 cfeff000-cfef : :0c:00.0 cfeff000-cfef : r8169 cff0-cfff : PCI Bus :0d cfff8000-cfffbfff : :0d:00.0 cfff8000-cfffbfff : r8169 c000-cfff : :0d:00.0 c000-cfff : r8169 d000-dfff : PCI Bus :0f d000-dfff : :0f:00.0 d000-d0ff : vesafb e000-efff : PCI MMCONFIG [bus 
00-ff] e000-efff : pnp 00:07 f000-febf : PCI Bus :00 f600-f6003fff : Reserved f600-f6003fff : pnp 00:01 fdcf7000-fdcf7fff : :00:12.0 fdcf7000-fdcf7fff : ohci_hcd fdcf8000-fdcfbfff : :00:14.2 fdcfc000-fdcfcfff : :00:13.0 fdcfc000-fdcfcfff : ohci_hcd fdcfd000-fdcfdfff : :00:14.5 fdcfd000-fdcfdfff : ohci_hcd fdcfe000-fdcfefff : :00:16.0 fdcfe000-fdcfefff : ohci_hcd fdcff000-fdcff3ff : :00:11.0 fdcff000-fdcff3ff : ahci fdcff400-fdcff4ff : :00:12.2 fdcff400-fdcff4ff : ehci_hcd fdcff800-fdcff8ff : :00:13.2 fdcff800-fdcff8ff : ehci_hcd fdcffc00-fdcffcff : :00:16.2 fdcffc00-fdcffcff : ehci_hcd fde0-fdef : PCI Bus :04 fdef8000-fdef8fff : :04:00.0 fdef9000-fdef9fff : :04:00.1 fdefa000-fdefafff : :04:00.2 fdefb000-fdefbfff : :04:00.3 fdefc000-fdefcfff : :04:00.4 fdefd000-fdefdfff : :04:00.5 fdefe000-fdefefff : :04:00.6 fdeff000-fdef : :04:00.7 fdf0-fe1f : PCI Bus :05 fdfe-fdff : :05:00.0 fe00-fe1f : PCI Bus :06 fe00-fe0f : PCI Bus :07 fe0e-fe0e : :07:00.0 fe0ff800-fe0f : :07:00.0 fe0ff800-fe0f : ahci fe10-fe1f : PCI Bus :08 fe1fe000-fe1f : :08:00.0 fe20-fe3f : PCI Bus :09 fe20-fe3f : :09:00.0 fe40-fe4f : PCI Bus :0a fe4f8000-fe4f8fff : :0a:00.0 fe4f9000-fe4f9fff : :0a:00.1 fe4fa000-fe4fafff : :0a:00.2 fe4fb000-fe4fbfff : :0a:00.3 fe4fc000-fe4fcfff : :0a:00.4 fe4fd000-fe4fdfff : :0a:00.5 fe4fe000-fe4fefff : :0a:00.6 fe4ff000-fe4f : :0a:00.7 fe50-fe5f : PCI Bus :0b fe5fe000-fe5f : :0b:00.0 fe60-fe6f : PCI Bus :0c fe6e-fe6f : :0c:00.0 fe70-fe7f : PCI Bus :0d fe7e-fe7f : :0d:00.0 fe80-fe8f : PCI Bus :0e fe8fe000-fe8f : :0e:00.0 fe90-fe9f : PCI Bus :0f fe9e-fe9e : :0f:00.0 fe9fc000-fe9f : :0f:00.1 fe9fc000-fe9f : ICH HD audio fec0-fec00fff : Reserved fec0-fec003ff : IOAPIC 0 fec1-fec1001f : pnp 00:06 fec2-fec20fff : Reserved fec2-fec203ff : IOAPIC 1 fed0-fed003ff : HPET 2 fed0-fed003ff : PNP0103:0
Re: ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!
On 26/10/17 10:05, Sander Eikelenboom wrote: > On 26/10/17 00:02, Craig Bergstrom wrote: >> Thanks for the notification, my apologies for the breakage. I'll take a >> close look and see if I can figure out what went wrong. >> >> Sander, any chance you can send /proc/iomem and the inputs to the mmap call >> that fail on your affected system? > > Hi Craig, > > The output from /proc/iomem is simple to get and attached. > The mmap call is probably issued by qemu and will require more digging.
Ah, grepping QEMU gave a pointer; it's probably the code in:
http://xenbits.xen.org/gitweb/?p=qemu-xen.git;a=blob;f=hw/xen/xen_pt_msi.c;h=ff9a79f5d27ad7d74a1b22297be560feb455063c;hb=5cd7ce5dde3f228b3b669ed9ca432f588947bd40
around line 571. That would also explain why only this device has the problem, since it's the only one trying to use MSI(-X) interrupts. I will see if I can add some logging to that function.
-- Sander > > I don't know if there is that much time left for 4.14, since we are at > RC6 already. > > -- > Sander > > >> >> >> On Wed, Oct 25, 2017 at 2:50 PM, Boris Ostrovsky >> wrote: >> >>> On 10/23/2017 10:44 PM, Fengguang Wu wrote: >>>> Greetings, >>>> >>>> 0day kernel testing robot got the below dmesg and the first bad commit is >>>> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git >>> master >>>> >>>> commit ce56a86e2ade45d052b3228cdfebe913a1ae7381 >>>> Author: Craig Bergstrom >>>> AuthorDate: Thu Oct 19 13:28:56 2017 -0600 >>>> Commit: Ingo Molnar >>>> CommitDate: Fri Oct 20 09:48:00 2017 +0200 >>>> >>>> x86/mm: Limit mmap() of /dev/mem to valid physical addresses >>> >>> Also note >>> https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg02935.html >>> >>> -boris >>> >> >
ptp device strangeness
Hi,

I am currently using PTP on an Altera/Intel SoC with a DP83640 PHY. The PTP functionality itself seems to be correct. However, I am also timestamping with GPIO0, and sometimes the timestamps and the events get out of sync, so I would like to read out all pending messages. Reading with O_NONBLOCK does not work, so I tried polling from user mode with the code below:

np = poll(&ev, 1, 0);
ev.fd=ptpDev;
ev.events = POLLIN;
if (np>0) {
  if (ev.revents>0) {
    std::cout<<"discarded ptp event"<
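For comparison, here is a minimal sketch of draining all pending external-timestamp events from a PTP character device without blocking. It assumes the standard `struct ptp_extts_event` records from <linux/ptp_clock.h>, and assumes the GPIO timestamp channel was already enabled elsewhere (e.g. via the PTP_EXTTS_REQUEST ioctl). `drain_extts()` is a hypothetical helper. Note that `fd` and `events` are filled in before poll() is called. In the snippet as quoted, they are assigned only after the first call.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>
#include <linux/ptp_clock.h>

/*
 * Hypothetical helper: read every event already queued on fd without
 * blocking. Returns the number of events drained, or -1 on error.
 * The optional callback sees each event (e.g. to log or discard it).
 */
static int drain_extts(int fd, void (*cb)(const struct ptp_extts_event *))
{
	int count = 0;

	assert(fd >= 0);
	for (;;) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		struct ptp_extts_event ev;
		int n = poll(&pfd, 1, 0); /* timeout 0: check only, never wait */

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0 || !(pfd.revents & POLLIN))
			return count; /* nothing left to read */

		ssize_t r = read(fd, &ev, sizeof(ev));
		if (r < 0)
			return -1;
		if (r != (ssize_t)sizeof(ev))
			return count; /* EOF or short read: stop */
		if (cb)
			cb(&ev);
		count++;
	}
}
```

Each event's `index` field identifies the external-timestamp channel that fired, so events from different inputs can be told apart while draining.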
4.12-RC2 BUG: scheduling while atomic: irq/47-iwlwifi
Hi,

I encountered this splat with 4.12-RC2.
-- Sander

[ 119.021594] BUG: scheduling while atomic: irq/47-iwlwifi/517/0x0200
[ 119.021604] Modules linked in: xt_tcpudp ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_raw ip6table_security ip6table_mangle iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_security iptable_mangle ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables rfcomm bnep binfmt_misc arc4 iTCO_wdt iTCO_vendor_support uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev intel_rapl cdc_mbim iwlmvm x86_pkg_temp_thermal intel_powerclamp mac80211 media cdc_wdm btusb coretemp cdc_ncm kvm_intel usbnet mii cdc_acm iwlwifi kvm btintel joydev pcspkr serio_raw cfg80211 snd_hda_codec_hdmi
[ 119.021701]  bluetooth lpc_ich snd_hda_codec_realtek snd_hda_codec_generic shpchp sg ecdh_generic snd_hda_intel thinkpad_acpi snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer nvram snd soundcore evdev tpm_tis tpm_tis_core tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel rtsx_pci_sdmmc mmc_core aesni_intel aes_x86_64 crypto_simd cryptd glue_helper psmouse i2c_i801 sd_mod ehci_pci ehci_hcd e1000e rtsx_pci mfd_core ptp xhci_pci pps_core xhci_hcd
[ 119.021759] CPU: 1 PID: 517 Comm: irq/47-iwlwifi Not tainted 4.12.0-rc2-t440s-20170522+ #1
[ 119.021763] Hardware name: LENOVO 20AQS03H00/20AQS03H00, BIOS GJET91WW (2.41 ) 09/21/2016
[ 119.021766] Call Trace:
[ 119.021778]  ? dump_stack+0x5c/0x84
[ 119.021784]  ? __schedule_bug+0x4c/0x70
[ 119.021792]  ? __schedule+0x496/0x5c0
[ 119.021798]  ? schedule+0x2d/0x80
[ 119.021804]  ? schedule_preempt_disabled+0x5/0x10
[ 119.021810]  ? __mutex_lock.isra.0+0x18e/0x4c0
[ 119.021817]  ? __wake_up+0x2f/0x50
[ 119.021833]  ? cfg80211_sched_scan_results+0x19/0x60 [cfg80211]
[ 119.021844]  ? cfg80211_sched_scan_results+0x19/0x60 [cfg80211]
[ 119.021859]  ? iwl_mvm_rx_lmac_scan_iter_complete_notif+0x17/0x30 [iwlmvm]
[ 119.021869]  ? iwl_pcie_rx_handle+0x2a9/0x7e0 [iwlwifi]
[ 119.021878]  ? iwl_pcie_irq_handler+0x17c/0x730 [iwlwifi]
[ 119.021884]  ? irq_forced_thread_fn+0x60/0x60
[ 119.021887]  ? irq_thread_fn+0x16/0x40
[ 119.021892]  ? irq_thread+0x109/0x180
[ 119.021896]  ? wake_threads_waitq+0x30/0x30
[ 119.021901]  ? kthread+0xf2/0x130
[ 119.021905]  ? irq_thread_dtor+0x90/0x90
[ 119.021910]  ? kthread_create_on_node+0x40/0x40
[ 119.021915]  ? ret_from_fork+0x26/0x40
[PATCH] i2c-designware: add i2c gpio recovery option
This patch contains a lot of input from Phil Reid and has been tested on Intel/Altera Cyclone V SoC hardware, using Altera GPIOs for SCL and SDA.
I am still a little unsure about the recovery in the timeout case (i2c-designware-core.c:770), as I could not test this code path.

Signed-off-by: Tim Sander --- drivers/i2c/busses/i2c-designware-core.c| 14 - drivers/i2c/busses/i2c-designware-core.h| 4 ++ drivers/i2c/busses/i2c-designware-platdrv.c | 90 - 3 files changed, 104 insertions(+), 4 deletions(-) diff --git a/drivers/i2c/busses/i2c-designware-core.c b/drivers/i2c/busses/i2c-designware-core.c index 7a3faa551cf8..f955f29ff8e7 100644 --- a/drivers/i2c/busses/i2c-designware-core.c +++ b/drivers/i2c/busses/i2c-designware-core.c @@ -317,6 +317,7 @@ static void i2c_dw_release_lock(struct dw_i2c_dev *dev) dev->release_lock(dev); } + /** * i2c_dw_init() - initialize the designware i2c master hardware * @dev: device private data @@ -463,7 +464,12 @@ static int i2c_dw_wait_bus_not_busy(struct dw_i2c_dev *dev) while (dw_readl(dev, DW_IC_STATUS) & DW_IC_STATUS_ACTIVITY) { if (timeout <= 0) { dev_warn(dev->dev, "timeout waiting for bus ready\n"); - return -ETIMEDOUT; + i2c_recover_bus(&dev->adapter); + + if (dw_readl(dev, DW_IC_STATUS) & DW_IC_STATUS_ACTIVITY) + return -ETIMEDOUT; + else + return 0; } timeout--; usleep_range(1000, 1100); @@ -719,9 +725,10 @@ static int i2c_dw_handle_tx_abort(struct dw_i2c_dev *dev) for_each_set_bit(i, &abort_source, ARRAY_SIZE(abort_sources)) dev_err(dev->dev, "%s: %s\n", __func__, abort_sources[i]); - if (abort_source & DW_IC_TX_ARB_LOST) + if (abort_source & DW_IC_TX_ARB_LOST) { + i2c_recover_bus(&dev->adapter); return -EAGAIN; - else if (abort_source & DW_IC_TX_ABRT_GCALL_READ) + } else if (abort_source & DW_IC_TX_ABRT_GCALL_READ) return -EINVAL; /* wrong msgs[] data */ else return -EIO; @@ -766,6 +773,7 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num) if (!wait_for_completion_timeout(&dev->cmd_complete, 
adap->timeout)) { dev_err(dev->dev, "controller timed out\n"); /* i2c_dw_init implicitly disables the adapter */ + i2c_recover_bus(&dev->adapter); i2c_dw_init(dev); ret = -ETIMEDOUT; goto done; diff --git a/drivers/i2c/busses/i2c-designware-core.h b/drivers/i2c/busses/i2c-designware-core.h index d9aaf1790e0e..cedc895a795d 100644 --- a/drivers/i2c/busses/i2c-designware-core.h +++ b/drivers/i2c/busses/i2c-designware-core.h @@ -23,6 +23,7 @@ */ #include +#include #define DW_IC_DEFAULT_FUNCTIONALITY (I2C_FUNC_I2C |\ I2C_FUNC_SMBUS_BYTE | \ @@ -126,6 +127,9 @@ struct dw_i2c_dev { int (*acquire_lock)(struct dw_i2c_dev *dev); void(*release_lock)(struct dw_i2c_dev *dev); boolpm_runtime_disabled; + struct i2c_bus_recovery_info rinfo; + struct gpio_desc *gpio_sda; + struct gpio_desc *gpio_scl; }; #define ACCESS_SWAP0x0001 diff --git a/drivers/i2c/busses/i2c-designware-platdrv.c b/drivers/i2c/busses/i2c-designware-platdrv.c index 79c4b4ea0539..b2d5adc8df2b 100644 --- a/drivers/i2c/busses/i2c-designware-platdrv.c +++ b/drivers/i2c/busses/i2c-designware-platdrv.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include "i2c-designware-core.h" @@ -174,6 +175,88 @@ static void dw_i2c_set_fifo_size(struct dw_i2c_dev *dev, int id) } } +/* + * This routine does i2c bus recovery by using i2c_generic_gpio_recovery + * which is provided by I2C Bus recovery infrastructure. 
+ */ +static void i2c_dw_prepare_recovery(struct i2c_adapter *adap) +{ + struct platform_device *pdev = to_platform_device(&adap->dev); + struct dw_i2c_dev *i_dev = platform_get_drvdata(pdev); + + i2c_dw_disable(i_dev); + reset_control_assert(i_dev->rst); + i2c_dw_plat_prepare_clk(i_dev, false); +} + +void i2c_dw_unprepare_recovery(struct i2c_adapter *adap) +{ + struct platform_device *pdev = to_platform_device(&adap->dev); + struct dw_i2c_dev *i_dev = platform_get_drvdata(pdev); + + i2c_dw_plat_prepare_clk(i_dev, true); + reset_control_deassert(i_dev->rst); + i2c_dw_init(i_dev); +} + + +static int i2c_dw_get_scl(struct i2
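The patch delegates the actual recovery to the I2C core via i2c_recover_bus(). As a rough, standalone illustration of what SCL-based recovery does on the wire (not the kernel's implementation), the sketch below simulates a slave that lost sync and keeps SDA low: the master pulses SCL up to nine times (eight data bits plus ACK) until SDA is released. The names struct sim_bus, sim_get_sda(), sim_set_scl(), scl_recovery() and RECOVERY_CLOCKS are hypothetical, and the usual closing STOP condition is left out because it needs an SDA output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical simulated bus (not kernel code): a slave that lost sync
 * keeps SDA low until it has seen `pending_bits` more falling SCL edges.
 */
struct sim_bus {
	bool scl;         /* SCL level currently driven by the master */
	int pending_bits; /* bits the slave still wants to shift out */
};

/* SDA reads high (released) once the slave has finished its byte. */
static bool sim_get_sda(const struct sim_bus *b)
{
	return b->pending_bits == 0;
}

static void sim_set_scl(struct sim_bus *b, bool level)
{
	/* The slave advances by one bit on every falling SCL edge. */
	if (b->scl && !level && b->pending_bits > 0)
		b->pending_bits--;
	b->scl = level;
}

/* Eight data bits plus the ACK bit. */
#define RECOVERY_CLOCKS 9

/*
 * Pulse SCL until SDA is released or RECOVERY_CLOCKS pulses were sent.
 * Returns the number of pulses issued, or -1 if SDA is still held low.
 * A real recovery would normally finish with a STOP condition, which
 * needs an SDA output; this sketch only reads SDA.
 */
static int scl_recovery(struct sim_bus *b)
{
	int pulses = 0;

	assert(b != NULL);
	while (pulses < RECOVERY_CLOCKS && !sim_get_sda(b)) {
		sim_set_scl(b, false);
		sim_set_scl(b, true);
		pulses++;
	}
	return sim_get_sda(b) ? pulses : -1;
}
```

For example, a slave with five pending bits is released after five pulses, while one that would need more than nine is reported as -1; that failure case loosely corresponds to the patch still seeing DW_IC_STATUS_ACTIVITY after recovery and returning -ETIMEDOUT.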
DWC2 USB Host Mode Lockup 4.11
Hi,

I am currently seeing an error with the DesignWare (dwc2) driver on Intel/Altera ARM Cortex-A9 Cyclone V SoC hardware. The USB PHY is a TUSB1210 without a hardware reset line connected. The error only occurs when plugging in a device in host mode. Once the USB device is enumerated, I have not seen any errors. Occasionally I get an error where the USB device is no longer enumerated, and even a reboot does not recover normal operation. IMHO this points to the PHY as the source of the problem, as all other components get a hardware reset on reboot.

I have not worked with USB at the driver level, so my knowledge is a little thin. Nevertheless, I tried to pin down the problem and added the patch below to the 4.11 kernel. The observation is that when the error has not been hit, I see lots of "dwc2: STATUS EINPROGRESS" messages, which means the BUG_ON statement I added is not hit in normal operation.

The USB hardware schematic looks like this: https://rocketboards.org/foswiki/pub/Documentation/EBVSoCratesEvaluationBoard/SoCrates-Schematic.pdf

So my take is that, for some reason, the communication between PHY and controller is broken in a way that either no request gets sent to the PHY or the PHY sends no reply. Any idea how I can get this USB port back to normal operation?

Attached below is the patch I added to produce the two output dumps further below. The first dump is the rare error case, the second is the success case.
Best regards
Tim

diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index a73722e27d07..1c18104e432f 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -38,6 +38,8 @@
  * This file contains the core HCD code, and implements the Linux hc_driver
  * API
  */
+#define DEBUG
+
 #include 
 #include 
 #include 
@@ -4663,6 +4665,7 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, struct urb *urb,
 	dwc2_urb->flags = tflags;
 	dwc2_urb->interval = urb->interval;
 	dwc2_urb->status = -EINPROGRESS;
+	printk("dwc2: STATUS EINPROGRESS\n");
 
 	for (i = 0; i < urb->number_of_packets; ++i)
 		dwc2_hcd_urb_set_iso_desc_params(dwc2_urb, i,
@@ -4773,6 +4776,7 @@ static int _dwc2_hcd_urb_dequeue(struct usb_hcd *hcd, struct urb *urb,
 
 	dev_dbg(hsotg->dev, "Called usb_hcd_giveback_urb()\n");
 	dev_dbg(hsotg->dev, "  urb->status = %d\n", urb->status);
+	BUG_ON(urb->status <0);
 out:
 	spin_unlock_irqrestore(&hsotg->lock, flags);
 

Here is the output in the error case:

[ 11.245681] usbcore: registered new interface driver usbfs
[ 11.254272] usbcore: registered new interface driver hub
[ 11.262479] usbcore: registered new device driver usb
[ 11.346143] dwc2 ffb0.usb: mapped PA ffb0 to VA 9155
[ 11.346236] dwc2 ffb0.usb: Looking up vusb_d-supply from device tree
[ 11.346254] dwc2 ffb0.usb: Looking up vusb_d-supply property in node /soc/usb@ffb0 failed
[ 11.346273] dwc2 ffb0.usb: ffb0.usb supply vusb_d not found, using dummy regulator
[ 11.354882] dwc2 ffb0.usb: Looking up vusb_a-supply from device tree
[ 11.354897] dwc2 ffb0.usb: Looking up vusb_a-supply property in node /soc/usb@ffb0 failed
[ 11.354909] dwc2 ffb0.usb: ffb0.usb supply vusb_a not found, using dummy regulator
[ 11.363660] dwc2 ffb0.usb: registering common handler for irq43
[ 11.363848] dwc2 ffb0.usb: Forcing mode to host
[ 11.363868] dwc2 ffb0.usb: Core Release: 2.93a (snpsid=4f54293a)
[ 11.363882] dwc2 ffb0.usb: Forcing mode to host
[ 11.363909] dwc2 ffb0.usb: DWC OTG HCD INIT
[ 11.363921] dwc2 ffb0.usb: hcfg=0200
[ 11.363950] dwc2 ffb0.usb: dwc2_core_init(8481e010)
[ 11.363962] dwc2 ffb0.usb: HS ULPI PHY selected
[ 11.363974] dwc2 ffb0.usb: Internal DMA Mode
[ 11.363987] dwc2 ffb0.usb: host_dma:1 dma_desc_enable:1
[ 11.363998] dwc2 ffb0.usb: Using Descriptor DMA mode
[ 11.364010] dwc2 ffb0.usb: Host Mode
[ 11.375756] dwc2 ffb0.usb: DWC OTG Controller
[ 11.380596] dwc2 ffb0.usb: new USB bus registered, assigned bus number 1
[ 11.387883] dwc2 ffb0.usb: irq 43, io mem 0xffb0
[ 11.393368] dwc2 ffb0.usb: DWC OTG HCD START
[ 11.393389] dwc2 ffb0.usb: dwc2_core_host_init(8481e010)
[ 11.393403] dwc2 ffb0.usb: Initializing HCFG.FSLSPClkSel to
[ 11.393417] dwc2 ffb0.usb: initial grxfsiz=2000
[ 11.393429] dwc2 ffb0.usb: new grxfsiz=0200
[ 11.393441] dwc2 ffb0.usb: initial gnptxfsiz=20002000
[ 11.393454] dwc2 ffb0.usb: new gnptxfsiz=02000200
[ 11.393465] dwc2 ffb0.usb: initial hptxfsiz=20004000
[ 11.393477] dwc2 ffb0.usb: new hptxfsiz=02000400
[ 11.393495] dwc2 ffb0.usb: Init: Port Power? op_state=9
[ 11.393502] dwc2 ffb0.usb: Init: Power Port (0)
[ 11.393508] dwc2 ffb0.usb: dwc2_enable_host_interrupts()
[ 11.393519] dwc2 ffb0.usb: DWC OTG HCD Has Root Hub
[ 11.393979] usb usb1: Ne
Re: RFC: i2c designware gpio recovery
Good Day Phil,

On Wednesday, 3 May 2017, 09:30:50 CEST, Phil Reid wrote:
> G'day Tim,
>
> On 1/05/2017 21:31, Tim Sander wrote:
> > Good Day Phil,
> >
> > On Monday, 1 May 2017, 09:57:35 CEST, Phil Reid wrote:
> >>> So I took a look into the device tree file socfpga.dtsi and found that the reset lines were not defined (although they are available in the corresponding reset manager). Is there a reason for this? Other components are connected.
> >>
> >> There are a few things like that where the bootloader has been expected to set up the resets etc.
> >
> > Yes, but if the resets are not connected in the device tree, the Linux drivers are not going to use them?
>
> Yes, so they should be added. I don't think we should assume the bootloader sets things up. But that doesn't seem to have been the assumption with the Altera SoCs.

I will prepare a patch for this.

> >>> However, with the patch below my previously sent patch works!
> >>>
> >>> If there is interest, I would clean up the patch and send it in for mainlining. I think the most unacceptable part would be this line:
> >>> + ret = gpio_request_one(bri->scl_gpio, //GPIOF_OPEN_DRAIN |
> >>> My GPIO drivers refuse to work as outputs because they have no open-drain mode. So I wonder how to get this solved in a clean manner.
> >>
> >> I thought the GPIO system would emulate open drain by switching the pin between an input and an output driven low in this case. How are you configuring the GPIOs in the FPGA?
> >
> > I don't switch to GPIO mode. As the I2C logic only pulls low, I only do a wired AND of the GPIO port output (implemented in the FPGA) with the output enable line of the SCL output. SDA is only an additional input on the second FPGA GPIO port.
> >
> > A picture should make it clearer:
> >
> > gpio scl --\
> > i2c scl  --&--- i2c mode output pin (configured as FPGA loan)
> >
> > In my case the original I2C pins were occupied by other, conflicting logic, so the I2C pins had to be moved to other pins using FPGA logic. So it was just a matter of adding a two-port GPIO.
>
> I think I understand. What soft-core GPIO controller are you using?

I am using the standard Altera FPGA GPIOs.

> >> Given a couple of days I can test this on some flaky I2C hardware I have with a Cyclone V SoC. I'm interested in the functionality as well.
> >
> > Sounds good. If you need further input on how I have configured the FPGA, drop me a line.
> >
> >> For I2C that is connected to the dedicated HPS pins, it should be possible to reconfigure the pin mux controller (see system manager) in the HPS to avoid the need to go through the FPGA to get direct control. The docs say this is "unsupported", but I've done some tests and it does seem to work.
> >
> > As far as I know, the internal JTAG chain is only used in the bootloader and there is no Linux driver? But it shouldn't be too big a problem to port it to Linux.
> >
> > What I am unsure about is whether the internal JTAG chain which controls the pin muxing might wreak havoc on other pin states if you reconfigure it.
>
> Have a look at the Cyclone V handbook, "Pin Mux Control Group Register Descriptions". From what I can see, the chain is used to configure I/O standards and drive strength, but not the actual muxes.

Mh, there is not much to see in Volume 3. Just one paragraph and then a very encouraging closing line: "Do not modify the pin multiplexing selection registers after I/O configuration."
I find the following lines in my favorite bootloader a little more enlightening. The following function:

https://git.pengutronix.de/cgit/barebox/tree/arch/arm/mach-socfpga/system-manager.c

gets fed with data from, e.g.:

https://git.pengutronix.de/cgit/barebox/tree/arch/arm/boards/terasic-de0-nano-soc/pinmux_config.c

which doesn't look very memory-mapped to me?

> >> I'm guessing the lack of support is in a similar vein to the EMAC PTP FPGA interface, which couldn't be used when the HPS pins were used. But that got changed when users proved otherwise. There's just no pinctrl driver yet to manage it.
> >
> > I am interested in this PTP solution too. Is there anything on the way to mainline?
>
> This was working the last time I tried it. I submitted a couple of minor patches for it a while ago. My hardware has a DSA switch attached to the Ethernet port, and so far I haven't figured out how to enable PTP when using the virtual LAN ports on the DSA. But it worked fine when directly connected to a PHY.

Thanks, will take a look.

Best regards
Tim
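Phil's remark in this thread describes how open drain can be emulated on a GPIO that only supports push-pull: a logical 1 is produced by switching the pin to input (released, so an external pull-up pulls the line high), a logical 0 by driving it low as an output. The standalone model below sketches that idea; struct sim_line, od_set_value() and wire_level() are hypothetical names, not the gpiolib API. It also makes one limitation visible: the emulation only works if the pin can actually be switched to input.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one GPIO line; not the gpiolib API. */
struct sim_line {
	bool is_output; /* direction: true = output, false = input */
	bool out_value; /* level driven while is_output is set */
};

/*
 * Open-drain emulation: never drive the line high. A logical 1 releases
 * the line by making it an input; a logical 0 drives it low as an output.
 */
static void od_set_value(struct sim_line *l, bool value)
{
	if (value) {
		l->is_output = false;
	} else {
		l->out_value = false;
		l->is_output = true;
	}
}

/*
 * Level seen on the wire: low if this line or any other device pulls
 * low, otherwise high via the external pull-up.
 */
static bool wire_level(const struct sim_line *l, bool other_driver_low)
{
	bool we_pull_low = l->is_output && !l->out_value;

	return !(we_pull_low || other_driver_low);
}
```

Because the line is never actively driven high, another device (for example a slave stretching SCL) can always win by pulling low, which is exactly the property a push-pull output would violate.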
Re: RFC: i2c designware gpio recovery
Good Day Phil,

On Monday, 1 May 2017, 09:57:35 CEST, Phil Reid wrote:
> > So I took a look into the device tree file socfpga.dtsi and found that the reset lines were not defined (although they are available in the corresponding reset manager). Is there a reason for this? Other components are connected.
>
> There are a few things like that where the bootloader has been expected to set up the resets etc.

Yes, but if the resets are not connected in the device tree, the Linux drivers are not going to use them?

> > However, with the patch below my previously sent patch works!
> >
> > If there is interest, I would clean up the patch and send it in for mainlining. I think the most unacceptable part would be this line:
> > + ret = gpio_request_one(bri->scl_gpio, //GPIOF_OPEN_DRAIN |
> > My GPIO drivers refuse to work as outputs because they have no open-drain mode. So I wonder how to get this solved in a clean manner.
>
> I thought the GPIO system would emulate open drain by switching the pin between an input and an output driven low in this case. How are you configuring the GPIOs in the FPGA?

I don't switch to GPIO mode. As the I2C logic only pulls low, I only do a wired AND of the GPIO port output (implemented in the FPGA) with the output enable line of the SCL output. SDA is only an additional input on the second FPGA GPIO port.

A picture should make it clearer:

gpio scl --\
i2c scl  --&--- i2c mode output pin (configured as FPGA loan)

In my case the original I2C pins were occupied by other, conflicting logic, so the I2C pins had to be moved to other pins using FPGA logic. So it was just a matter of adding a two-port GPIO.

> Given a couple of days I can test this on some flaky I2C hardware I have with a Cyclone V SoC. I'm interested in the functionality as well.

Sounds good. If you need further input on how I have configured the FPGA, drop me a line.
> For I2C that is connected to the dedicated HPS pins, it should be possible to reconfigure the pin mux controller (see system manager) in the HPS to avoid the need to go through the FPGA to get direct control. The docs say this is "unsupported", but I've done some tests and it does seem to work.

As far as I know, the internal JTAG chain is only used in the bootloader and there is no Linux driver? But it shouldn't be too big a problem to port it to Linux.

What I am unsure about is whether the internal JTAG chain which controls the pin muxing might wreak havoc on other pin states if you reconfigure it.

> I'm guessing the lack of support is in a similar vein to the EMAC PTP FPGA interface, which couldn't be used when the HPS pins were used. But that got changed when users proved otherwise. There's just no pinctrl driver yet to manage it.

I am interested in this PTP solution too. Is there anything on the way to mainline?

Best regards
Tim
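The FPGA routing Tim describes can be read as a simple logic function: both the DesignWare controller and the recovery GPIO can only pull SCL low, so the pad level is the AND of the two sources. The sketch below models that under the assumption that true means "released"; fpga_scl_level() and gpio_is_transparent() are hypothetical names. It also captures the condition Tim relies on, namely that the GPIO must stay released during normal operation so it cannot interfere with the controller.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the FPGA wired-AND: true = released (high),
 * false = pulled low. The pad is low if either source pulls low.
 */
static bool fpga_scl_level(bool i2c_scl, bool gpio_scl)
{
	return i2c_scl && gpio_scl;
}

/*
 * True if, with this GPIO setting, the pad simply follows the
 * controller, i.e. the GPIO cannot disturb normal transfers.
 */
static bool gpio_is_transparent(bool gpio_scl)
{
	return fpga_scl_level(false, gpio_scl) == false &&
	       fpga_scl_level(true, gpio_scl) == true;
}
```

With the GPIO released (true) the controller's SCL passes through unchanged; with the GPIO pulled low the bus is held low regardless of the controller, which is what the recovery code uses to generate clock pulses.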
Re: RFC: i2c designware gpio recovery
Hi,

After sending this mail I just found out how I could reset the i2c-1 controller manually:

devmem 0xffd05014 32 0x2000
devmem 0xffd05014 32 0

So I took a look into the device tree file socfpga.dtsi and found that the reset lines were not defined (although they are available in the corresponding reset manager). Is there a reason for this? Other components are connected.

However, with the patch below my previously sent patch works!

If there is interest, I would clean up the patch and send it in for mainlining. I think the most unacceptable part would be this line:

+ ret = gpio_request_one(bri->scl_gpio, //GPIOF_OPEN_DRAIN |

My GPIO drivers refuse to work as outputs because they have no open-drain mode. So I wonder how to get this solved in a clean manner.

Best regards
Tim
---
 arch/arm/boot/dts/socfpga.dtsi | 4
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/boot/dts/socfpga.dtsi b/arch/arm/boot/dts/socfpga.dtsi
index 2c43c4d85dee..5f28632bc88c 100644
--- a/arch/arm/boot/dts/socfpga.dtsi
+++ b/arch/arm/boot/dts/socfpga.dtsi
@@ -643,6 +643,7 @@
 			#size-cells = <0>;
 			compatible = "snps,designware-i2c";
 			reg = <0xffc04000 0x1000>;
+			resets = <&rst I2C0_RESET>;
 			clocks = <&l4_sp_clk>;
 			interrupts = <0 158 0x4>;
 			status = "disabled";
@@ -653,6 +654,7 @@
 			#size-cells = <0>;
 			compatible = "snps,designware-i2c";
 			reg = <0xffc05000 0x1000>;
+			resets = <&rst I2C1_RESET>;
 			clocks = <&l4_sp_clk>;
 			interrupts = <0 159 0x4>;
 			status = "disabled";
@@ -663,6 +665,7 @@
 			#size-cells = <0>;
 			compatible = "snps,designware-i2c";
 			reg = <0xffc06000 0x1000>;
+			resets = <&rst I2C2_RESET>;
 			clocks = <&l4_sp_clk>;
 			interrupts = <0 160 0x4>;
 			status = "disabled";
@@ -673,6 +676,7 @@
 			#size-cells = <0>;
 			compatible = "snps,designware-i2c";
 			reg = <0xffc07000 0x1000>;
+			resets = <&rst I2C3_RESET>;
 			clocks = <&l4_sp_clk>;
 			interrupts = <0 161 0x4>;
 			status = "disabled";
-- 
2.7.4
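The two devmem commands write the whole 32-bit register at 0xffd05014: 0x2000 sets bit 13 (holding I2C1 in reset) and 0 clears it again, but both writes also overwrite every other bit in that register. The sketch below shows the bit arithmetic and a read-modify-write alternative on a plain uint32_t copy instead of real hardware. The assumption that I2C0..I2C3 occupy consecutive bits starting at bit 12 is only inferred from 0x2000 being the I2C1 bit and should be checked against the Cyclone V reset manager documentation; i2c_reset_mask(), reset_assert() and reset_deassert() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: I2C0..I2C3 reset bits are consecutive, with I2C1 at
 * bit 13 (0x2000) as the devmem writes suggest.
 */
#define I2C0_RESET_BIT 12u

static uint32_t i2c_reset_mask(unsigned int bus)
{
	assert(bus <= 3);
	return UINT32_C(1) << (I2C0_RESET_BIT + bus);
}

/*
 * Read-modify-write on a copy of the register value: only the selected
 * bit changes, unlike a plain full-register devmem write.
 */
static uint32_t reset_assert(uint32_t reg, uint32_t mask)
{
	return reg | mask;
}

static uint32_t reset_deassert(uint32_t reg, uint32_t mask)
{
	return reg & ~mask;
}
```

For example, asserting and then deasserting the I2C1 reset on a register value of 0x40 yields 0x2040 and then 0x40 again, leaving the unrelated bit untouched, whereas the devmem sequence would leave the register at 0.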
RFC: i2c designware gpio recovery
Hi,

I have tried to add GPIO-based bus recovery to the DesignWare I2C driver. The attempt is attached below.

I have an Intel (Altera) Cyclone V SoC platform attached to a buggy power supply, which causes a lockup of the I2C controller: an external device puts too much noise on the signal and destroys a clock pulse on its way to an I2C device. I don't care too much about this buggy power supply, but as the cable to one I2C slave is rather long, I fear that power surge conformance tests might also cause problems. So I would rather be safe than sorry and recover from this problem.

I have created two GPIO ports in the FPGA and routed the DesignWare pins through the FPGA. I can now read the SDA input status and control SCL via these GPIOs. The recovery gets triggered, and after that I get lots of:

i2c_designware ffc05000.i2c: controller timed out

so I guess that my i2c_dw_unprepare_recovery does not do enough to get the controller back. I have also noticed that there does not seem to be a reset controller in the standard configuration, so reset_control_(de)assert(i_dev->rst) seems to do nothing.

I have verified that the recovery of the bus works: if I do a warm reboot, the I2C bus works again, which it doesn't without recovery. So I am pretty sure that the recovery works in the sense that the I2C slave no longer pulls down SDA, and that my GPIO pins are in a state that does not interfere with the controller's I2C operation.

Any ideas what I can do to get the controller back up and running with some special treatment in i2c_dw_(un)prepare_recovery, without having to resort to a warm reboot?
Best regards
Tim
---
 drivers/i2c/busses/i2c-designware-core.c    | 15 ++--
 drivers/i2c/busses/i2c-designware-core.h    |  1 +
 drivers/i2c/busses/i2c-designware-platdrv.c | 60 -
 drivers/i2c/i2c-core.c                      | 10 -
 4 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/drivers/i2c/busses/i2c-designware-core.c b/drivers/i2c/busses/i2c-designware-core.c
index 7a3faa551cf8..b98fab40ce9a 100644
--- a/drivers/i2c/busses/i2c-designware-core.c
+++ b/drivers/i2c/busses/i2c-designware-core.c
@@ -317,6 +317,7 @@ static void i2c_dw_release_lock(struct dw_i2c_dev *dev)
 		dev->release_lock(dev);
 }
 
+
 /**
  * i2c_dw_init() - initialize the designware i2c master hardware
  * @dev: device private data
@@ -463,7 +464,11 @@ static int i2c_dw_wait_bus_not_busy(struct dw_i2c_dev *dev)
 	while (dw_readl(dev, DW_IC_STATUS) & DW_IC_STATUS_ACTIVITY) {
 		if (timeout <= 0) {
 			dev_warn(dev->dev, "timeout waiting for bus ready\n");
-			return -ETIMEDOUT;
+			i2c_recover_bus(&dev->adapter);
+
+			if (dw_readl(dev, DW_IC_STATUS) & DW_IC_STATUS_ACTIVITY)
+				return -EIO;
+			else return 0;
 		}
 		timeout--;
 		usleep_range(1000, 1100);
@@ -719,9 +724,10 @@ static int i2c_dw_handle_tx_abort(struct dw_i2c_dev *dev)
 		for_each_set_bit(i, &abort_source, ARRAY_SIZE(abort_sources))
 			dev_err(dev->dev, "%s: %s\n", __func__, abort_sources[i]);
 
-	if (abort_source & DW_IC_TX_ARB_LOST)
+	if (abort_source & DW_IC_TX_ARB_LOST) {
+		i2c_recover_bus(&dev->adapter);
 		return -EAGAIN;
-	else if (abort_source & DW_IC_TX_ABRT_GCALL_READ)
+	} else if (abort_source & DW_IC_TX_ABRT_GCALL_READ)
 		return -EINVAL; /* wrong msgs[] data */
 	else
 		return -EIO;
@@ -766,6 +772,7 @@ i2c_dw_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
 	if (!wait_for_completion_timeout(&dev->cmd_complete, adap->timeout)) {
 		dev_err(dev->dev, "controller timed out\n");
 		/* i2c_dw_init implicitly disables the adapter */
+		//i2c_recover_bus(&dev->adapter);
 		i2c_dw_init(dev);
 		ret = -ETIMEDOUT;
 		goto done;
@@ -825,7 +832,7 @@ static const struct i2c_algorithm i2c_dw_algo = {
 	.functionality	= i2c_dw_func,
 };
 
-static u32 i2c_dw_read_clear_intrbits(struct dw_i2c_dev *dev)
+u32 i2c_dw_read_clear_intrbits(struct dw_i2c_dev *dev)
 {
 	u32 stat;
 
diff --git a/drivers/i2c/busses/i2c-designware-core.h b/drivers/i2c/busses/i2c-designware-core.h
index d9aaf1790e0e..8bdf51e19f21 100644
--- a/drivers/i2c/busses/i2c-designware-core.h
+++ b/drivers/i2c/busses/i2c-designware-core.h
@@ -126,6 +126,7 @@ struct dw_i2c_dev {
 	int			(*acquire_lock)(struct dw_i2c_dev *dev);
 	void			(*release_lock)(struct dw_i2c_dev *dev);
 	bool			pm_runtime_disabled;
+	struct i2c_bus_recovery_info rinfo;
 };
 
 #define ACCESS_SWAP		0x0001
diff --git a/drivers/i2c/busses/i2c-designware-platdrv.c b/dri
4.11-rc6 and OF_DYNAMIC
Hi,

I have been testing the 4.11-rc6 kernel on an Intel (formerly Altera) Cyclone V ARM SoC with dynamic firmware loading. As I didn't know how to trigger dynamic loading from userspace, I also applied the following patch:

OF: DT-Overlay configfs interface
https://github.com/raspberrypi/linux/commit/8f1079750ce2fce4c4c2b4f8759ea57c8fb167d3

Now I got the loading of the device tree overlay working :-)... but only if I enable OF_UNITTEST. I find this a little strange, especially as it slows down booting. I also tried adding "select OF_DYNAMIC" to my OF_CONFIGFS patch, but this didn't help. Patching away the "if OF_UNITTEST" condition didn't help either:

 config OF_DYNAMIC
-	bool "Support for dynamic device trees" if OF_UNITTEST
+	bool "Support for dynamic device trees"

Besides these nitpicks, I am really happy to see this in mainline, as it makes working with dynamic FPGA-based hardware much cleaner!

Best regards
Tim
Re: [PATCH] xen/x86: Initialize per_cpu(xen_vcpu, 0) a little earlier
On 2016-10-03 00:45, Boris Ostrovsky wrote:
> xen_cpuhp_setup() calls mutex_lock() which, when CONFIG_DEBUG_MUTEXES is
> defined, ends up calling xen_save_fl(). That routine expects
> per_cpu(xen_vcpu, 0) to be already initialized.
>
> Signed-off-by: Boris Ostrovsky
> Reported-by: Sander Eikelenboom
> ---
> Sander, please see if this fixes the problem. Thanks.

Hi Boris,

I have tested it and it fixes the dom0 crash in early boot for me.
Thanks again for investigating and for the swift fix!

--
Sander

>  arch/x86/xen/enlighten.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 366b6ae..96c2dea 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1644,7 +1644,6 @@ asmlinkage __visible void __init xen_start_kernel(void)
>  	xen_initial_gdt = &per_cpu(gdt_page, 0);
>  
>  	xen_smp_init();
> -	WARN_ON(xen_cpuhp_setup());
>  
>  #ifdef CONFIG_ACPI_NUMA
>  	/*
> @@ -1658,6 +1657,8 @@ asmlinkage __visible void __init xen_start_kernel(void)
>  	   possible map and a non-dummy shared_info. */
>  	per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
>  
> +	WARN_ON(xen_cpuhp_setup());
> +
>  	local_irq_disable();
>  	early_boot_irqs_disabled = true;
Re: [Intel-gfx] Linux 4.8-rc?: WARNING: at drivers/gpu/drm/i915/intel_pm.c:7866 sandybridge_pcode_write Missing switch case (16) in gen6_check_mailbox_status
On 2016-09-07 16:49, Jani Nikula wrote:
> On Tue, 06 Sep 2016, li...@eikelenboom.it wrote:
>> On 2016-09-06 11:25, Jani Nikula wrote:
>>> On Tue, 06 Sep 2016, li...@eikelenboom.it wrote:
>>>> L.S.,
>>>>
>>>> Since one of the last 4.8 RCs, I'm getting the warning below when booting my Sandy Bridge based ThinkPad. From what I can see, the machine still works fine, though.
>>>
>>> What does 'lspci -nns 2' say for you?
>>
>> 00:02.0 VGA compatible controller [0300]: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0126] (rev 09)
>
> Fixed in drm-intel-fixes by
>
> commit fc2780b66b15092ac68272644a522c1624c48547
> Author: Chris Wilson
> Date:   Fri Aug 26 11:59:26 2016 +0100
>
>     drm/i915: Add GEN7_PCODE_MIN_FREQ_TABLE_GT_RATIO_OUT_OF_RANGE to SNB
>
> BR,
> Jani.

Works for me, thx!

--
Sander
Re: [Linux 4.8-rc1 Bisected] Clock on boot Xen HVM guest starts at 31/12/1999
Friday, August 12, 2016, 7:29:37 PM, you wrote:

> Hi,

> On 12/08/2016 at 19:23:36 +0200, Sander Eikelenboom wrote :
>> L.S.,
>>
>> I'm seeing an issue when using a Linux 4.8-rc1 kernel in a Xen HVM guest (PV guests and dom0 are unaffected). The clock is always set to 31/12/1999 on boot of the guest, instead of the system clock time.
>>
>> Bisecting seems to point to commit:
>> 463a86304cae92e10277b47180ac59cf93982e5b char/genrtc: x86: remove remnants of asm/rtc.h
>>

> Isn't that solved by http://patchwork.ozlabs.org/patch/657465/ ?

Ah yes, that solves it (I only looked in your git tree to see if there was a patch already). Sorry for the noise!

--
Sander
[Linux 4.8-rc1 Bisected] Clock on boot Xen HVM guest starts at 31/12/1999
L.S.,

I'm seeing an issue when using a Linux 4.8-rc1 kernel in a Xen HVM guest (PV guests and dom0 are unaffected). The clock is always set to 31/12/1999 on boot of the guest, instead of the system clock time.

Bisecting seems to point to commit:
463a86304cae92e10277b47180ac59cf93982e5b char/genrtc: x86: remove remnants of asm/rtc.h

--
Sander
Re: [PATCH v2] dts: add specific compatible type for Terasic DE0-NANO-SoC Board
Hi Dinh,

On Thursday 25 February 2016 10:56:28 Dinh Nguyen wrote:
> On 02/25/2016 04:38 AM, Steffen Trumtrar wrote:
> > Hi Tim!
> >
> > On Thu, Feb 25, 2016 at 11:05:05AM +0100, Tim Sander wrote:
> >> From: Tim Sander
> >>
> >> Add a more specific compatible string, "terasic,de0-nano-soc", for the
> >> respective board. Background: when checking for bootspec entries, some
> >> board-specific fixups are not appropriate for other boards of the same
> >> platform ("altr,socfpga-cyclone5"). The same approach is taken with the
> >> EBV-Socrates board.
> >>
> >> Signed-off-by: Tim Sander
> >> ---
> >>
> >> Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
> >> arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts | 2 +-
> >> 2 files changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt
> >> b/Documentation/devicetree/bindings/vendor-prefixes.txt index
> >> 72e2c5a..d1f7803 100644
> >> --- a/Documentation/devicetree/bindings/vendor-prefixes.txt
> >> +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
> >> @@ -230,6 +230,7 @@ synology Synology, Inc.
> >> tbs TBS Technologies
> >> tcl Toby Churchill Ltd.
> >> technologic Technologic Systems
> >> +terasic Terasic Inc.
> >> thine THine Electronics, Inc.
> >> ti Texas Instruments
> >> tlm Trusted Logic Mobility
> >
> > You should IMHO split this up into two patches.
> > First patch: add terasic
>
> That's right. That patch will go through the DTS maintainer's tree.

Ah well, for such a simple patch it turns out to be more complicated than I thought :-) I will do so as soon as there is agreement on a name, which does not seem to be that easy...

> > >> diff --git a/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
> > >> b/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts index
> > >> afea364..704aa9d 100644
> > >> --- a/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
> > >> +++ b/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
> > >> @@ -18,7 +18,7 @@
> > >>
> > >> / {
> > >> model = "Terasic DE-0(Atlas)";
> > >> - compatible = "altr,socfpga-cyclone5", "altr,socfpga";
> > >> + compatible = "terasic,de0-nano-soc","altr,socfpga-cyclone5",
> > >> "altr,socfpga";
>
> So perhaps, "terasic,de0-sockit"?
>
> > Second patch: this.
>
> And I can take this one.
>
> > >> chosen {
> > >> bootargs = "earlyprintk";
> >
> > The naming of this board still confuses me, though.
> >
> > It has 3 different names now:
> > - de0_sockit.dts
> > - Terasic DE-0(Atlas)
> > - de0-nano-soc
> >
> > And according to Terasic, the DE0-Nano-SoC is the same as the Atlas-SoC with different software?! So all three names are actually correct?! Weird.
>
> I had a hard time understanding this myself. But from what I gather from [1], I just named the file de0_sockit.

As far as I remember, there are different DE0 boards and different SoCKit boards, so that name does not seem to be as precise? I don't mind, but I would say that de0-nano-soc is the most precise and easier to search for than atlas, which might turn up more false positives. But as long as there is a more selective name than cyclone5, everything is fine with me.

Best regards
Tim
[PATCH v2] dts: add specific compatible type for Terasic DE0-NANO-SoC Board
From: Tim Sander

Add a more specific compatible string, "terasic,de0-nano-soc", for the respective board. Background: when checking for bootspec entries, some board-specific fixups are not appropriate for other boards of the same platform ("altr,socfpga-cyclone5"). The same approach is taken with the EBV-Socrates board.

Signed-off-by: Tim Sander
---
 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts     | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index 72e2c5a..d1f7803 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -230,6 +230,7 @@ synology	Synology, Inc.
 tbs	TBS Technologies
 tcl	Toby Churchill Ltd.
 technologic	Technologic Systems
+terasic	Terasic Inc.
 thine	THine Electronics, Inc.
 ti	Texas Instruments
 tlm	Trusted Logic Mobility
diff --git a/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts b/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
index afea364..704aa9d 100644
--- a/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
+++ b/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
@@ -18,7 +18,7 @@
 
 / {
 	model = "Terasic DE-0(Atlas)";
-	compatible = "altr,socfpga-cyclone5", "altr,socfpga";
+	compatible = "terasic,de0-nano-soc","altr,socfpga-cyclone5", "altr,socfpga";
 
 	chosen {
 		bootargs = "earlyprintk";
-- 
1.9.1
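The motivation for the extra compatible string is that board-specific fixups can key on it while other Cyclone V boards keep matching only the generic entry. The sketch below illustrates that idea with a hypothetical fixup table (struct board_fixup, fixups[] and find_fixup() are illustrative names, not barebox or kernel code): the board's compatible list is walked in order, most specific first, and the first table hit wins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixup table; entries and names are illustrative only. */
struct board_fixup {
	const char *compatible;
	const char *name;
};

static const struct board_fixup fixups[] = {
	{ "terasic,de0-nano-soc",  "de0-nano-soc-fixup" },
	{ "altr,socfpga-cyclone5", "generic-cyclone5-fixup" },
};

/*
 * Walk the board's compatible list in order (most specific first) and
 * return the first fixup that matches, so a board-specific entry wins
 * over the generic SoC entry. Returns NULL if nothing matches.
 */
static const struct board_fixup *find_fixup(const char *const *compat,
					    size_t n)
{
	assert(compat != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < sizeof(fixups) / sizeof(fixups[0]); j++)
			if (strcmp(compat[i], fixups[j].compatible) == 0)
				return &fixups[j];
	return NULL;
}
```

Since this sketch compares strings exactly, an entry with a stray leading space such as " altr,socfpga-cyclone5" would not match anything, which is why the spacing inside the compatible property matters.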
Bisect results for 4.4.1-rt[4,5]
Hi Sebastian,

On Friday, 12 February 2016, 10:07:59, you wrote:
...
> What about rt4? It is only the stable update so you should see here the
> numbers from rt3. If that is true and your numbers are stable it should
> be easy to run git bisect between rt4 and rt5. And looking at
> https://git.kernel.org/rt/linux-rt-devel/h/v4.4.1-rt5
> the only non-cosmetic change in -rt5 that should affect you is the
> migrate-disable fixup from Mike.

I have done a bisect run. It is a rather innocent-looking one-liner that seems to cause the problems. The numbers were reasonably stable, so I am pretty confident that this is the patch adding ~26µs of latency on the Altera SoC platform:

eec2bf477ac674583a7d73b9d00f47c528b7266d is the first bad commit
commit eec2bf477ac674583a7d73b9d00f47c528b7266d
Author: Sebastian Andrzej Siewior
Date:   Thu Feb 4 16:38:10 2016 +0100

    kernel/perf: mark perf_cpu_context's timer as irqsafe

    Otherwise we get a WARN_ON() backtrace and some events are reported as
    "not counted".

    Cc: stable...@vger.kernel.org
    Reported-by: Yang Shi
    Signed-off-by: Sebastian Andrzej Siewior

Here are the numbers of the bisect run for reference:

==> g0dd3bdd <==
# Total: 1 09829
# Min Latencies: 00010 00010
# Avg Latencies: 00020 00021
# Max Latencies: 00084 00101
# Histogram Overflows: 0 0

==> gbbc7819 <==
# Total: 1 09798
# Min Latencies: 00010 00010
# Avg Latencies: 00021 00021
# Max Latencies: 00086 00091
# Histogram Overflows: 0 0

==> geec2bf4 <==
# Total: 08713 1
# Min Latencies: 00010 00010
# Avg Latencies: 00020 00021
# Max Latencies: 00113 00070
# Histogram Overflows: 0 0

Best regards
Tim
Re: [ANNOUNCE] 4.1.5-rt5 meant to reply to 4.4.1-rt5
Hi Sebastian,

As you correctly understood, I was talking about 4.4.1-rt5 and not 4.1, to which I replied by accident.

On Friday, 12 February 2016, 10:07:59, Sebastian Andrzej Siewior wrote:
> On 02/12/2016 09:28 AM, Tim Sander wrote:
> > Hi Sebastian
>
> Hi Tim,
>
> > On Sunday, 16 August 2015, 15:56:30, Sebastian Andrzej Siewior wrote:
> >> I'm pleased to announce the v4.1.5-rt5 patch set.
> >
> > I have just tested it with an Altera SoC (ARMv7). The latencies seem to have gotten a little bit worse with each release. The first core has always been worse (presumably due to interrupt load), but now its maximum latency has risen to 111µs (rt5), from 76µs (rt3) and 54µs (rt2).
>
> In -rt2 we had a bug in the migrate disable code which meant each task was running on CPU0. This got partly fixed in -rt3. In -rt3 the scheduler could assign a task to CPU1, but the task would then stay there forever. This little detail was fixed in -rt5.
> This is one thing that comes to mind.
> Lazy-preempt should have been fixed in -rt3, too. This should not give you higher latencies but higher throughput.
>
> What about rt4? It is only the stable update so you should see here the
> numbers from rt3. If that is true and your numbers are stable it should
> be easy to run git bisect between rt4 and rt5. And looking at
> https://git.kernel.org/rt/linux-rt-devel/h/v4.4.1-rt5
> the only non-cosmetic change in -rt5 that should affect you is the
> migrate-disable fixup from Mike.

OK, each run takes a couple of hours, so bisecting will take quite some time, but I will give it a try. I started a test with 4.4.1-rt4; if the numbers are within the 70µs ballpark, bisecting seems the way to go. If the numbers are higher, I suspect that the stable update might play a role here. But we will see.

Best regards
Tim
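To put "quite some time" into numbers: on a linear history, each bisect run roughly halves the remaining range, so the number of test runs is about the smallest k with 2^k >= N for N candidate commits. The helper below is a hypothetical sketch of that estimate; the real count from git bisect can differ by one depending on where the culprit lies, and the commit count between rt4 and rt5 is not stated here.

```c
#include <assert.h>

/*
 * Rough number of test runs git bisect needs for `candidates` commits
 * on a linear history: the smallest k with 2^k >= candidates.
 */
static unsigned int bisect_steps(unsigned long candidates)
{
	unsigned int k = 0;

	assert(candidates > 0);
	while ((1UL << k) < candidates)
		k++;
	return k;
}
```

For instance, a hypothetical range of 30 commits would need about five runs, which at a couple of hours per cyclictest run means most of a working day.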
[PATCH] dts: add specific compatible type for Terasic DE0-NANO-SoC Board
From: Tim Sander

Add a more specific compatible string, "terasic,de0-nano-soc", for the respective board.

Background: when checking for bootspec entries, some board-specific fixups are not appropriate for other boards of the same platform ("altr,socfpga-cyclone5"). The same approach is taken with the EBV-Socrates board.

Signed-off-by: Tim Sander
---
 arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts b/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
index 555e9caf21e1..3a427423168e 100644
--- a/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
+++ b/arch/arm/boot/dts/socfpga_cyclone5_de0_sockit.dts
@@ -18,7 +18,7 @@
 / {
 	model = "Terasic DE-0(Atlas)";
-	compatible = "altr,socfpga-cyclone5", "altr,socfpga";
+	compatible = "terasic,de0-nano-soc", "altr,socfpga-cyclone5", "altr,socfpga";
 	chosen {
 		bootargs = "earlyprintk";
-- 
1.9.1
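A devicetree "compatible" property is a list of NUL-terminated strings, ordered from most to least specific. That ordering is what lets a board-specific entry sit in front of the generic SoC entries. The sketch below is a user-space model of looking up an entry in such a list; compat_index() is a hypothetical helper, not the kernel's of_* API. It compares bytes exactly, whereas the kernel's helpers may differ in details such as case sensitivity. Under either comparison, an entry written with a stray leading space, such as " altr,socfpga-cyclone5", silently stops matching.

```c
#include <stddef.h>
#include <string.h>

/* Position of compat in a DT-style "compatible" property, or -1.
 * The property is a run of NUL-terminated strings, most specific first;
 * prop_len counts every byte including the terminators. */
int compat_index(const char *prop, size_t prop_len, const char *compat)
{
	size_t want = strlen(compat);
	size_t off = 0;
	int idx = 0;

	while (off < prop_len) {
		const char *entry = prop + off;
		size_t len = 0;

		/* Length of this entry, never reading past the property. */
		while (off + len < prop_len && entry[len] != '\0')
			len++;

		/* Exact byte-for-byte match: no whitespace trimming. */
		if (len == want && memcmp(entry, compat, len) == 0)
			return idx;

		off += len + 1;	/* skip the entry and its NUL */
		idx++;
	}
	return -1;
}
```

A board-specific fixup would typically check for index 0 ("terasic,de0-nano-soc"), while platform-wide code can still match "altr,socfpga-cyclone5" further down the list.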
Re: [ANNOUNCE] 4.1.5-rt5
00
000310 to 000399: 00 00 (every bucket empty)
# Total: 10 99647
# Min Latencies: 9 9
# Avg Latencies: 00017 00010
# Max Latencies: 00054 00037
# Histogram Overflows: 0 0
# Histogram Overflow at cycle number:
# Thread 0:
# Thread 1:
sander@dabox:~/work/cp52-firmware$ cat 4.4-rt2_latency_1000_hackbench.txt
#cyclictest -l10 -m -Sp99 -i200 -h400 -q
# /dev/cpu_dma_latency set to 0us
# Histogram
00 to 08: 00 00 (every bucket empty)
09 05 007986
10 231454 534988149
11 272449 365823273
12 187411 19724532
13 887425 7283747
14 5033801 5758908
15 29213484 10969358
16 13701430016483272
17 27971830311341821
18 2668805387889248
19 1555086375564401
20 63667078 3308210
21 25975207 2639064
22 14948747 2684771
23 9837945 2829949
24 5699296 1889347
25 2868412 636802
26 1257029 137374
27 477219 028261
28 177551 006650
29 075267 002358
30 036509 001261
31 018444 000578
32 008159 000228
33 003103 64
34 001208 22
35 000530 11
36 000259 01
37 000129 01
38 48 00
39 28 00
40 18 00
41 03 00
42 00 00
43 00 00
44 00 00
45 01 00
46 00 00
47 01 00
48 00 00
49 00 00
50 01 00
51 00 00
52 00 00
53 00 00
54 01 00
55 to 000106: 00 00 (every bucket empty)
000107 00 0
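Each histogram line above holds a latency bucket in microseconds followed by one sample count per measurement thread (two threads here). The "# Min/Avg/Max Latencies" lines summarize those samples. The sketch below derives such a summary from one thread's bucket counts. The names (struct lat_summary, summarize_hist()) are hypothetical, and the code is not taken from cyclictest. It only sees bucket indices, so its average is an approximation that may differ from the figure cyclictest prints.

```c
#include <stddef.h>

/* Summary of one thread's column of a cyclictest-style histogram, where
 * hist[i] is the number of samples that landed in the i-microsecond bucket. */
struct lat_summary {
	unsigned long long total;
	long min_us;	/* -1 if the histogram is empty */
	long max_us;	/* -1 if the histogram is empty */
	double avg_us;	/* bucket-weighted mean: an approximation */
};

struct lat_summary summarize_hist(const unsigned long long *hist, size_t nbuckets)
{
	struct lat_summary s = { 0, -1, -1, 0.0 };
	double weighted = 0.0;
	size_t i;

	for (i = 0; i < nbuckets; i++) {
		if (hist[i] == 0)
			continue;
		if (s.min_us < 0)
			s.min_us = (long)i;	/* first non-empty bucket */
		s.max_us = (long)i;		/* last non-empty bucket seen so far */
		s.total += hist[i];
		weighted += (double)i * (double)hist[i];
	}
	if (s.total > 0)
		s.avg_us = weighted / (double)s.total;
	return s;
}
```

Samples beyond the last bucket (the "Histogram Overflows" line) are not visible to this sketch, so a non-zero overflow count would make its maximum too low.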
Re: [ANNOUNCE] 4.4-rc6-rt1
Hi Sebastian

Thanks for your Christmas present :-).

On Wednesday, 23 December 2015, 23:57:55, Sebastian Andrzej Siewior wrote:
> Please don't continue reading before christmas eve (or morning,
> depending on your schedule). If you don't celebrate christmas,
> well go ahead.

OK, I have to admit I am a little late to the party.

> Dear RT folks!
>
> I'm pleased to announce the v4.4-rc6-rt1 patch set. I tested it on my
> AMD A10, 64bit. Nothing exploded so far, filesystem is still there.
> I haven't tested it on anything else. Before someone asks: this does not
> mean it does *not* work on ARM I simply did not try it.

With the trivial compile patch below, it works on ARM, specifically on the two Cortex-A9 cores of an Altera Cyclone V. The performance without load looks good:

# Total: 1 1
# Min Latencies: 9 9
# Avg Latencies: 00010 00010
# Max Latencies: 00022 00033

A short run with hackbench load reveals a latency "island" from 54 to 69µs on the first core. No timer ticks had a delay of 34 to 53µs.

# Total: 00100 000999714
# Min Latencies: 00010 9
# Avg Latencies: 00017 00010
# Max Latencies: 00069 00029

I will test further and report if I find strange occurrences.

> If you are brave then download it, install it and have fun. If something
> breaks, please report it. If your machine starts blinking like a
> christmas tree while using the patch then *please* send a photo.

Sorry, no photos, no special blinking.

Best regards
Tim

Signed-off-by: Tim Sander
--- linux-4.4-rc6/kernel/time/hrtimer.c.orig 2016-01-06 16:56:32.573527206 +0100
+++ linux-4.4-rc6/kernel/time/hrtimer.c 2016-01-06 16:56:48.213215320 +0100
@@ -1435,6 +1435,7 @@
 #endif
+static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer);
 static void __
[PATCH] PCI: Add quirk for Lite-On IT Corp. / Plextor M6e PCI Express
Hi

Please consider this patch for the next release. Without it, my Plextor M6e PCIe disk is not recognized. Please Cc me, as I am not on the list.

Signed-off-by: Tim Sander

PCI: Add quirk for Lite-On IT Corp. / Plextor M6e PCI Express SSD [Marvell 88SS9183] (rev 14)
---
 drivers/pci/quirks.c    | 4 ++++
 include/linux/pci_ids.h | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 7e32730..93ec5a02 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3620,6 +3620,10 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TTI, 0x0642,
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_JMICRON,
 			 PCI_DEVICE_ID_JMICRON_JMB388_ESD,
 			 quirk_dma_func1_alias);
+/* https://bugzilla.kernel.org/show_bug.cgi?id=42679 */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LITE_ON,
+			 PCI_DEVICE_ID_PLEXTOR_M6E,
+			 quirk_dma_func1_alias);
 /*
  * Some devices DMA with the wrong devfn, not just the wrong function.
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index d9ba49c..01d8041 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2501,6 +2501,9 @@
 #define PCI_VENDOR_ID_ASMEDIA		0x1b21
+#define PCI_VENDOR_ID_LITE_ON		0x1c28
+#define PCI_DEVICE_ID_PLEXTOR_M6E	0x0122
+
 #define PCI_VENDOR_ID_CIRCUITCO		0x1cc8
 #define PCI_SUBSYSTEM_ID_CIRCUITCO_MINNOWBOARD	0x0001
-- 
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
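DECLARE_PCI_FIXUP_HEADER() registers a hook that the PCI core runs for every device whose vendor and device IDs match, once the device's config header has been read. The patch ties the Lite-On/Plextor M6e IDs to the existing quirk_dma_func1_alias hook, which, as its name suggests, deals with devices whose DMA appears to come from function 1. The matching step can be modelled in user space as below. Everything in the sketch (struct fake_pci_dev, struct fixup, apply_fixups(), DEV_FLAG_DMA_FUNC1_ALIAS) is a hypothetical stand-in, not the kernel's implementation. Only the two IDs come from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffu			/* wildcard, in the spirit of PCI_ANY_ID */
#define DEV_FLAG_DMA_FUNC1_ALIAS 0x1u	/* hypothetical device flag */

struct fake_pci_dev {
	uint16_t vendor;
	uint16_t device;
	unsigned int flags;
};

struct fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct fake_pci_dev *dev);
};

/* Stand-in for quirk_dma_func1_alias: just records that the quirk ran. */
static void mark_dma_func1_alias(struct fake_pci_dev *dev)
{
	dev->flags |= DEV_FLAG_DMA_FUNC1_ALIAS;
}

static const struct fixup fixups[] = {
	/* Lite-On / Plextor M6e, IDs taken from the patch above */
	{ 0x1c28, 0x0122, mark_dma_func1_alias },
};

/* Run every hook whose vendor/device pair matches; returns hooks applied. */
int apply_fixups(struct fake_pci_dev *dev)
{
	size_t i;
	int applied = 0;

	for (i = 0; i < sizeof(fixups) / sizeof(fixups[0]); i++) {
		const struct fixup *f = &fixups[i];

		if ((f->vendor == ANY_ID || f->vendor == dev->vendor) &&
		    (f->device == ANY_ID || f->device == dev->device)) {
			f->hook(dev);
			applied++;
		}
	}
	return applied;
}
```

In this model, a device the table does not list simply gets no hook. That matches the report: the quirk for this disk only takes effect once its IDs are added.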
Re: nf_unregister_net_hook: hook not found!
On 2015-12-30 03:39, ebied...@xmission.com wrote:
Pablo Neira Ayuso writes:
On Mon, Dec 28, 2015 at 09:05:03PM +0100, Sander Eikelenboom wrote:
Hi, Running a 4.4.0-rc6 kernel, I encountered the warning below.
Cc'ing Eric Biederman. @Sander, could you provide a way to reproduce this?
I am on vacation until the new year, but if this is reproducible we should be able to print out reg, reg->pf, reg->hooknum, reg->hook to figure out which hook is having something very weird happen to it. This is happening in some network namespace exit.
Eric

Unfortunately, I have found no way to reproduce it. The 13-second timestamp implies it happened at boot, but I have only seen this once.
--
Sander

Thanks.
[ 13.740472] ip_tables: (C) 2000-2006 Netfilter Core Team [ 13.936237] iwlwifi :03:00.0: L1 Enabled - LTR Disabled [ 13.945391] iwlwifi :03:00.0: L1 Enabled - LTR Disabled [ 13.947434] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0 [ 14.223990] iwlwifi :03:00.0: L1 Enabled - LTR Disabled [ 14.232065] iwlwifi :03:00.0: L1 Enabled - LTR Disabled [ 14.233570] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0 [ 14.328141] systemd-logind[2485]: Failed to start user service: Unknown unit: user@117.service [ 14.356634] systemd-logind[2485]: New session c1 of user lightdm. [ 14.357320] [ cut here ] [ 14.357327] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143 netfilter_net_exit+0x25/0x50() [ 14.357328] nf_unregister_net_hook: hook not found!
[ 14.357371] Modules linked in: iptable_security(+) iptable_raw iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc uvcvideo videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal intel_powerclamp btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl kvm_intel v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel media kvm snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi snd_hda_intel pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep snd_hda_core rfkill i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev snd soundcore shpchp tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul crc32c_intel aesni_intel [ 14.357380] ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd e1000e lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore usb_common pps_core [ 14.357383] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U 4.4.0-rc6-x220-20151224+ #1 [ 14.357384] Hardware name: LENOVO 42912ZU/42912ZU, BIOS 8DET69WW (1.39 ) 07/18/2013 [ 14.357390] Workqueue: netns cleanup_net [ 14.357393] 81a27dfd 81359c69 88030e7cbd40 81060297 [ 14.357395] 88030e820d80 88030e7cbd90 81c962d8 81c962e0 [ 14.357397] 88030e7cbdf8 81060317 81a2c010 88030018 [ 14.357398] Call Trace: [ 14.357405] [] ? dump_stack+0x40/0x57 [ 14.357408] [] ? warn_slowpath_common+0x77/0xb0 [ 14.357410] [] ? warn_slowpath_fmt+0x47/0x50 [ 14.357416] [] ? mutex_lock+0x9/0x30 [ 14.357418] [] ? netfilter_net_exit+0x25/0x50 [ 14.357421] [] ? ops_exit_list.isra.6+0x2e/0x60 [ 14.357424] [] ? cleanup_net+0x1ab/0x280 [ 14.357427] [] ? process_one_work+0x133/0x330 [ 14.357429] [] ? worker_thread+0x60/0x470 [ 14.357430] [] ? process_one_work+0x330/0x330 [ 14.357434] [] ? kthread+0xca/0xe0 [ 14.357436] [] ? kthread_create_on_node+0x170/0x170 [ 14.357439] [] ? ret_from_fork+0x3f/0x70 [ 14.357441] [] ? 
kthread_create_on_node+0x170/0x170 [ 14.357443] ---[ end trace 9984cc4b0e89f818 ]--- [ 14.357443] [ cut here ] [ 14.357446] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143 netfilter_net_exit+0x25/0x50() [ 14.357446] nf_unregister_net_hook: hook not found! [ 14.357472] Modules linked in: iptable_security(+) iptable_raw iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc uvcvideo videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal intel_powerclamp btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl kvm_intel v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel media kvm snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi snd_hda_intel pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep snd_hda_core rfkill i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev snd soundcore shpchp tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul crc32c_intel aesni_intel [ 14.357478] ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd e1000e lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore usb_common pps_core [ 14.357480] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U W
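Eric suggests printing reg, reg->pf, reg->hooknum and reg->hook when the lookup fails. The sketch below shows that kind of instrumentation in user space. It unlinks a registration from a singly linked list and, if the registration is absent, dumps its identifying fields before returning an error. The struct and function names are hypothetical and are not netfilter's code. The hook is represented by a name because printing a function pointer with %p is not portable C.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a hook registration; pf, hooknum and the
 * hook are the fields Eric proposes printing. */
struct hook_reg {
	int pf;			/* protocol family */
	unsigned int hooknum;	/* hook point within that family */
	const char *hook_name;	/* stands in for the hook function */
	struct hook_reg *next;
};

/* Unlink reg from the singly linked list *list and return 0. If reg is
 * not on the list, report its identifying fields, in the spirit of the
 * "hook not found!" warning, and return -1. */
int unregister_hook(struct hook_reg **list, const struct hook_reg *reg)
{
	struct hook_reg **pp;

	for (pp = list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == reg) {
			*pp = reg->next;	/* splice it out */
			return 0;
		}
	}
	fprintf(stderr,
		"unregister_hook: hook not found! reg=%p pf=%d hooknum=%u hook=%s\n",
		(const void *)reg, reg->pf, reg->hooknum,
		reg->hook_name ? reg->hook_name : "(null)");
	return -1;
}
```

Unregistering the same entry twice is one simple way to trigger the report. In the model, the second call finds nothing and prints the fields that identify the offending registration.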
nf_unregister_net_hook: hook not found!
Hi,

Running a 4.4.0-rc6 kernel, I encountered the warning below.

--
Sander

[ 13.740472] ip_tables: (C) 2000-2006 Netfilter Core Team [ 13.936237] iwlwifi :03:00.0: L1 Enabled - LTR Disabled [ 13.945391] iwlwifi :03:00.0: L1 Enabled - LTR Disabled [ 13.947434] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0 [ 14.223990] iwlwifi :03:00.0: L1 Enabled - LTR Disabled [ 14.232065] iwlwifi :03:00.0: L1 Enabled - LTR Disabled [ 14.233570] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0 [ 14.328141] systemd-logind[2485]: Failed to start user service: Unknown unit: user@117.service [ 14.356634] systemd-logind[2485]: New session c1 of user lightdm. [ 14.357320] [ cut here ] [ 14.357327] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143 netfilter_net_exit+0x25/0x50() [ 14.357328] nf_unregister_net_hook: hook not found! [ 14.357371] Modules linked in: iptable_security(+) iptable_raw iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc uvcvideo videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal intel_powerclamp btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl kvm_intel v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel media kvm snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi snd_hda_intel pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep snd_hda_core rfkill i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev snd soundcore shpchp tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul crc32c_intel aesni_intel [ 14.357380] ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd e1000e lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore usb_common pps_core [ 14.357383] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U 4.4.0-rc6-x220-20151224+ #1 [ 14.357384] Hardware name: LENOVO 42912ZU/42912ZU, BIOS 8DET69WW (1.39 ) 07/18/2013 [ 14.357390] Workqueue: netns cleanup_net [ 14.357393]
81a27dfd 81359c69 88030e7cbd40 81060297 [ 14.357395] 88030e820d80 88030e7cbd90 81c962d8 81c962e0 [ 14.357397] 88030e7cbdf8 81060317 81a2c010 88030018 [ 14.357398] Call Trace: [ 14.357405] [] ? dump_stack+0x40/0x57 [ 14.357408] [] ? warn_slowpath_common+0x77/0xb0 [ 14.357410] [] ? warn_slowpath_fmt+0x47/0x50 [ 14.357416] [] ? mutex_lock+0x9/0x30 [ 14.357418] [] ? netfilter_net_exit+0x25/0x50 [ 14.357421] [] ? ops_exit_list.isra.6+0x2e/0x60 [ 14.357424] [] ? cleanup_net+0x1ab/0x280 [ 14.357427] [] ? process_one_work+0x133/0x330 [ 14.357429] [] ? worker_thread+0x60/0x470 [ 14.357430] [] ? process_one_work+0x330/0x330 [ 14.357434] [] ? kthread+0xca/0xe0 [ 14.357436] [] ? kthread_create_on_node+0x170/0x170 [ 14.357439] [] ? ret_from_fork+0x3f/0x70 [ 14.357441] [] ? kthread_create_on_node+0x170/0x170 [ 14.357443] ---[ end trace 9984cc4b0e89f818 ]--- [ 14.357443] [ cut here ] [ 14.357446] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143 netfilter_net_exit+0x25/0x50() [ 14.357446] nf_unregister_net_hook: hook not found! 
[ 14.357472] Modules linked in: iptable_security(+) iptable_raw iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc uvcvideo videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal intel_powerclamp btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl kvm_intel v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel media kvm snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi snd_hda_intel pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep snd_hda_core rfkill i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev snd soundcore shpchp tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul crc32c_intel aesni_intel [ 14.357478] ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd e1000e lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore usb_common pps_core [ 14.357480] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U W 4.4.0-rc6-x220-20151224+ #1 [ 14.357481] Hardware name: LENOVO 42912ZU/42912ZU, BIOS 8DET69WW (1.39 ) 07/18/2013 [ 14.357484] Workqueue: netns cleanup_net [ 14.357486] 81a27dfd 81359c69 88030e7cbd40 81060297 [ 14.357488] 88030e820db8 88030e7cbd90 81c962d8 81c962e0 [ 14.357489] 88030e7cbdf8 81060317 81a2c010 88030018 [ 14.357490] Call Trace: [ 14.357493] [] ? dump_stack+0x40/0x57 [ 14.357495] [] ? warn_slowpath_common+0x77/0xb0 [ 14.357497] [] ? warn_slowpath_fmt+0x47/0x50 [ 14.357499
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu
On 2015-12-14 20:48, Eric Shelton wrote:
Please note that the same issue appears to have been introduced in the recent 4.2.7 kernel. It perhaps has to do with b4ff8389ed14b849354b59ce9b360bdefcdbf99c having a matching commit e8d097151d309eb71f750bbf34e6a7ef6256da7e in linux-stable.git. The patch below to arch/x86/kernel/rtc.c was also effective for 4.2.7.
Eric

Hi Eric,

Yeah, it's unfortunate that the patch fixing the other patches destined for stable didn't make it into stable in time :(. Anyhow, the chosen solution wasn't ideal, so there is now a V2 patch by Boris. It hasn't been picked up yet, but hopefully will be soon (for the patch, see http://lkml.iu.edu/hypermail/linux/kernel/1512.1/03504.html).

--
Sander

On 2015-12-02 18:30, Sander Eikelenboom wrote: On 2015-12-02 15:55, David Vrabel wrote: > On 28/11/15 15:47, Sander Eikelenboom wrote: >> genirq: Flags mismatch irq 8. (hvc_console) vs. >> (rtc0) > > We shouldn't register an rtc_cmos device because its legacy irq > conflicts with the irq needed for hvc0. For a multi VCPU guest irq 8 > is > in use for the pv spinlocks and this gets requested first, preventing > the rtc device from probing. > > Does this patch fix it for you? > > David It does, thanks. Reported-and-tested-by: Sander Eikelenboom -- Sander > 8< > x86: rtc_cmos platform device requires legacy irqs > > Adding the rtc platform device when there are no legacy irqs (no > legacy PIC) causes a conflict with other devices that end up using the > same irq number. > > In a single VCPU PV guest we should have: > > /proc/interrupts: >CPU0 > 0: 4934 xen-percpu-virq timer0 > 1: 0 xen-percpu-ipi spinlock0 > 2: 0 xen-percpu-ipi resched0 > 3: 0 xen-percpu-ipi callfunc0 > 4: 0 xen-percpu-virq debug0 > 5: 0 xen-percpu-ipi callfuncsingle0 > 6: 0 xen-percpu-ipi irqwork0 > 7:321 xen-dyn-event xenbus > 8: 90 xen-dyn-event hvc_console > ... > > But hvc_console cannot get its interrupt because it is already in use > by rtc0 and the console does not work.
> > genirq: Flags mismatch irq 8. (hvc_console) vs. > (rtc0) > > The rtc_cmos device requires a particular legacy irq so don't add it > if there are no legacy irqs. > > Signed-off-by: David Vrabel > --- > arch/x86/kernel/rtc.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c > index cd96852..07c70f1 100644 > --- a/arch/x86/kernel/rtc.c > +++ b/arch/x86/kernel/rtc.c > @@ -14,6 +14,7 @@ > #include > #include > #include > +#include > > #ifdef CONFIG_X86_32 > /* > @@ -200,6 +201,10 @@ static __init int add_rtc_cmos(void) > } > #endif > > + /* RTC uses legacy IRQs. */ > + if (!nr_legacy_irqs()) > + return -ENODEV; > + > platform_device_register(&rtc_device); > dev_info(&rtc_device.dev, >"registered platform RTC device (no PNP device found)\n");
Re: [Xen-devel] [PATCH] x86: Xen PV guests don't have the rtc_cmos platform device
On 2015-12-09 15:42, Jan Beulich wrote:
On 09.12.15 at 15:32, wrote:
--- a/arch/x86/kernel/rtc.c
+++ b/arch/x86/kernel/rtc.c
@@ -200,6 +200,9 @@ static __init int add_rtc_cmos(void)
 }
 #endif
+ if (paravirt_enabled())
+ return -ENODEV;
What about Xen Dom0?
Jan

I checked that in my testing, and it still worked:
[ 16.733837] rtc_cmos 00:02: RTC can wake from S4
[ 16.734030] rtc_cmos 00:02: rtc core: registered rtc_cmos as rtc0
[ 16.734087] rtc_cmos 00:02: alarms up to one month, y3k, 114 bytes nvram
[ 17.760329] rtc_cmos 00:02: setting system clock to 2015-12-09 08:43:48 UTC (1449650628)
and /dev/rtc and /dev/rtc0 both exist. But I don't know the nitty-gritty details of why ...
--
Sander
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-12-02 15:55, David Vrabel wrote: On 28/11/15 15:47, Sander Eikelenboom wrote: genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) We shouldn't register an rtc_cmos device because its legacy irq conflicts with the irq needed for hvc0. For a multi VCPU guest irq 8 is in use for the pv spinlocks and this gets requested first, preventing the rtc device from probing. Does this patch fix it for you? David It does, thanks. Reported-and-tested-by: Sander Eikelenboom -- Sander 8< x86: rtc_cmos platform device requires legacy irqs Adding the rtc platform device when there are no legacy irqs (no legacy PIC) causes a conflict with other devices that end up using the same irq number. In a single VCPU PV guest we should have: /proc/interrupts: CPU0 0: 4934 xen-percpu-virq timer0 1: 0 xen-percpu-ipi spinlock0 2: 0 xen-percpu-ipi resched0 3: 0 xen-percpu-ipi callfunc0 4: 0 xen-percpu-virq debug0 5: 0 xen-percpu-ipi callfuncsingle0 6: 0 xen-percpu-ipi irqwork0 7:321 xen-dyn-event xenbus 8: 90 xen-dyn-event hvc_console ... But hvc_console cannot get its interrupt because it is already in use by rtc0 and the console does not work. genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) The rtc_cmos device requires a particular legacy irq so don't add it if there are no legacy irqs. Signed-off-by: David Vrabel --- arch/x86/kernel/rtc.c | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c index cd96852..07c70f1 100644 --- a/arch/x86/kernel/rtc.c +++ b/arch/x86/kernel/rtc.c @@ -14,6 +14,7 @@ #include #include #include +#include #ifdef CONFIG_X86_32 /* @@ -200,6 +201,10 @@ static __init int add_rtc_cmos(void) } #endif + /* RTC uses legacy IRQs. 
*/ + if (!nr_legacy_irqs()) + return -ENODEV; + platform_device_register(&rtc_device); dev_info(&rtc_device.dev, "registered platform RTC device (no PNP device found)\n");
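The failure mode in this thread can be modelled in a few lines. The thread contains two /proc/interrupts listings from a single-vCPU guest. In the broken one, timer0 is at IRQ 0 and hvc_console at IRQ 8. In the one attached later, they are at IRQ 16 and IRQ 24. Judging from those listings, the sketch assumes that Xen's dynamically allocated IRQs are numbered upward from nr_legacy_irqs(). This is a user-space model with hypothetical helper names, not kernel code. add_rtc_cmos_model() only mirrors the shape of the guard in the patch.

```c
#include <errno.h>

#define RTC_LEGACY_IRQ 8	/* IRQ line the CMOS RTC expects */

/* Assumed numbering: the nth (0-based) dynamically allocated Xen IRQ gets
 * nr_legacy + nth, matching hvc_console at IRQ 8 (nr_legacy = 0) versus
 * IRQ 24 (nr_legacy = 16) in the listings from this thread. */
int dynamic_irq(int nr_legacy, int nth)
{
	return nr_legacy + nth;
}

/* Shape of the guard added to add_rtc_cmos(): no legacy IRQs, no RTC. */
int add_rtc_cmos_model(int nr_legacy)
{
	if (!nr_legacy)
		return -ENODEV;	/* RTC uses legacy IRQs */
	return 0;
}

/* Nonzero if the platform RTC would be registered while the nth dynamic
 * IRQ (hvc_console is nth = 8) sits on the RTC's legacy line. */
int rtc_collides(int nr_legacy, int nth, int with_guard)
{
	int rtc_registered = with_guard ? add_rtc_cmos_model(nr_legacy) == 0 : 1;

	return rtc_registered && dynamic_irq(nr_legacy, nth) == RTC_LEGACY_IRQ;
}
```

In this model, the conflict appears only when both conditions hold: the legacy range is no longer reserved (nr_legacy = 0), and the RTC is registered anyway. The guard removes the second condition. Reserving the legacy range, as before 8c058b0b9c34, removes the first.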
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-12-02 00:41, Boris Ostrovsky wrote: On 12/01/2015 06:30 PM, Sander Eikelenboom wrote: On 2015-12-02 00:19, Boris Ostrovsky wrote: On 12/01/2015 06:00 PM, Sander Eikelenboom wrote: On 2015-12-01 23:47, Boris Ostrovsky wrote: On 11/30/2015 05:55 PM, Sander Eikelenboom wrote: On 2015-11-30 23:54, Boris Ostrovsky wrote: On 11/30/2015 04:46 PM, Sander Eikelenboom wrote: On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote: On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote: Hi all, I have just tested a 4.4-rc2 kernel (current Linus tree) + the tip tree pulled on top. Running this kernel under Xen on PV guests with multiple vcpus goes well (< 10% cpu usage when idle), but a guest with only a single vcpu doesn't idle at all; it seems a kworker thread is stuck: root 569 98.0 0.0 0 0 ? R 16:02 12:47 [kworker/0:1] Running a 4.3 kernel works fine with a single vcpu; bisecting would probably be quite painful since there were some breakages this merge window with respect to Xen PV guests. There are some differences in the diffs from booting 4.3, 4.4 single-vcpu and 4.4 multi-vcpu guests: Boris has been tracking a bunch of them. I am attaching the latest set of patches I have to carry on top of v4.4-rc3. Hi Konrad, I will test those, see if they fix all my issues and report back. They shouldn't help you ;-( (and I just saw a message from you confirming this) The first one fixes a 32-bit bug (on bare metal too). The second fixes a fatal bug for 32-bit PV guests. The other two are code improvements/cleanup. One of these patches also fixes a bug I was having with a PCI passthrough device in an HVM guest that wasn't working (depending on which dom0 kernel I was using, 4.3 or 4.4), which I hadn't reported yet. Fingers crossed, but I think this PV-guest single-vcpu issue is the last one I'm troubled by for now ;) I could not reproduce this, including with your kernel config file.
Hmm, that's unpleasant :-\ Another strange thing is that it doesn't seem to affect dom0 (which is also a PV guest), only unprivileged ones. All unprivileged PV guests seem to have the irq issue, but only with a single vcpu do I seem to get the stuck kworker thread that got my attention; with 2 vcpus that doesn't seem to happen, but you still get the dmesg output and warnings about hvc. Could it be that nr_legacy_irqs() in arch/x86/include/asm/i8259.h, static inline int nr_legacy_irqs(void) { return legacy_pic->nr_legacy_irqs; }, returns something different in some circumstances? It should return 16 pre-8c058b0b9c34d8c8d7912880956543769323e2d8 and 0 after that commit. This is the last number that you see in the "NR_IRQS:4352 nr_irqs:48 0" line. I think you should be able to safely revert both b4ff8389ed14b849354b59ce9b360bdefcdbf99c and 8c058b0b9c34d8c8d7912880956543769323e2d8 and see if it makes any difference. -boris That was already compiling :) And it does reveal that reverting both fixes the issue: no stuck kworker thread .. and no: genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) hvc_open: request_irq failed with rc -16. Let me try it again tomorrow. Can you post your guest config file, Xen version and host HW (Intel or AMD)? 'xl info' maybe? -boris Hi Boris, A fresh new day .. a fresh new thought. If I look at /proc/interrupts from a broken kernel and from a kernel with both commits reverted, the thing that catches the eye is irq 8, just as the dmesg message was saying. In my PV guest, rtc0 now seems to try to take irq 8, which was already assigned to HVC? Sounds like some assumptions around the legacy range are broken somewhere. What is the benefit of not just reserving the legacy range? Attached are the /proc/interrupts from both boots. -- Sander What I did get was a conflict when reverting b4ff8389ed14b849354b59ce9b360bdefcdbf99c: arch/arm64/include/asm/irq.h, although that shouldn't matter because we are on x86 and not on ARM.
-- Sander -- Sander -boris
___
Xen-devel mailing list
xen-de...@lists.xen.org
http://lists.xen.org/xen-devel

CPU0
16: 315536 xen-percpu-virq timer0
17: 0 xen-percpu-ipi spinlock0
18: 0 xen-percpu-ipi resched0
19: 0 xen-percpu-ipi callfunc0
20: 0 xen-percpu-virq debug0
21: 0 xen-percpu-ipi callfuncsingle0
22: 0 xen-percpu-ipi irqwork0
23: 346 xen-dyn-event xenbus
24: 134 xen-dyn-event hvc_console
25: 11464 xen-dyn-event blkif
26: 28710 xen-dyn-event eth0-q0-tx
27: 40136 xen-dyn-event eth0-q0-rx
NMI: 0 Non-maskable interrupts
LOC: 0 Local timer interrupts
SPU: 0 Spurious interrupts
PMI: 0 Performance monitoring interrupts
IWI: 0 IRQ work interrupts
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-12-02 00:41, Boris Ostrovsky wrote: On 12/01/2015 06:30 PM, Sander Eikelenboom wrote: On 2015-12-02 00:19, Boris Ostrovsky wrote: On 12/01/2015 06:00 PM, Sander Eikelenboom wrote: On 2015-12-01 23:47, Boris Ostrovsky wrote: On 11/30/2015 05:55 PM, Sander Eikelenboom wrote: On 2015-11-30 23:54, Boris Ostrovsky wrote: On 11/30/2015 04:46 PM, Sander Eikelenboom wrote: On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote: On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote: Hi all, I have just tested a 4.4-rc2 kernel (current Linus tree) + the tip tree pulled on top. Running this kernel under Xen on PV guests with multiple vcpus goes well (< 10% cpu usage when idle), but a guest with only a single vcpu doesn't idle at all; it seems a kworker thread is stuck: root 569 98.0 0.0 0 0 ? R 16:02 12:47 [kworker/0:1] Running a 4.3 kernel works fine with a single vcpu; bisecting would probably be quite painful since there were some breakages this merge window with respect to Xen PV guests. There are some differences in the diffs from booting 4.3, 4.4 single-vcpu and 4.4 multi-vcpu guests: Boris has been tracking a bunch of them. I am attaching the latest set of patches I have to carry on top of v4.4-rc3. Hi Konrad, I will test those, see if they fix all my issues and report back. They shouldn't help you ;-( (and I just saw a message from you confirming this) The first one fixes a 32-bit bug (on bare metal too). The second fixes a fatal bug for 32-bit PV guests. The other two are code improvements/cleanup. One of these patches also fixes a bug I was having with a PCI passthrough device in an HVM guest that wasn't working (depending on which dom0 kernel I was using, 4.3 or 4.4), which I hadn't reported yet. Fingers crossed, but I think this PV-guest single-vcpu issue is the last one I'm troubled by for now ;) I could not reproduce this, including with your kernel config file.
Hmm, that's unpleasant :-\ Another strange thing is that it doesn't seem to affect dom0 (which is also a PV guest), only unprivileged ones. All unprivileged PV guests seem to have the irq issue, but only with a single vcpu do I seem to get the stuck kworker thread that got my attention; with 2 vcpus that doesn't seem to happen, but you still get the dmesg output and warnings about hvc. Could it be that nr_legacy_irqs() in arch/x86/include/asm/i8259.h, static inline int nr_legacy_irqs(void) { return legacy_pic->nr_legacy_irqs; }, returns something different in some circumstances? It should return 16 pre-8c058b0b9c34d8c8d7912880956543769323e2d8 and 0 after that commit. This is the last number that you see in the "NR_IRQS:4352 nr_irqs:48 0" line. I think you should be able to safely revert both b4ff8389ed14b849354b59ce9b360bdefcdbf99c and 8c058b0b9c34d8c8d7912880956543769323e2d8 and see if it makes any difference. -boris That was already compiling :) And it does reveal that reverting both fixes the issue: no stuck kworker thread .. and no: genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) hvc_open: request_irq failed with rc -16. Let me try it again tomorrow. Can you post your guest config file, Xen version and host HW (Intel or AMD)? 'xl info' maybe? -boris Guest config file == dom0 config file == the one I sent you earlier. The host is an AMD Phenom X6.
# xl info
host : serveerstertje
release : 4.4.0-rc3-20151201-linus-doflr-boris+
version : #1 SMP Tue Dec 1 19:02:58 CET 2015
machine : x86_64
nr_cpus : 6
max_cpu_id : 5
nr_nodes : 1
cores_per_socket : 6
threads_per_core : 1
cpu_mhz : 3200
hw_caps : 178bf3ff:efd3fbff::00011300:00802001::37ff:
virt_caps : hvm hvm_directio
total_memory : 20479
free_memory : 7745
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 7
xen_extra : -unstable
xen_version : 4.7-unstable
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0x8000
xen_changeset : Thu Nov 26 20:58:13 2015 +0100 git:5252636-dirty
xen_commandline : dom0_mem=1536M,max:1536M loglvl=all loglvl_guest=all console_timestamps=datems vga=gfx-1280x1024x32 cpuidle cpufreq=xen com1=38400,8n1 console=vga,com1 ivrs_ioapic[6]=00:14.0 iommu=on,verbose,debug,amd-iommu-debug conring_size=128k ucode=-1
cc_compiler : gcc-4.9.real (Debian 4.9.2-10) 4.9.2
cc_compile_by : root
cc_compile_domain : dyndns.org
cc_compile_date : Thu Nov 26 21:18:41 CET 201
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-12-02 00:19, Boris Ostrovsky wrote: On 12/01/2015 06:00 PM, Sander Eikelenboom wrote: On 2015-12-01 23:47, Boris Ostrovsky wrote: On 11/30/2015 05:55 PM, Sander Eikelenboom wrote: On 2015-11-30 23:54, Boris Ostrovsky wrote: On 11/30/2015 04:46 PM, Sander Eikelenboom wrote: On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote: On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote: Hi all, I have just tested a 4.4-rc2 kernel (current linus tree) + the tip tree pulled on top. Running this kernel under Xen on PV-guests with multiple vcpus goes well (on idle < 10% cpu usage), but a guest with only a single vcpu doesn't idle at all, it seems a kworker thread is stuck: root 569 98.0 0.0 0 0 ?R 16:02 12:47 [kworker/0:1] Running a 4.3 kernel works fine with a single vpcu, bisecting would probably quite painful since there were some breakages this merge window with respect to Xen pv-guests. There are some differences in the diff's from booting a 4.3, 4.4-single, 4.4-multi cpu boot: Boris has been tracking a bunch of them. I am attaching the latest set of patches I've to carry on top of v4.4-rc3. Hi Konrad, i will test those, see if it fixes all my issues and report back They shouldn't help you ;-( (and I just saw a message from you confirming this) The first one fixes a 32-bit bug (on bare metal too). The second fixes a fatal bug for 32-bit PV guests. The other two are code improvements/cleanup. One of these patches also fixes a bug i was having with a pci-passthrough device in a HVM that wasn't working (depending on which dom0-kernel i was using (4.3 or 4.4)), but didn't report yet. Fingers crossed but i think this pv-guest single vcpu issue is the last i'm troubled by for now ;) I could not reproduce this, including with your kernel config file. 
Hmm that's unpleasant :-\ Hmm other strange thing is it doesn't seem to affect dom0 (which is also a PV guest), but only unprivileged ones All unprivileged pv-guests seem to have the irq issue, but only with a single vcpu i see to get the stuck kworker thread that got my attention, with a 2 vcpu that doesn't seem to happen, but you still get the dmesg output and warnings about hvc) Could it be that: arch/x86/include/asm/i8259.h static inline int nr_legacy_irqs(void) { return legacy_pic->nr_legacy_irqs; } returns something different in some circumstances ? It should return 16 pre-8c058b0b9c34d8c8d7912880956543769323e2d8 and 0 after that commit. This is the last number that you see in NR_IRQS:4352 nr_irqs:48 0 line. I think you should be able to safely revert both b4ff8389ed14b849354b59ce9b360bdefcdbf99c and 8c058b0b9c34d8c8d7912880956543769323e2d8 and see if it makes any difference. -boris That was already underway compiling :) And it does reveal that reverting both fixes the issue, no stuck kworker thread .. and no: genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) hvc_open: request_irq failed with rc -16. What i did get was an conflict reverting b4ff8389ed14b849354b59ce9b360bdefcdbf99c: arch/arm64/include/asm/irq.h, although that shouldn't matter because we are on x86 and not on arm. -- Sander -- Sander -boris ___ Xen-devel mailing list xen-de...@lists.xen.org http://lists.xen.org/xen-devel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-12-01 23:47, Boris Ostrovsky wrote: On 11/30/2015 05:55 PM, Sander Eikelenboom wrote: On 2015-11-30 23:54, Boris Ostrovsky wrote: On 11/30/2015 04:46 PM, Sander Eikelenboom wrote: On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote: On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote: Hi all, I have just tested a 4.4-rc2 kernel (current linus tree) + the tip tree pulled on top. Running this kernel under Xen on PV-guests with multiple vcpus goes well (on idle < 10% cpu usage), but a guest with only a single vcpu doesn't idle at all, it seems a kworker thread is stuck: root 569 98.0 0.0 0 0 ?R16:02 12:47 [kworker/0:1] Running a 4.3 kernel works fine with a single vpcu, bisecting would probably quite painful since there were some breakages this merge window with respect to Xen pv-guests. There are some differences in the diff's from booting a 4.3, 4.4-single, 4.4-multi cpu boot: Boris has been tracking a bunch of them. I am attaching the latest set of patches I've to carry on top of v4.4-rc3. Hi Konrad, i will test those, see if it fixes all my issues and report back They shouldn't help you ;-( (and I just saw a message from you confirming this) The first one fixes a 32-bit bug (on bare metal too). The second fixes a fatal bug for 32-bit PV guests. The other two are code improvements/cleanup. One of these patches also fixes a bug i was having with a pci-passthrough device in a HVM that wasn't working (depending on which dom0-kernel i was using (4.3 or 4.4)), but didn't report yet. Fingers crossed but i think this pv-guest single vcpu issue is the last i'm troubled by for now ;) I could not reproduce this, including with your kernel config file. 
Hmm that's unpleasant :-\ Hmm other strange thing is it doesn't seem to affect dom0 (which is also a PV guest), but only unprivileged ones All unprivileged pv-guests seem to have the irq issue, but only with a single vcpu i see to get the stuck kworker thread that got my attention, with a 2 vcpu that doesn't seem to happen, but you still get the dmesg output and warnings about hvc) Could it be that: arch/x86/include/asm/i8259.h static inline int nr_legacy_irqs(void) { return legacy_pic->nr_legacy_irqs; } returns something different in some circumstances ? -- Sander -boris
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-11-30 23:54, Boris Ostrovsky wrote: On 11/30/2015 04:46 PM, Sander Eikelenboom wrote: On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote: On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote: Hi all, I have just tested a 4.4-rc2 kernel (current linus tree) + the tip tree pulled on top. Running this kernel under Xen on PV-guests with multiple vcpus goes well (on idle < 10% cpu usage), but a guest with only a single vcpu doesn't idle at all, it seems a kworker thread is stuck: root 569 98.0 0.0 0 0 ?R16:02 12:47 [kworker/0:1] Running a 4.3 kernel works fine with a single vpcu, bisecting would probably quite painful since there were some breakages this merge window with respect to Xen pv-guests. There are some differences in the diff's from booting a 4.3, 4.4-single, 4.4-multi cpu boot: Boris has been tracking a bunch of them. I am attaching the latest set of patches I've to carry on top of v4.4-rc3. Hi Konrad, i will test those, see if it fixes all my issues and report back They shouldn't help you ;-( (and I just saw a message from you confirming this) The first one fixes a 32-bit bug (on bare metal too). The second fixes a fatal bug for 32-bit PV guests. The other two are code improvements/cleanup. Thanks :) -- Sander Between 4.3 and 4.4-single: -NR_IRQS:4352 nr_irqs:32 16 +Using NULL legacy PIC +NR_IRQS:4352 nr_irqs:32 0 This is fine, as long as you have b4ff8389ed14b849354b59ce9b360bdefcdbf99c. -cpu 0 spinlock event irq 17 +cpu 0 spinlock event irq 1 This is strange. I wouldn't expect spinlocks to use legacy irqs. Could it be .. that with your fixup: xen/events: Always allocate legacy interrupts on PV guests (b4ff8389ed14b849354b59ce9b360bdefcdbf99c) for commit: x86/irq: Probe for PIC presence before allocating descs for legacy IRQs (8c058b0b9c34d8c8d7912880956543769323e2d8) that we now have the situation described in the commit message of 8c058b0b9c, but now for Xen PV instead of Hyper-V ? 
(seems both Xen and Hyper-V want to achieve the same but have different competing implementations ?) (BTW 8c058b0b9c has a CC for stable ... so could be destined to cause more trouble). -- Sander and later on: -hctosys: unable to open rtc device (rtc0) +rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock +genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) +hvc_open: request_irq failed with rc -16. +Warning: unable to open an initial console. between 4.4-single and 4.4-multi: Using NULL legacy PIC -NR_IRQS:4352 nr_irqs:32 0 +NR_IRQS:4352 nr_irqs:48 0 This is probably OK too since nr_irqs depend on number of CPUs. I think something is messed up with IRQ. I saw last week something from setup_irq() generating a stack dump (warninig) for rtc_cmos but it appeared harmless at that time and now I don't see it anymore. -boris and later on: -rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock +hctosys: unable to open rtc device (rtc0) -genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) -hvc_open: request_irq failed with rc -16. -Warning: unable to open an initial console. attached: - dmesg with 4.3 kernel with 1 vcpu - dmesg with 4.4 kernel with 1 vpcu - dmesg with 4.4 kernel with 2 vpcus - .config of the 4.4 kernel is attached. -- Sander
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.
On 2015-11-30 23:54, Boris Ostrovsky wrote: On 11/30/2015 04:46 PM, Sander Eikelenboom wrote: On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote: On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote: Hi all, I have just tested a 4.4-rc2 kernel (current linus tree) + the tip tree pulled on top. Running this kernel under Xen on PV-guests with multiple vcpus goes well (on idle < 10% cpu usage), but a guest with only a single vcpu doesn't idle at all, it seems a kworker thread is stuck: root 569 98.0 0.0 0 0 ?R16:02 12:47 [kworker/0:1] Running a 4.3 kernel works fine with a single vpcu, bisecting would probably quite painful since there were some breakages this merge window with respect to Xen pv-guests. There are some differences in the diff's from booting a 4.3, 4.4-single, 4.4-multi cpu boot: Boris has been tracking a bunch of them. I am attaching the latest set of patches I've to carry on top of v4.4-rc3. Hi Konrad, i will test those, see if it fixes all my issues and report back They shouldn't help you ;-( (and I just saw a message from you confirming this) The first one fixes a 32-bit bug (on bare metal too). The second fixes a fatal bug for 32-bit PV guests. The other two are code improvements/cleanup. One of these patches also fixes a bug i was having with a pci-passthrough device in a HVM that wasn't working (depending on which dom0-kernel i was using (4.3 or 4.4)), but didn't report yet. Fingers crossed but i think this pv-guest single vcpu issue is the last i'm troubled by for now ;) -- Sander Thanks :) -- Sander Between 4.3 and 4.4-single: -NR_IRQS:4352 nr_irqs:32 16 +Using NULL legacy PIC +NR_IRQS:4352 nr_irqs:32 0 This is fine, as long as you have b4ff8389ed14b849354b59ce9b360bdefcdbf99c. -cpu 0 spinlock event irq 17 +cpu 0 spinlock event irq 1 This is strange. I wouldn't expect spinlocks to use legacy irqs. 
and later on: -hctosys: unable to open rtc device (rtc0) +rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock +genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) +hvc_open: request_irq failed with rc -16. +Warning: unable to open an initial console. between 4.4-single and 4.4-multi: Using NULL legacy PIC -NR_IRQS:4352 nr_irqs:32 0 +NR_IRQS:4352 nr_irqs:48 0 This is probably OK too since nr_irqs depend on number of CPUs. I think something is messed up with IRQ. I saw last week something from setup_irq() generating a stack dump (warninig) for rtc_cmos but it appeared harmless at that time and now I don't see it anymore. -boris and later on: -rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock +hctosys: unable to open rtc device (rtc0) -genirq: Flags mismatch irq 8. (hvc_console) vs. (rtc0) -hvc_open: request_irq failed with rc -16. -Warning: unable to open an initial console. attached: - dmesg with 4.3 kernel with 1 vcpu - dmesg with 4.4 kernel with 1 vpcu - dmesg with 4.4 kernel with 2 vpcus - .config of the 4.4 kernel is attached. -- Sander
Re: [linux-4.4-mw] Regression: cx25821: Oops: no 32bit PCI DMA
On 2015-11-15 13:56, Christoph Hellwig wrote: Hi Saner, this is my fault. Please see the patch which I already sent out to Andrew and lkml. Hi Christoph, Thanks for the pointer, just tested and it works fine again. -- Sander
Re: [Xen-devel] Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core
Thursday, November 5, 2015, 2:53:40 PM, you wrote: > On 11/05/2015 04:13 AM, Sander Eikelenboom wrote: >> >> It makes "cat /sys/kernel/debug/kernel_page_tables" work and >> prevents a kernel with CONFIG_DEBUG_WX=y from crashing at boot. > Great. Our nightly runs also failed spectacularly due to this bug. >> >> It now does give a warning about an insecure W+X mapping, so >> CONFIG_DEBUG_WX=y >> seems to be working. No idea how to interpret it though (and if it's a >> legit >> warning). >> >> -- >> Sander >> >> [ 19.034706] Freeing unused kernel memory: 1104K (822fc000 - >> 8241) >> [ 19.041339] Write protecting the kernel read-only data: 18432k >> [ 19.052596] Freeing unused kernel memory: 1144K (880001ae2000 - >> 880001c0) >> [ 19.060285] Freeing unused kernel memory: 1560K (88000207a000 - >> 88000220) >> [ 19.067079] [ cut here ] >> [ 19.073931] WARNING: CPU: 5 PID: 1 at >> arch/x86/mm/dump_pagetables.c:225 note_page+0x619/0x7e0() > Yes, this apparently is a known issue: https://lkml.org/lkml/2015/11/4/476 > -boris Ah thx for the pointer :) -- Sander
Re: [Xen-devel] Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core
On 2015-11-05 00:13, Boris Ostrovsky wrote: On 11/04/2015 03:02 PM, Sander Eikelenboom wrote: On 2015-11-04 19:47, Stephen Smalley wrote: On 11/04/2015 01:28 PM, Sander Eikelenboom wrote: On 2015-11-04 16:52, Stephen Smalley wrote: On 11/04/2015 06:55 AM, Sander Eikelenboom wrote: Hi All, I just tried to boot with the current linus mergewindow tree under Xen. It fails with a kernel panic at boot with the new "CONFIG_DEBUG_WX" option enabled. Disabling it makes the kernel boot fine. The splat: [ 18.424241] Freeing unused kernel memory: 1104K (822fc000 - 8241) [ 18.430314] Write protecting the kernel read-only data: 18432k [ 18.441054] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0) [ 18.447966] Freeing unused kernel memory: 1560K (88000207a000 - 88000220) [ 18.453947] BUG: unable to handle kernel paging request at 88055c883000 [ 18.459943] IP: [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.465847] PGD 2212067 PUD 0 [ 18.471564] Oops: [#1] SMP [ 18.477248] Modules linked in: [ 18.482918] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-mw-20151104-linus-doflr+ #1 [ 18.488804] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010 [ 18.494778] task: 880059b9 ti: 880059b98000 task.ti: 880059b98000 [ 18.500852] RIP: e030:[] [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.507102] RSP: e02b:880059b9be48 EFLAGS: 00010296 [ 18.513351] RAX: 88055c883000 RBX: 81ae2000 RCX: 8800 [ 18.519733] RDX: 0067 RSI: 880059b9be98 RDI: 88001000 [ 18.526129] RBP: 880059b9bf00 R08: R09: [ 18.532522] R10: 88005fd0e790 R11: 0001 R12: 88008000 [ 18.538891] R13: cfff R14: 880059b9be98 R15: [ 18.545247] FS: () GS:88005f68() knlGS: [ 18.551708] CS: e033 DS: ES: CR0: 8005003b [ 18.558153] CR2: 88055c883000 CR3: 02211000 CR4: 0660 [ 18.564686] Stack: [ 18.571106] 000159b9be50 82211000 88055c884000 0800 [ 18.577704] 8000 88055c883000 0007 88005fd0e790 [ 18.584291] 880059b9bed8 81156ace 0001 [ 18.590916] Call Trace: [ 18.597458] [] ? 
free_reserved_area+0x11e/0x120 [ 18.604180] [] ptdump_walk_pgd_level_checkwx+0x12/0x20 [ 18.611014] [] mark_rodata_ro+0xe9/0xf0 [ 18.617819] [] ? rest_init+0x80/0x80 [ 18.624512] [] kernel_init+0x18/0xe0 [ 18.631095] [] ret_from_fork+0x3f/0x70 [ 18.637650] [] ? rest_init+0x80/0x80 [ 18.644178] Code: 70 ff ff ff 48 3b 85 58 ff ff ff 0f 84 c0 fe ff ff 48 8b 85 68 ff ff ff 48 c1 e0 10 48 c1 f8 10 48 89 45 b0 48 8b 85 70 ff ff ff <48> 8b 38 48 85 ff 0f 85 4e ff ff ff b9 02 00 00 00 31 d2 4c 89 [ 18.658246] RIP [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.665211] RSP [ 18.672073] CR2: 88055c883000 [ 18.678852] ---[ end trace d84e34461c40637a ]--- [ 18.685641] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 [ 18.685641] [ 18.699520] Kernel Offset: disable What's your .config? Does cat /sys/kernel/debug/kernel_page_tables produce a similar fault even with CONFIG_DEBUG_WX=n? .config is attached Hmm that sysfs file doesn't seem to exist then: # cat /sys/kernel/debug/kernel_page_tables cat: /sys/kernel/debug/kernel_page_tables: No such file or directory Needs CONFIG_X86_PTDUMP=y. Also assumes you have debugfs mounted there. 
Recompiled, and the result is that it also blows up: Can you try this:

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 1bf417e..b534216 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -362,8 +362,13 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
 					bool checkwx)
 {
 #ifdef CONFIG_X86_64
+/* 8000 - 87ff is reserved for hypervisor */
+#define is_hypervisor_range(idx) (paravirt_enabled() && \
+				  ((idx >= pgd_index(__PAGE_OFFSET) - 16) && \
+				   (idx < pgd_index(__PAGE_OFFSET))))
 	pgd_t *start = (pgd_t *) &init_level4_pgt;
 #else
+#define is_hypervisor_range(idx) 0
 	pgd_t *start = swapper_pg_dir;
 #endif
 	pgprotval_t prot;
@@ -381,7 +386,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
 	for (i = 0; i < PTRS_PER_PGD; i++) {
 		st.current_address = normalize_addr(i * PGD_LEVEL_MULT);
-		if (!pgd_none(*start)) {
+		if (!pgd_none(*start) && !is_hypervisor_range(i)) {
 			if (pgd_large(*start) || !pgd_present(*start)) {
 				prot = pgd_flags(*start);
 				note_page(m, &st, __
Re: Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core
On 2015-11-04 19:47, Stephen Smalley wrote: On 11/04/2015 01:28 PM, Sander Eikelenboom wrote: On 2015-11-04 16:52, Stephen Smalley wrote: On 11/04/2015 06:55 AM, Sander Eikelenboom wrote: Hi All, I just tried to boot with the current linus mergewindow tree under Xen. It fails with a kernel panic at boot with the new "CONFIG_DEBUG_WX" option enabled. Disabling it makes the kernel boot fine. The splat: [ 18.424241] Freeing unused kernel memory: 1104K (822fc000 - 8241) [ 18.430314] Write protecting the kernel read-only data: 18432k [ 18.441054] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0) [ 18.447966] Freeing unused kernel memory: 1560K (88000207a000 - 88000220) [ 18.453947] BUG: unable to handle kernel paging request at 88055c883000 [ 18.459943] IP: [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.465847] PGD 2212067 PUD 0 [ 18.471564] Oops: [#1] SMP [ 18.477248] Modules linked in: [ 18.482918] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-mw-20151104-linus-doflr+ #1 [ 18.488804] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010 [ 18.494778] task: 880059b9 ti: 880059b98000 task.ti: 880059b98000 [ 18.500852] RIP: e030:[] [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.507102] RSP: e02b:880059b9be48 EFLAGS: 00010296 [ 18.513351] RAX: 88055c883000 RBX: 81ae2000 RCX: 8800 [ 18.519733] RDX: 0067 RSI: 880059b9be98 RDI: 88001000 [ 18.526129] RBP: 880059b9bf00 R08: R09: [ 18.532522] R10: 88005fd0e790 R11: 0001 R12: 88008000 [ 18.538891] R13: cfff R14: 880059b9be98 R15: [ 18.545247] FS: () GS:88005f68() knlGS: [ 18.551708] CS: e033 DS: ES: CR0: 8005003b [ 18.558153] CR2: 88055c883000 CR3: 02211000 CR4: 0660 [ 18.564686] Stack: [ 18.571106] 000159b9be50 82211000 88055c884000 0800 [ 18.577704] 8000 88055c883000 0007 88005fd0e790 [ 18.584291] 880059b9bed8 81156ace 0001 [ 18.590916] Call Trace: [ 18.597458] [] ? 
free_reserved_area+0x11e/0x120 [ 18.604180] [] ptdump_walk_pgd_level_checkwx+0x12/0x20 [ 18.611014] [] mark_rodata_ro+0xe9/0xf0 [ 18.617819] [] ? rest_init+0x80/0x80 [ 18.624512] [] kernel_init+0x18/0xe0 [ 18.631095] [] ret_from_fork+0x3f/0x70 [ 18.637650] [] ? rest_init+0x80/0x80 [ 18.644178] Code: 70 ff ff ff 48 3b 85 58 ff ff ff 0f 84 c0 fe ff ff 48 8b 85 68 ff ff ff 48 c1 e0 10 48 c1 f8 10 48 89 45 b0 48 8b 85 70 ff ff ff <48> 8b 38 48 85 ff 0f 85 4e ff ff ff b9 02 00 00 00 31 d2 4c 89 [ 18.658246] RIP [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.665211] RSP [ 18.672073] CR2: 88055c883000 [ 18.678852] ---[ end trace d84e34461c40637a ]--- [ 18.685641] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 [ 18.685641] [ 18.699520] Kernel Offset: disable What's your .config? Does cat /sys/kernel/debug/kernel_page_tables produce a similar fault even with CONFIG_DEBUG_WX=n? .config is attached Hmm that sysfs file doesn't seem to exist then: # cat /sys/kernel/debug/kernel_page_tables cat: /sys/kernel/debug/kernel_page_tables: No such file or directory Needs CONFIG_X86_PTDUMP=y. Also assumes you have debugfs mounted there. 
Recompiled, and the result is that it also blows up: [ 902.389247] BUG: unable to handle kernel paging request at 88055c883000 [ 902.402749] IP: [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 902.416261] PGD 2212067 PUD 0 [ 902.427768] Oops: [#1] SMP [ 902.438137] Modules linked in: [ 902.448299] CPU: 2 PID: 21951 Comm: cat Not tainted 4.3.0-mw-20151104-linus-doflr-nodebugwx-withptdump+ #1 [ 902.458581] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010 [ 902.468850] task: 88004b49e300 ti: 88005928c000 task.ti: 88005928c000 [ 902.479133] RIP: e030:[] [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 902.489536] RSP: e02b:88005928fd20 EFLAGS: 00010296 [ 902.499692] RAX: 88055c883000 RBX: RCX: 8800 [ 902.509755] RDX: 0067 RSI: 88005928fd70 RDI: 88001000 [ 902.519680] RBP: 88005928fdd8 R08: 1000 R09: [ 902.529555] R10: R11: 0246 R12: 88005928ff20 [ 902.539349] R13: cfff R14: 88005928fd70 R15: 880033c773c0 [ 902.549081] FS: 7f56b07d4700() GS:88005f68() knlGS: [ 902.558690] CS: e033 DS: ES: CR0: 8005003b [ 902.568111] CR2: 88055c883000 CR3: 4563f000 CR4: 0660 [ 902.57
Re: Linux 4.4 MW: Boot under Xen fails with CONFIG_DEBUG_WX enabled: RIP: ptdump_walk_pgd_level_core
On 2015-11-04 16:52, Stephen Smalley wrote: On 11/04/2015 06:55 AM, Sander Eikelenboom wrote: Hi All, I just tried to boot with the current linus mergewindow tree under Xen. It fails with a kernel panic at boot with the new "CONFIG_DEBUG_WX" option enabled. Disabling it makes the kernel boot fine. The splat: [ 18.424241] Freeing unused kernel memory: 1104K (822fc000 - 8241) [ 18.430314] Write protecting the kernel read-only data: 18432k [ 18.441054] Freeing unused kernel memory: 1144K (880001ae2000 - 880001c0) [ 18.447966] Freeing unused kernel memory: 1560K (88000207a000 - 88000220) [ 18.453947] BUG: unable to handle kernel paging request at 88055c883000 [ 18.459943] IP: [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.465847] PGD 2212067 PUD 0 [ 18.471564] Oops: [#1] SMP [ 18.477248] Modules linked in: [ 18.482918] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.3.0-mw-20151104-linus-doflr+ #1 [ 18.488804] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010 [ 18.494778] task: 880059b9 ti: 880059b98000 task.ti: 880059b98000 [ 18.500852] RIP: e030:[] [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.507102] RSP: e02b:880059b9be48 EFLAGS: 00010296 [ 18.513351] RAX: 88055c883000 RBX: 81ae2000 RCX: 8800 [ 18.519733] RDX: 0067 RSI: 880059b9be98 RDI: 88001000 [ 18.526129] RBP: 880059b9bf00 R08: R09: [ 18.532522] R10: 88005fd0e790 R11: 0001 R12: 88008000 [ 18.538891] R13: cfff R14: 880059b9be98 R15: [ 18.545247] FS: () GS:88005f68() knlGS: [ 18.551708] CS: e033 DS: ES: CR0: 8005003b [ 18.558153] CR2: 88055c883000 CR3: 02211000 CR4: 0660 [ 18.564686] Stack: [ 18.571106] 000159b9be50 82211000 88055c884000 0800 [ 18.577704] 8000 88055c883000 0007 88005fd0e790 [ 18.584291] 880059b9bed8 81156ace 0001 [ 18.590916] Call Trace: [ 18.597458] [] ? free_reserved_area+0x11e/0x120 [ 18.604180] [] ptdump_walk_pgd_level_checkwx+0x12/0x20 [ 18.611014] [] mark_rodata_ro+0xe9/0xf0 [ 18.617819] [] ? 
rest_init+0x80/0x80 [ 18.624512] [] kernel_init+0x18/0xe0 [ 18.631095] [] ret_from_fork+0x3f/0x70 [ 18.637650] [] ? rest_init+0x80/0x80 [ 18.644178] Code: 70 ff ff ff 48 3b 85 58 ff ff ff 0f 84 c0 fe ff ff 48 8b 85 68 ff ff ff 48 c1 e0 10 48 c1 f8 10 48 89 45 b0 48 8b 85 70 ff ff ff <48> 8b 38 48 85 ff 0f 85 4e ff ff ff b9 02 00 00 00 31 d2 4c 89 [ 18.658246] RIP [] ptdump_walk_pgd_level_core+0x20e/0x440 [ 18.665211] RSP [ 18.672073] CR2: 88055c883000 [ 18.678852] ---[ end trace d84e34461c40637a ]--- [ 18.685641] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 [ 18.685641] [ 18.699520] Kernel Offset: disable What's your .config? Does cat /sys/kernel/debug/kernel_page_tables produce a similar fault even with CONFIG_DEBUG_WX=n? .config is attached Hmm that sysfs file doesn't seem to exist then: # cat /sys/kernel/debug/kernel_page_tables cat: /sys/kernel/debug/kernel_page_tables: No such file or directory -- Sander # # Automatically generated file; DO NOT EDIT. # Linux/x86_64 4.3.0-mw-20151104-linus-doflr Kernel Configuration # CONFIG_64BIT=y CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_PERF_EVENTS_INTEL_UNCORE=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ZONE_DMA32=y CONFIG_AUDIT_ARCH=y 
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_64_SMP=y CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_PGTABLE_LEVELS=4 CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y # # General setup # CONFIG_INIT_ENV_ARG